Pipecat-ai

Latest version: v0.0.62

Safety actively analyzes 723158 Python packages for vulnerabilities to keep your Python projects secure.

Page 4 of 11

0.0.44

Added

- Added support for OpenAI Realtime API with the new
`OpenAILLMServiceRealtimeBeta` processor.
(see https://platform.openai.com/docs/guides/realtime/overview)

- Added `RTVIBotTranscriptionProcessor` which will send the RTVI
`bot-transcription` protocol message. These are TTS text aggregated (into
sentences) messages.

- Added new input params to the `MarkdownTextFilter` utility. You can set
`filter_code` to filter code from text and `filter_tables` to filter tables
from text.

- Added `CanonicalMetricsService`. This processor uses the new
`AudioBufferProcessor` to capture conversation audio and later send it to
Canonical AI.
(see https://canonical.chat/)

- Added `AudioBufferProcessor`. This processor can be used to buffer mixed user and
bot audio. This can later be saved into an audio file or processed by some
audio analyzer.

- Added `on_first_participant_joined` event to `LiveKitTransport`.

Changed

- LLM text responses are now logged properly as unicode characters.

- `UserStartedSpeakingFrame`, `UserStoppedSpeakingFrame`,
`BotStartedSpeakingFrame`, `BotStoppedSpeakingFrame`, `BotSpeakingFrame` and
`UserImageRequestFrame` are now based from `SystemFrame`

Fixed

- Merge `RTVIBotLLMProcessor`/`RTVIBotLLMTextProcessor` and
`RTVIBotTTSProcessor`/`RTVIBotTTSTextProcessor` to avoid out of order issues.

- Fixed an issue in RTVI protocol that could cause a `bot-llm-stopped` or
`bot-tts-stopped` message to be sent before a `bot-llm-text` or `bot-tts-text`
message.

- Fixed `DeepgramSTTService` constructor settings not being merged with default
ones.

- Fixed an issue in Daily transport that would cause tasks to be hanging if
urgent transport messages were being sent from a transport event handler.

- Fixed an issue in `BaseOutputTransport` that would cause `EndFrame` to be
pushed downed too early and call `FrameProcessor.cleanup()` before letting the
transport stop properly.

0.0.43

Added

- Added a new util called `MarkdownTextFilter` which is a subclass of a new
base class called `BaseTextFilter`. This is a configurable utility which
is intended to filter text received by TTS services.

- Added new `RTVIUserLLMTextProcessor`. This processor will send an RTVI
`user-llm-text` message with the user content's that was sent to the LLM.

Changed

- `TransportMessageFrame` doesn't have an `urgent` field anymore, instead
there's now a `TransportMessageUrgentFrame` which is a `SystemFrame` and
therefore skip all internal queuing.

- For TTS services, convert inputted languages to match each service's language
format

Fixed

- Fixed an issue where changing a language with the Deepgram STT service
wouldn't apply the change. This was fixed by disconnecting and reconnecting
when the language changes.

0.0.42

Added

- `SentryMetrics` has been added to report frame processor metrics to
Sentry. This is now possible because `FrameProcessorMetrics` can now be passed
to `FrameProcessor`.

- Added Google TTS service and corresponding foundational example
`07n-interruptible-google.py`

- Added AWS Polly TTS support and `07m-interruptible-aws.py` as an example.

- Added InputParams to Azure TTS service.

- Added `LivekitTransport` (audio-only for now).

- RTVI 0.2.0 is now supported.

- All `FrameProcessors` can now register event handlers.

tts = SomeTTSService(...)

tts.event_handler("on_connected"):
async def on_connected(processor):
...

- Added `AsyncGeneratorProcessor`. This processor can be used together with a
`FrameSerializer` as an async generator. It provides a `generator()` function
that returns an `AsyncGenerator` and that yields serialized frames.

- Added `EndTaskFrame` and `CancelTaskFrame`. These are new frames that are
meant to be pushed upstream to tell the pipeline task to stop nicely or
immediately respectively.

- Added configurable LLM parameters (e.g., temperature, top_p, max_tokens, seed)
for OpenAI, Anthropic, and Together AI services along with corresponding
setter functions.

- Added `sample_rate` as a constructor parameter for TTS services.

- Pipecat has a pipeline-based architecture. The pipeline consists of frame
processors linked to each other. The elements traveling across the pipeline
are called frames.

To have a deterministic behavior the frames traveling through the pipeline
should always be ordered, except system frames which are out-of-band
frames. To achieve that, each frame processor should only output frames from a
single task.

In this version all the frame processors have their own task to push
frames. That is, when `push_frame()` is called the given frame will be put
into an internal queue (with the exception of system frames) and a frame
processor task will push it out.

- Added pipeline clocks. A pipeline clock is used by the output transport to
know when a frame needs to be presented. For that, all frames now have an
optional `pts` field (prensentation timestamp). There's currently just one
clock implementation `SystemClock` and the `pts` field is currently only used
for `TextFrame`s (audio and image frames will be next).

- A clock can now be specified to `PipelineTask` (defaults to
`SystemClock`). This clock will be passed to each frame processor via the
`StartFrame`.

- Added `CartesiaHttpTTSService`.

- `DailyTransport` now supports setting the audio bitrate to improve audio
quality through the `DailyParams.audio_out_bitrate` parameter. The new
default is 96kbps.

- `DailyTransport` now uses the number of audio output channels (1 or 2) to set
mono or stereo audio when needed.

- Interruptions support has been added to `TwilioFrameSerializer` when using
`FastAPIWebsocketTransport`.

- Added new `LmntTTSService` text-to-speech service.
(see https://www.lmnt.com/)

- Added `TTSModelUpdateFrame`, `TTSLanguageUpdateFrame`, `STTModelUpdateFrame`,
and `STTLanguageUpdateFrame` frames to allow you to switch models, language
and voices in TTS and STT services.

- Added new `transcriptions.Language` enum.

Changed

- Context frames are now pushed downstream from assistant context aggregators.

- Removed Silero VAD torch dependency.

- Updated individual update settings frame classes into a single
`ServiceUpdateSettingsFrame` class.

- We now distinguish between input and output audio and image frames. We
introduce `InputAudioRawFrame`, `OutputAudioRawFrame`, `InputImageRawFrame`
and `OutputImageRawFrame` (and other subclasses of those). The input frames
usually come from an input transport and are meant to be processed inside the
pipeline to generate new frames. However, the input frames will not be sent
through an output transport. The output frames can also be processed by any
frame processor in the pipeline and they are allowed to be sent by the output
transport.

- `ParallelTask` has been renamed to `SyncParallelPipeline`. A
`SyncParallelPipeline` is a frame processor that contains a list of different
pipelines to be executed concurrently. The difference between a
`SyncParallelPipeline` and a `ParallelPipeline` is that, given an input frame,
the `SyncParallelPipeline` will wait for all the internal pipelines to
complete. This is achieved by making sure the last processor in each of the
pipelines is synchronous (e.g. an HTTP-based service that waits for the
response).

- `StartFrame` is back a system frame to make sure it's processed immediately by
all processors. `EndFrame` stays a control frame since it needs to be ordered
allowing the frames in the pipeline to be processed.

- Updated `MoondreamService` revision to `2024-08-26`.

- `CartesiaTTSService` and `ElevenLabsTTSService` now add presentation
timestamps to their text output. This allows the output transport to push the
text frames downstream at almost the same time the words are spoken. We say
"almost" because currently the audio frames don't have presentation timestamp
but they should be played at roughly the same time.

- `DailyTransport.on_joined` event now returns the full session data instead of
just the participant.

- `CartesiaTTSService` is now a subclass of `TTSService`.

- `DeepgramSTTService` is now a subclass of `STTService`.

- `WhisperSTTService` is now a subclass of `SegmentedSTTService`. A
`SegmentedSTTService` is a `STTService` where the provided audio is given in a
big chunk (i.e. from when the user starts speaking until the user stops
speaking) instead of a continous stream.

Fixed

- Fixed OpenAI multiple function calls.

- Fixed a Cartesia TTS issue that would cause audio to be truncated in some
cases.

- Fixed a `BaseOutputTransport` issue that would stop audio and video rendering
tasks (after receiving and `EndFrame`) before the internal queue was emptied,
causing the pipeline to finish prematurely.

- `StartFrame` should be the first frame every processor receives to avoid
situations where things are not initialized (because initialization happens on
`StartFrame`) and other frames come in resulting in undesired behavior.

Performance

- `obj_id()` and `obj_count()` now use `itertools.count` avoiding the need of
`threading.Lock`.

Other

- Pipecat now uses Ruff as its formatter (https://github.com/astral-sh/ruff).

0.0.41

Added

- Added `LivekitFrameSerializer` audio frame serializer.

Fixed

- Fix `FastAPIWebsocketOutputTransport` variable name clash with subclass.

- Fix an `AnthropicLLMService` issue with empty arguments in function calling.

Other

- Fixed `studypal` example errors.

0.0.40

Added

- VAD parameters can now be dynamicallt updated using the
`VADParamsUpdateFrame`.

- `ErrorFrame` has now a `fatal` field to indicate the bot should exit if a
fatal error is pushed upstream (false by default). A new `FatalErrorFrame`
that sets this flag to true has been added.

- `AnthropicLLMService` now supports function calling and initial support for
prompt caching.
(see https://www.anthropic.com/news/prompt-caching)

- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as
`output_format`.

- `TwilioFrameSerializer` can now specify Twilio's and Pipecat's desired sample
rates to use.

- Added new `on_participant_updated` event to `DailyTransport`.

- Added `DailyRESTHelper.delete_room_by_name()` and
`DailyRESTHelper.delete_room_by_url()`.

- Added LLM and TTS usage metrics. Those are enabled when
`PipelineParams.enable_usage_metrics` is True.

- `AudioRawFrame`s are now pushed downstream from the base output
transport. This allows capturing the exact words the bot says by adding an STT
service at the end of the pipeline.

- Added new `GStreamerPipelineSource`. This processor can generate image or
audio frames from a GStreamer pipeline (e.g. reading an MP4 file, and RTP
stream or anything supported by GStreamer).

- Added `TransportParams.audio_out_is_live`. This flag is False by default and
it is useful to indicate we should not synchronize audio with sporadic images.

- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control
frames. These frames are pushed upstream and they should wrap
`BotSpeakingFrame`.

- Transports now allow you to register event handlers without decorators.

Changed

- Support RTVI message protocol 0.1. This includes new messages, support for
messages responses, support for actions, configuration, webhooks and a bunch
of new cool stuff.
(see https://docs.rtvi.ai/)

- `SileroVAD` dependency is now imported via pip's `silero-vad` package.

- `ElevenLabsTTSService` now uses `eleven_turbo_v2_5` model by default.

- `BotSpeakingFrame` is now a control frame.

- `StartFrame` is now a control frame similar to `EndFrame`.

- `DeepgramTTSService` now is more customizable. You can adjust the encoding and
sample rate.

Fixed

- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS really starts and
stops. This allows for knowing when the bot starts and stops speaking even
with asynchronous services (like Cartesia).

- Fixed `AzureSTTService` transcription frame timestamps.

- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would
cause this function to stop working after the initial expiration elapsed.

- Improved `EndFrame` and `CancelFrame` handling. `EndFrame` should end things
gracefully while a `CancelFrame` should cancel all running tasks as soon as
possible.

- Fixed an issue in `AIService` that would cause a yielded `None` value to be
processed.

- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and
a first participant joins.

- Fixed a `BaseInputTransport` issue that was causing incoming system frames to
be queued instead of being pushed immediately.

- Fixed a `BaseInputTransport` issue that was causing start/stop interruptions
incoming frames to not cancel tasks and be processed properly.

Other

- Added `studypal` example (from to the Cartesia folks!).

- Most examples now use Cartesia.

- Added examples `foundational/19a-tools-anthropic.py`,
`foundational/19b-tools-video-anthropic.py` and
`foundational/19a-tools-togetherai.py`.

- Added examples `foundational/18-gstreamer-filesrc.py` and
`foundational/18a-gstreamer-videotestsrc.py` that show how to use
`GStreamerPipelineSource`

- Remove `requests` library usage.

- Cleanup examples and use `DailyRESTHelper`.

0.0.39

Fixed

- Fixed a regression introduced in 0.0.38 that would cause Daily transcription
to stop the Pipeline.

Page 4 of 11

Releases

Has known vulnerabilities

Previous Next

Pipecat-ai

Page 4 of 11

0.0.44

0.0.43

0.0.42

0.0.41

0.0.40

0.0.39

Page 4 of 11

Links

Releases