Pipecat-ai

Latest version: v0.0.62


Page 5 of 11

0.0.38

Added

- Added `force_reload`, `skip_validation` and `trust_repo` parameters to
`SileroVAD` and `SileroVADAnalyzer`. These control model caching and the
various GitHub repo validations performed when loading the model.

- Added a `send_initial_empty_metrics` flag to `PipelineParams` to request
initial empty metrics (zero values). Defaults to `True`.

Fixed

- Fixed the initial metrics format. It was using the wrong keys,
`name`/`time`, instead of `processor`/`value`.

- STT services now use the ISO 8601 time format for transcription frames.
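
An ISO 8601 timestamp of that kind can be produced with the Python standard
library; attaching it to a `timestamp` field is illustrative here, not
pipecat's exact attribute name:

```python
from datetime import datetime, timezone

def iso8601_now() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()

# Illustrative: attach the timestamp to a transcription payload.
frame = {"text": "hello world", "timestamp": iso8601_now()}
```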

- Fixed an issue that would cause the Daily transport to report a stop
transcription error when none actually occurred.

0.0.37

Added

- Added `RTVIProcessor` which implements the RTVI-AI standard.
See https://github.com/rtvi-ai

- Added `BotInterruptionFrame` which allows interrupting the bot while talking.

- Added `LLMMessagesAppendFrame` which allows appending messages to the current
LLM context.

- Added `LLMMessagesUpdateFrame` which allows changing the LLM context for the
one provided in this new frame.

- Added `LLMModelUpdateFrame` which allows updating the LLM model.

- Added `TTSSpeakFrame` which causes the bot to say some text. This text will
not be part of the LLM context.

- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.

Removed

- Removed the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
were added in the past to properly handle interruptions for the
`LLMAssistantContextAggregator`, but the `LLMContextAggregator` is now based
on `LLMResponseAggregator`, which handles interruptions properly by simply
processing the `StartInterruptionFrame`, so these extra frames are no longer
needed.

Fixed

- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
instead of a `TextFrame`.

- `TTSService` end-of-sentence detection has been improved. It now handles
acronyms, numbers, clock times, and more.
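
A simplified version of such a boundary check (an illustration, not pipecat's
actual implementation) might look like:

```python
import re

_ACRONYM = re.compile(r"(?:[A-Za-z]\.){2,}$")   # trailing acronym like "U.S.A."
_NUMBER_DOT = re.compile(r"\d\.$")              # "3." — possibly a decimal in progress

def ends_sentence(text: str) -> bool:
    """Return True if `text` looks like a complete sentence (simplified)."""
    text = text.rstrip()
    if not text:
        return False
    if text[-1] in "!?":
        return True
    if text[-1] != ".":
        return False
    if _NUMBER_DOT.search(text):      # "Pi is 3." may continue as "3.14"
        return False
    if _ACRONYM.search(text):         # "U.S.A." — the period is part of the acronym
        return False
    return True
```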

- Fixed an issue in `TTSService` that would not properly flush the current
aggregated sentence if an `LLMFullResponseEndFrame` was found.

Performance

- `CartesiaTTSService` now uses websockets, which improves speed. It also
leverages the new Cartesia contexts, which maintain the prosody of generated
audio across multiple inputs, significantly improving audio quality.

0.0.36

Added

- Added `GladiaSTTService`.
See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition

- Added `XTTSService`. This is a local Text-To-Speech service.
See https://github.com/coqui-ai/TTS

- Added `UserIdleProcessor`. This processor can be used to wait for any
interaction with the user. If the user doesn't say anything within a given
timeout, a provided callback is called.

- Added `IdleFrameProcessor`. This processor can be used to wait for frames
within a given timeout. If no frame is received within the timeout, a provided
callback is called.
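
The idle-timeout mechanism both processors rely on can be sketched with plain
`asyncio`; the class and method names below are illustrative, not pipecat's
actual API:

```python
import asyncio

class IdleWatcher:
    """Minimal sketch of the idle pattern: wait for activity on a queue and
    invoke `on_idle` if nothing arrives within `timeout` seconds."""

    def __init__(self, timeout: float, on_idle):
        self.timeout = timeout
        self.on_idle = on_idle
        self.queue: asyncio.Queue = asyncio.Queue()

    async def run_once(self):
        try:
            # Wait for the next item (frame, user speech, ...) with a timeout.
            return await asyncio.wait_for(self.queue.get(), self.timeout)
        except asyncio.TimeoutError:
            await self.on_idle()      # nothing happened: fire the callback
            return None
```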

- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
upstream while the bot is talking.

- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
or `SileroVAD`.

- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services, like
`DeepgramSTTService`, need to process things asynchronously. For example,
audio is sent to Deepgram, but transcriptions are not returned immediately. In
these cases we still require all frames (except system frames) to be pushed
downstream from a single task. That's what `AsyncFrameProcessor` is for: it
creates a task, and all frames should be pushed from that task. So, whenever a
new Deepgram transcription is ready, that transcription will also be pushed
from this internal task.
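
The single-task push pattern described above can be sketched with plain
`asyncio`; the class and method names here are illustrative, not pipecat's
actual API:

```python
import asyncio

class AsyncPusher:
    """Sketch of the single-task push pattern: external events (e.g.
    transcriptions arriving from a websocket) are enqueued, and one internal
    task is the only place frames are pushed downstream from."""

    def __init__(self, downstream):
        self.downstream = downstream          # callable receiving frames
        self.queue: asyncio.Queue = asyncio.Queue()
        self._task = asyncio.create_task(self._push_loop())

    async def _push_loop(self):
        while True:
            frame = await self.queue.get()
            if frame is None:                 # sentinel: stop the task
                break
            self.downstream(frame)            # all pushes happen here

    def on_external_event(self, frame):
        # Called from e.g. a websocket callback; just enqueue.
        self.queue.put_nowait(frame)

    async def stop(self):
        self.queue.put_nowait(None)
        await self._task
```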

- The `MetricsFrame` now includes processing metrics if metrics are enabled.
The processing metrics indicate the time a processor needs to generate all its
output. Note that not all processors generate these kinds of metrics.

Changed

- `WhisperSTTService` model can now also be a string.

- Added missing `*` keyword-only separators in services.

Fixed

- `WebsocketServerTransport` no longer tries to send frames if the serializer
returns `None`.
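
The fix amounts to a guard like the following; `serializer` and `send` are
illustrative stand-ins, not the transport's actual attributes:

```python
def send_if_serialized(frame, serializer, send) -> bool:
    """Serialize `frame` and send it, skipping frames the serializer
    doesn't handle (i.e. when it returns None)."""
    payload = serializer(frame)
    if payload is None:
        return False          # nothing to send for this frame type
    send(payload)
    return True
```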

- Fixed an issue where exceptions that occurred inside frame processors were
being swallowed and not displayed.

- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
data to the websocket after being closed.

Other

- Added Fly.io deployment example in `examples/deployment/flyio-example`.

- Added new `17-detect-user-idle.py` example that shows how to use the new
`UserIdleProcessor`.

0.0.35

Changed

- `FastAPIWebsocketParams` now requires a serializer.

- `TwilioFrameSerializer` now requires a `streamSid`.

Fixed

- Silero VAD now requires 512 frames for a 16000 sample rate or 256 frames for
an 8000 sample rate.
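
Both chunk sizes correspond to the same 32 ms analysis window, which a quick
calculation confirms:

```python
# Silero VAD chunk sizes: both correspond to a 32 ms analysis window.
for sample_rate, num_frames in [(16000, 512), (8000, 256)]:
    window_ms = 1000 * num_frames / sample_rate
    assert window_ms == 32.0
    print(f"{sample_rate} Hz -> {num_frames} frames = {window_ms} ms")
```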

0.0.34

Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that
could cause interruptions to ignore transcriptions.

- Fixed an issue introduced in 0.0.33 that would cause the LLM to generate
shorter output.

0.0.33

Changed

- Upgraded to Cartesia's new Python library 1.0.0. `CartesiaTTSService` now
expects a voice ID instead of a voice name (you can get the voice ID from
Cartesia's playground). You can also specify the audio `sample_rate` and
`encoding` instead of the previous `output_format`.

Fixed

- Fixed an issue with asynchronous STT services (Deepgram and Azure) that could
cause static audio issues and interruptions to not work properly when dealing
with multiple LLM sentences.

- Fixed an issue that could mix new LLM responses with previous ones when
handling interruptions.

- Fixed a Daily transport blocking situation that occurred while reading audio
frames after a participant left the room. Needs daily-python >= 0.10.1.
