Pipecat-ai

Latest version: v0.0.62


Page 2 of 11

0.0.56

Changed

- Use `gemini-2.0-flash-001` as the default model for `GoogleLLMService`.

- Improved foundational examples 22b, 22c, and 22d to support function calling.
With these base examples, `FunctionCallInProgressFrame` and
`FunctionCallResultFrame` will no longer be blocked by the gates.

Fixed

- Fixed `TkLocalTransport` and `LocalAudioTransport` issues that were causing
errors on cleanup.

- Fixed an issue that was causing `tests.utils` import to fail because of
logging setup.

- Fixed a `SentryMetrics` issue that was preventing any metrics from being sent
to Sentry and was also preventing metrics frames from being pushed to the
pipeline.

- Fixed an issue in `BaseOutputTransport` where incoming audio would not be
resampled to the desired output sample rate.

- Fixed an issue with the `TwilioFrameSerializer` and `TelnyxFrameSerializer`
where `twilio_sample_rate` and `telnyx_sample_rate` were incorrectly
initialized to `audio_in_sample_rate`. Those values currently default to 8000
and should be set manually from the serializer constructor if a different
value is needed.

Other

- Added a new `sentry-metrics` example.

0.0.55

Added

- Added a new `start_metadata` field to `PipelineParams`. The provided metadata
will be set on the initial `StartFrame` pushed from the `PipelineTask`.

- Added new fields to `PipelineParams` to control audio input and output sample
rates for the whole pipeline. This allows controlling sample rates from a
single place instead of having to specify sample rates in each
service. Setting a sample rate on an individual service is still possible and
will override the value from `PipelineParams`.
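As a rough sketch, the pipeline-wide sample rates might be configured like this. This is a configuration fragment only: the field names follow this entry and the `audio_in_sample_rate` mentioned elsewhere in these notes, but the surrounding pipeline setup is elided and the exact layout may differ.

```python
# Illustrative configuration fragment (assumes pipecat is installed; the
# pipeline and task construction around it is elided):
from pipecat.pipeline.task import PipelineParams

# Telephony-style pipeline: 8 kHz in and out for every service at once,
# instead of setting the sample rate on each service individually.
params = PipelineParams(
    audio_in_sample_rate=8000,
    audio_out_sample_rate=8000,
)
```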

- Introduced audio resamplers (`BaseAudioResampler`). This is a base class for
implementing audio resamplers. Two implementations are currently provided:
`SOXRAudioResampler` and `ResampyResampler`. A new
`create_default_resampler()` function has been added (replacing the now
deprecated `resample_audio()`).

- It is now possible to specify the asyncio event loop that a `PipelineTask` and
all the processors should run on by passing it as a new argument to the
`PipelineRunner`. This could allow running pipelines in multiple threads each
one with its own event loop.
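The loop-per-thread pattern this enables can be sketched without pipecat at all. Here the `PipelineRunner` is replaced by a plain coroutine to keep the example self-contained; only the event loop mechanics are shown.

```python
import asyncio
import threading

# Each thread owns its own event loop, mirroring how a PipelineRunner could be
# handed a dedicated loop per thread (the runner itself is elided here).
results = []

async def run_pipeline(name: str) -> str:
    await asyncio.sleep(0)  # stand-in for actual pipeline work
    return name

def thread_main(name: str) -> None:
    loop = asyncio.new_event_loop()
    try:
        results.append(loop.run_until_complete(run_pipeline(name)))
    finally:
        loop.close()

threads = [
    threading.Thread(target=thread_main, args=(f"pipeline-{i}",)) for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```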

- Added a new `utils.TaskManager`. Instead of a global task manager there is
now a task manager per `PipelineTask`. In the previous version the task
manager was global, so running multiple simultaneous `PipelineTask`s could
produce spurious dangling task warnings. In order for all processors to know
about the task manager, it is passed through the `StartFrame`. This means that
processors should create tasks when they receive a `StartFrame`, but not
before (because they don't have a task manager yet).
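The "create tasks only after `StartFrame`" rule can be sketched with stand-in classes. These are not the real pipecat types; they only illustrate the shape of the pattern.

```python
import asyncio

# Stand-in frame types, NOT the real pipecat classes; this just sketches the
# "create tasks only after StartFrame" pattern described above.
class StartFrame:
    pass

class TextFrame:
    pass

class MyProcessor:
    def __init__(self):
        self._task = None

    async def process_frame(self, frame):
        # Tasks are created when the StartFrame arrives (the task manager
        # travels inside the StartFrame in this release), never earlier.
        if isinstance(frame, StartFrame) and self._task is None:
            self._task = asyncio.create_task(self._worker())

    async def _worker(self):
        await asyncio.sleep(0)  # stand-in for real background work
        return "done"

async def main():
    processor = MyProcessor()
    await processor.process_frame(TextFrame())   # before StartFrame: no task
    before = processor._task
    await processor.process_frame(StartFrame())  # StartFrame: task created
    result = await processor._task
    return before, result

before, result = asyncio.run(main())
```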

- Added `TelnyxFrameSerializer` to support Telnyx calls. A full running example
has also been added to `examples/telnyx-chatbot`.

- Allow pushing silence audio frames before `TTSStoppedFrame`. This might be
useful for testing purposes, for example, passing bot audio to an STT service
which usually needs additional audio data to detect the utterance stopped.

- `TwilioSerializer` now supports transport message frames. With this we can
create Twilio emulators.

- Added a new transport: `WebsocketClientTransport`.

- Added a `metadata` field to `Frame` which makes it possible to pass custom
data to all frames.

- Added `test/utils.py` inside the pipecat package.

Changed

- `GatedOpenAILLMContextAggregator` now requires keyword arguments. Also, a new
`start_open` argument has been added to set the initial state of the gate.

- Added `organization` and `project` level authentication to
`OpenAILLMService`.

- Improved the language checking logic in `ElevenLabsTTSService` and
`ElevenLabsHttpTTSService` to properly handle language codes based on model
compatibility, with appropriate warnings when language codes cannot be
applied.

- Updated `GoogleLLMContext` to support pushing `LLMMessagesUpdateFrame`s that
contain a combination of function calls, function call responses, system
messages, or just messages.

- `InputDTMFFrame` is now based on `DTMFFrame`. There's also a new
`OutputDTMFFrame` frame.

Deprecated

- `resample_audio()` is now deprecated, use `create_default_resampler()`
instead.

Removed

- `AudioBufferProcessor.reset_audio_buffers()` has been removed, use
`AudioBufferProcessor.start_recording()` and
`AudioBufferProcessor.stop_recording()` instead.

Fixed

- Fixed an `AudioBufferProcessor` issue that would cause crackling in some recordings.

- Fixed an issue in `AudioBufferProcessor` where the user callback would not be
called on task cancellation.

- Fixed an issue in `AudioBufferProcessor` that would cause wrong silence
padding in some cases.

- Fixed an issue where `ElevenLabsTTSService` messages would return a 1009
websocket error by increasing the max message size limit to 16MB.

- Fixed a `DailyTransport` issue that would cause events to be triggered before
the join finished.

- Fixed a `PipelineTask` issue that was preventing processors from being
cleaned up after cancelling the task.

- Fixed an issue where queuing a `CancelFrame` to a pipeline task would not
cause the task to finish. However, using `PipelineTask.cancel()` is still the
recommended way to cancel a task.

Other

- Improved the unit test `run_test()` to use `PipelineTask` and
`PipelineRunner`. There's now also some control around `StartFrame` and
`EndFrame`. The `EndTaskFrame` has been removed since it doesn't seem
necessary with this new approach.

- Updated `twilio-chatbot` with a few new features: it now uses an 8000 Hz
sample rate to avoid resampling, and includes a new client useful for stress
testing and for testing locally without the need to make phone calls. Also
added audio recording on both the client and the server to make sure the
audio sounds good.

- Updated examples to use `task.cancel()` to immediately exit the example when a
participant leaves or disconnects, instead of pushing an `EndFrame`. Pushing
an `EndFrame` causes the bot to run through everything that is internally
queued (which could take some seconds). Note that using `task.cancel()` might
not always be the best option and pushing an `EndFrame` could still be
desirable to make sure the whole pipeline is flushed.

0.0.54

Added

- To create tasks in Pipecat frame processors it is now recommended to use
`FrameProcessor.create_task()` (which uses the new
`utils.asyncio.create_task()`). It takes care of uncaught exceptions, task
cancellation handling and task management. To cancel or wait for a task there
are `FrameProcessor.cancel_task()` and `FrameProcessor.wait_for_task()`. All
of Pipecat's processors have been updated accordingly. Also, when a pipeline
runner finishes, a warning about dangling tasks might appear, indicating that
some of the created tasks were never cancelled or awaited (using these new
functions).

- It is now possible to specify the period of the `PipelineTask` heartbeat
frames with `heartbeats_period_secs`.
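As a configuration fragment, the heartbeat period might be set alongside `enable_heartbeats` (introduced in 0.0.53, further down this page). The placement of both fields inside `PipelineParams` is a guess based on these entries; the real constructor layout may differ.

```python
# Illustrative fragment (assumes pipecat; `pipeline` is a previously built
# Pipeline and the rest of the setup is elided):
from pipecat.pipeline.task import PipelineParams, PipelineTask

task = PipelineTask(
    pipeline,
    params=PipelineParams(
        enable_heartbeats=True,
        heartbeats_period_secs=2.0,  # push a heartbeat every two seconds
    ),
)
```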

- Added `DailyMeetingTokenProperties` and `DailyMeetingTokenParams` Pydantic models
for meeting token creation in `get_token` method of `DailyRESTHelper`.

- Added `enable_recording` and `geo` parameters to `DailyRoomProperties`.

- Added `RecordingsBucketConfig` to `DailyRoomProperties` to upload recordings
to a custom AWS bucket.

Changed

- Enhanced `UserIdleProcessor` with retry functionality and control over idle
monitoring via new callback signature `(processor, retry_count) -> bool`.
Updated the `17-detect-user-idle.py` to show how to use the `retry_count`.
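A hypothetical callback matching the new `(processor, retry_count) -> bool` signature could look like this; the retry limit and the nudge behavior are made up for illustration, and `processor` is unused in the sketch.

```python
import asyncio

# Hypothetical idle callback using the new (processor, retry_count) -> bool
# signature; the retry limit of 2 is illustrative, not from pipecat.
async def handle_user_idle(processor, retry_count: int) -> bool:
    if retry_count <= 2:
        # A real bot might nudge the user here, e.g. by pushing a frame
        # through `processor`.
        return True   # keep idle monitoring alive
    return False      # stop retrying after a couple of attempts

async def demo():
    return [await handle_user_idle(None, n) for n in (1, 2, 3)]

responses = asyncio.run(demo())
```

Returning `False` is what gives the new "control over idle monitoring": the processor stops issuing idle callbacks once the callback gives up.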

- Added defensive error handling for `OpenAIRealtimeBetaLLMService`'s audio
truncation. Audio truncation errors during interruptions now log a warning
and allow the session to continue instead of throwing an exception.

- Modified `TranscriptProcessor` to use TTS text frames for more accurate assistant
transcripts. Assistant messages are now aggregated based on bot speaking boundaries
rather than LLM context, providing better handling of interruptions and partial
utterances.

- Updated foundational examples `28a-transcription-processor-openai.py`,
`28b-transcript-processor-anthropic.py`, and
`28c-transcription-processor-gemini.py` to use the updated
`TranscriptProcessor`.

Fixed

- Fixed a `GeminiMultimodalLiveLLMService` issue that was preventing the user
from pushing initial LLM assistant messages (using `LLMMessagesAppendFrame`).

- Added missing `FrameProcessor.cleanup()` calls to `Pipeline`,
`ParallelPipeline` and `UserIdleProcessor`.

- Fixed a type error when using `voice_settings` in `ElevenLabsHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` function calling resulted
in an error.

- Fixed an issue in `AudioBufferProcessor` where the last audio buffer was not
being processed, in cases where the `_user_audio_buffer` was smaller than the
buffer size.

Performance

- Replaced the audio resampling library `resampy` with `soxr`. Resampling a
2:21s audio file from 24 kHz to 16 kHz took 1.41s with `resampy` and 0.031s
with `soxr`, with similar audio quality.

Other

- Added initial unit test infrastructure.

0.0.53

Added

- Added `ElevenLabsHttpTTSService`, which uses ElevenLabs' HTTP API instead of
the websocket one.

- Introduced pipeline frame observers. Observers can view all the frames that go
through the pipeline without the need to inject processors in the
pipeline. This can be useful, for example, to implement frame loggers or
debuggers among other things. The example
`examples/foundational/30-observer.py` shows how to add an observer to a
pipeline for debugging.
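Conceptually, an observer is a passive tap on frame traffic. The sketch below illustrates the idea with a made-up class; the real pipecat observer interface (see `30-observer.py`) may use different method names and arguments.

```python
# Conceptual observer sketch; method name and arguments are illustrative,
# not the real pipecat observer API. The point is that frames are watched
# without injecting a processor into the pipeline.
class FrameLogObserver:
    def __init__(self):
        self.seen = []

    def on_push_frame(self, src, dst, frame, direction, timestamp):
        # Record the frame type; a real observer could log or inspect it.
        self.seen.append(type(frame).__name__)

observer = FrameLogObserver()
observer.on_push_frame("tts", "transport", object(), "downstream", 0.0)
```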

- Introduced heartbeat frames. The pipeline task can now push periodic
heartbeats down the pipeline when `enable_heartbeats=True`. Heartbeats are
system frames that are supposed to make it all the way to the end of the
pipeline. When a heartbeat frame is received, the traversal time (i.e. the
time it took to go through the whole pipeline) is displayed (with TRACE
logging); otherwise a warning is shown. The example
`examples/foundational/31-heartbeats.py` shows how to enable heartbeats and
forces warnings to be displayed.

- Added `LLMTextFrame` and `TTSTextFrame` which should be pushed by LLM and TTS
services respectively instead of `TextFrame`s.

- Added `OpenRouter` for OpenRouter integration with an OpenAI-compatible
interface. Added foundational example `14m-function-calling-openrouter.py`.

- Added a new `WebsocketService` based class for TTS services, containing
base functions and retry logic.

- Added `DeepSeekLLMService` for DeepSeek integration with an OpenAI-compatible
interface. Added foundational example `14l-function-calling-deepseek.py`.

- Added `FunctionCallResultProperties` dataclass to provide a structured way to
control function call behavior, including:

- `run_llm`: Controls whether to trigger LLM completion
- `on_context_updated`: Optional callback triggered after context update
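A stand-in dataclass mirroring the two fields described above can make the shape concrete. This is illustrative only; the real `FunctionCallResultProperties` ships with pipecat and may carry additional options.

```python
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional

# Illustrative stand-in mirroring the fields described above; the real
# dataclass lives in pipecat and may differ in details.
@dataclass
class FunctionCallResultProperties:
    run_llm: Optional[bool] = None
    on_context_updated: Optional[Callable[[], Awaitable[None]]] = None

# Skip the automatic LLM completion after this function call result:
props = FunctionCallResultProperties(run_llm=False)
```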

- Added a new foundational example `07e-interruptible-playht-http.py` for easy
testing of `PlayHTHttpTTSService`.

- Added support for Google TTS Journey voices in `GoogleTTSService`.

- Added `29-livekit-audio-chat.py` as a new foundational example for
`LiveKitTransportLayer`.

- Added `enable_prejoin_ui`, `max_participants` and `start_video_off` params
to `DailyRoomProperties`.

- Added `session_timeout` to `FastAPIWebsocketTransport` and
`WebsocketServerTransport` for configuring session timeouts (in
seconds). Triggers `on_session_timeout` for custom timeout handling.
See [examples/websocket-server/bot.py](https://github.com/pipecat-ai/pipecat/blob/main/examples/websocket-server/bot.py).

- Added the new modalities option and helper function to set Gemini output
modalities.

- Added `examples/foundational/26d-gemini-multimodal-live-text.py`, which uses
Gemini with the TEXT modality and a separate TTS provider for speech output.

Changed

- Modified `UserIdleProcessor` to start monitoring only after first
conversation activity (`UserStartedSpeakingFrame` or
`BotStartedSpeakingFrame`) instead of immediately.

- Modified `OpenAIAssistantContextAggregator` to support controlled completions
and to emit context update callbacks via `FunctionCallResultProperties`.

- Added `aws_session_token` to the `PollyTTSService`.

- Changed the default model for `PlayHTHttpTTSService` to `Play3.0-mini-http`.

- `api_key`, `aws_access_key_id` and `region` are no longer required parameters
for the `PollyTTSService` (`AWSTTSService`).

- Added `session_timeout` example in `examples/websocket-server/bot.py` to
handle session timeout event.

- Changed `InputParams` in
`src/pipecat/services/gemini_multimodal_live/gemini.py` to support different
modalities.

- Changed `DeepgramSTTService` to send the `finalize` event whenever the VAD
detects a `UserStoppedSpeakingFrame`. This helps produce faster transcriptions
and clears the `Deepgram` audio buffer.

Fixed

- Fixed an issue where `DeepgramSTTService` was not generating metrics when
using the pipeline's VAD.

- Fixed `UserIdleProcessor` not properly propagating `EndFrame`s through the
pipeline.

- Fixed an issue where websocket based TTS services could incorrectly terminate
their connection due to a retry counter not resetting.

- Fixed a `PipelineTask` issue that would cause a dangling task after stopping
the pipeline with an `EndFrame`.

- Fixed an import issue for `PlayHTHttpTTSService`.

- Fixed an issue where languages couldn't be used with the `PlayHTHttpTTSService`.

- Fixed an issue where `OpenAIRealtimeBetaLLMService` audio chunks were hitting
an error when truncating audio content.

- Fixed an issue where setting the voice and model for `RimeHttpTTSService`
wasn't working.

- Fixed an issue where `IdleFrameProcessor` and `UserIdleProcessor` were getting
initialized before the start of the pipeline.

0.0.52

Added

- Added constructor arguments to `GoogleLLMService` to directly set `tools` and
`tool_config`.

- Smart turn detection example (`22d-natural-conversation-gemini-audio.py`)
that leverages Gemini 2.0 capabilities
(see https://x.com/kwindla/status/1870974144831275410).

- Added `DailyTransport.send_dtmf()` to send dial-out DTMF tones.

- Added `DailyTransport.sip_call_transfer()` to forward SIP and PSTN calls to
another address or number. For example, transfer a SIP call to a different
SIP address or transfer a PSTN phone number to a different PSTN phone number.

- Added `DailyTransport.sip_refer()` to transfer incoming SIP/PSTN calls from
outside Daily to another SIP/PSTN address.

- Added an `auto_mode` input parameter to `ElevenLabsTTSService`. `auto_mode`
is set to `True` by default. Enabling this setting disables the chunk
schedule and all buffers, which reduces latency.

- Added `KoalaFilter`, which implements on-device noise reduction using Koala
Noise Suppression (see https://picovoice.ai/platform/koala/).

- Added `CerebrasLLMService` for Cerebras integration with an OpenAI-compatible
interface. Added foundational example `14k-function-calling-cerebras.py`.

- Pipecat now supports Python 3.13. We had a dependency on the `audioop`
package, which was deprecated and has been removed in Python 3.13. We now use
`audioop-lts` (https://github.com/AbstractUmbra/audioop) to provide the same
functionality.

- Added timestamped conversation transcript support:

- New `TranscriptProcessor` factory provides access to user and assistant
transcript processors.
- `UserTranscriptProcessor` processes user speech with timestamps from
transcription.
- `AssistantTranscriptProcessor` processes assistant responses with LLM
context timestamps.
- Messages emitted with ISO 8601 timestamps indicating when they were spoken.
- Supports all LLM formats (OpenAI, Anthropic, Google) via standard message
format.
- New examples: `28a-transcription-processor-openai.py`,
`28b-transcription-processor-anthropic.py`, and
`28c-transcription-processor-gemini.py`.

- Added support for more languages to ElevenLabs (Arabic, Croatian, Filipino,
Tamil) and PlayHT (Afrikaans, Albanian, Amharic, Arabic, Bengali, Croatian,
Galician, Hebrew, Mandarin, Serbian, Tagalog, Urdu, Xhosa).

Changed

- `PlayHTTTSService` uses the new v4 websocket API, which also fixes an issue
where text input to the TTS didn't return audio.

- The default model for `ElevenLabsTTSService` is now `eleven_flash_v2_5`.

- `OpenAIRealtimeBetaLLMService` now takes a `model` parameter in the
constructor.

- Updated the default model for the `OpenAIRealtimeBetaLLMService`.

- Room expiration (`exp`) in `DailyRoomProperties` is now optional (`None`) by
default instead of automatically setting a 5-minute expiration time. You must
explicitly set expiration time if desired.

Deprecated

- `AWSTTSService` is now deprecated, use `PollyTTSService` instead.

Fixed

- Fixed token counting in `GoogleLLMService`. Tokens were summed incorrectly
(double-counted in many cases).

- Fixed an issue that could cause the bot to stop talking if there was a user
interruption before getting any audio from the TTS service.

- Fixed an issue that would cause `ParallelPipeline` to handle `EndFrame`
incorrectly causing the main pipeline to not terminate or terminate too early.

- Fixed an audio stuttering issue in `FastPitchTTSService`.

- Fixed a `BaseOutputTransport` issue that was causing non-audio frames to be
processed before the previous audio frames were played. This allows, for
example, sending a frame `A` after a `TTSSpeakFrame`; frame `A` will only be
pushed downstream after the audio generated from the `TTSSpeakFrame` has been
spoken.

- Fixed a `DeepgramSTTService` issue that was causing the language to be passed
as an object instead of a string, which made the connection fail.

0.0.51

Fixed

- Fixed an issue in websocket-based TTS services that was causing infinite
reconnections (Cartesia, ElevenLabs, PlayHT and LMNT).
