## Added
- When registering a function call, you can now indicate whether the function
  call should be cancelled on user interruption via `cancel_on_interruption`
  (defaults to `False`). This is now possible because function calls are
  executed concurrently.
- Added support for detecting idle pipelines. By default, if no activity has
  been detected for 5 minutes, the `PipelineTask` will be automatically
  cancelled. You can override this behavior by passing
  `cancel_on_idle_timeout=False`, change the default timeout with
  `idle_timeout_secs`, or change the frames that prevent the pipeline from
  being considered idle with `idle_timeout_frames`. Finally, an
  `on_idle_timeout` event handler is triggered when the idle timeout is reached
  (whether or not the pipeline task is cancelled).
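The idle-detection behavior can be illustrated with a small, self-contained sketch in plain `asyncio` (this is not the actual `PipelineTask` implementation; the function and variable names are illustrative): a watchdog waits for activity and, if none arrives within the timeout, fires a handler and stops.

```python
import asyncio


async def run_with_idle_timeout(activity_queue, idle_timeout_secs, on_idle_timeout):
    # Wait for activity; if none arrives within the timeout, fire the
    # handler and report that the task would be cancelled.
    while True:
        try:
            await asyncio.wait_for(activity_queue.get(), timeout=idle_timeout_secs)
        except asyncio.TimeoutError:
            await on_idle_timeout()
            return "cancelled"


async def main():
    fired = []

    async def on_idle():
        fired.append("on_idle_timeout")

    queue = asyncio.Queue()  # nothing ever arrives, so the "pipeline" is idle
    result = await run_with_idle_timeout(queue, idle_timeout_secs=0.05, on_idle_timeout=on_idle)
    return result, fired


result, fired = asyncio.run(main())
print(result, fired)  # cancelled ['on_idle_timeout']
```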
- Added `FalSTTService`, which provides STT using Fal's Wizper API.
- Added a `reconnect_on_error` parameter to websocket-based TTS services, as
  well as an `on_connection_error` event handler. `reconnect_on_error`
  indicates whether the TTS service should reconnect on error.
  `on_connection_error` is always called on any error, regardless of the value
  of `reconnect_on_error`. This allows, for example, falling back to a
  different TTS provider if something goes wrong with the current one.
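The kind of fallback this enables can be sketched with plain `asyncio` (the `failing_tts` and `backup_tts` coroutines are hypothetical stand-ins, not Pipecat APIs):

```python
import asyncio


async def failing_tts(text):
    # Simulates a primary provider whose connection has gone bad.
    raise ConnectionError("websocket closed")


async def backup_tts(text):
    return f"audio:{text}"


async def synthesize_with_fallback(primary, fallback, text):
    # If the primary service errors, switch to the fallback provider --
    # the kind of recovery an on_connection_error handler makes possible.
    try:
        return await primary(text)
    except ConnectionError:
        return await fallback(text)


out = asyncio.run(synthesize_with_fallback(failing_tts, backup_tts, "hi"))
print(out)  # audio:hi
```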
- Added new `SkipTagsAggregator`, which extends `BaseTextAggregator` to
  aggregate text and skip end-of-sentence matching while the aggregated text
  is between start/end tags.
- Added new `PatternPairAggregator` that extends `BaseTextAggregator` to
identify content between matching pattern pairs in streamed text. This allows
for detection and processing of structured content like XML-style tags that
may span across multiple text chunks or sentence boundaries.
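A simplified, self-contained sketch of the idea (ignoring edge cases such as tags split mid-pattern; this is not the actual `PatternPairAggregator` implementation):

```python
def aggregate_pattern_pairs(chunks, start="<voice>", end="</voice>"):
    # Buffer streamed chunks; while a start tag is open but unclosed,
    # keep aggregating so tagged content spanning chunk boundaries is
    # emitted as a single unit.
    buffer, complete = "", []
    for chunk in chunks:
        buffer += chunk
        if start in buffer and end not in buffer:
            continue  # tag still open: keep aggregating
        complete.append(buffer)
        buffer = ""
    return complete


parts = aggregate_pattern_pairs(["Hello <voice>narr", "ator</voice> there."])
print(parts)  # ['Hello <voice>narrator</voice> there.']
```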
- Added new `BaseTextAggregator`. Text aggregators are used by the TTS service
to aggregate LLM tokens and decide when the aggregated text should be pushed
to the TTS service. They also allow for the text to be manipulated while it's
being aggregated. A text aggregator can be passed via `text_aggregator` to the
TTS service.
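Conceptually, a text aggregator buffers incoming tokens and decides when a unit of text is ready to push. A minimal stand-alone sketch of that decision (not the real `BaseTextAggregator` interface), aggregating up to sentence boundaries:

```python
def aggregate_sentences(tokens):
    # Accumulate streamed tokens and emit only when a sentence boundary
    # is seen -- the decision a text aggregator makes before pushing
    # text to the TTS service.
    buffer = ""
    for token in tokens:
        buffer += token
        if buffer.rstrip().endswith((".", "!", "?")):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()


sentences = list(aggregate_sentences(["Hel", "lo there.", " How", " are you?"]))
print(sentences)  # ['Hello there.', 'How are you?']
```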
- Added new `sample_rate` constructor parameter to `TavusVideoService` to allow
changing the output sample rate.
- Added new `NeuphonicTTSService` (see https://neuphonic.com).
- Added new `UltravoxSTTService` (see https://github.com/fixie-ai/ultravox).
- Added `on_frame_reached_upstream` and `on_frame_reached_downstream` event
  handlers to `PipelineTask`. These are called when a frame reaches the
  beginning or the end of the pipeline, respectively. Note that, by default,
  the event handlers are not called unless a filter is set with
  `PipelineTask.set_reached_upstream_filter()` or
  `PipelineTask.set_reached_downstream_filter()`.
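The filtering behavior can be sketched stand-alone (the `ReachedFilterDispatcher` class and frame types below are illustrative, not Pipecat APIs): only frames whose type matches the filter trigger the handler.

```python
class Frame: ...
class TextFrame(Frame): ...
class AudioFrame(Frame): ...


class ReachedFilterDispatcher:
    def __init__(self, handler):
        self._filter = ()
        self._handler = handler

    def set_filter(self, types):
        self._filter = tuple(types)

    def frame_reached(self, frame):
        # No filter set -> no callbacks, matching the default behavior.
        if self._filter and isinstance(frame, self._filter):
            self._handler(frame)


seen = []
dispatcher = ReachedFilterDispatcher(lambda frame: seen.append(type(frame).__name__))
dispatcher.set_filter([TextFrame])
for frame in (TextFrame(), AudioFrame(), TextFrame()):
    dispatcher.frame_reached(frame)
print(seen)  # ['TextFrame', 'TextFrame']
```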
- Added support for Chirp voices in `GoogleTTSService`.
- Added a `flush_audio()` method to `FishTTSService` and `LmntTTSService`.
- Added a `set_language` convenience method for `GoogleSTTService`, allowing
you to set a single language. This is in addition to the `set_languages`
method which allows you to set a list of languages.
- Added `on_user_turn_audio_data` and `on_bot_turn_audio_data` to
  `AudioBufferProcessor`. These provide the audio of a single turn for the
  user and the bot, respectively.
- Added new base class `BaseObject`, which is now the base class of
  `FrameProcessor`, `PipelineRunner`, `PipelineTask` and `BaseTransport`. The
  new `BaseObject` adds support for event handlers.
- Added support for a unified format for specifying function calling across all
LLM services.
```python
weather_function = FunctionSchema(
    name="get_current_weather",
    description="Get the current weather",
    properties={
        "location": {
            "type": "string",
            "description": "The city and state, e.g. San Francisco, CA",
        },
        "format": {
            "type": "string",
            "enum": ["celsius", "fahrenheit"],
            "description": "The temperature unit to use. Infer this from the user's location.",
        },
    },
    required=["location"],
)

tools = ToolsSchema(standard_tools=[weather_function])
```
- Added `speech_threshold` parameter to `GladiaSTTService`.
- Allow passing user (`user_kwargs`) and assistant (`assistant_kwargs`) context
  aggregator parameters when using `create_context_aggregator()`. The values
  are passed as a mapping that is then unpacked into keyword arguments.
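For illustration (the `AggregatorParams` dataclass and its field are hypothetical, not Pipecat types), the mapping is simply unpacked into constructor keyword arguments:

```python
from dataclasses import dataclass


@dataclass
class AggregatorParams:
    # Hypothetical aggregator option, for illustration only.
    expect_stripped_words: bool = True


def build_params(kwargs_mapping=None):
    # The mapping is unpacked into constructor keyword arguments,
    # just as user_kwargs / assistant_kwargs are.
    return AggregatorParams(**(kwargs_mapping or {}))


params = build_params({"expect_stripped_words": False})
print(params.expect_stripped_words)  # False
```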
- Added `speed` as an `InputParam` for both `ElevenLabsTTSService` and
`ElevenLabsHttpTTSService`.
- Added new `LLMFullResponseAggregator` to aggregate full LLM completions. At
every completion the `on_completion` event handler is triggered.
- Added a new frame, `RTVIServerMessageFrame`, and RTVI message
`RTVIServerMessage` which provides a generic mechanism for sending custom
messages from server to client. The `RTVIServerMessageFrame` is processed by
the `RTVIObserver` and will be delivered to the client's `onServerMessage`
callback or `ServerMessage` event.
- Added `GoogleLLMOpenAIBetaService` for Google LLM integration with an
OpenAI-compatible interface. Added foundational example
`14o-function-calling-gemini-openai-format.py`.
- Added `AzureRealtimeBetaLLMService` to support Azure's OpenAI Realtime API. Added
  foundational example `19a-azure-realtime-beta.py`.
- Introduced `GoogleVertexLLMService`, a new class for integrating with Vertex AI
Gemini models. Added foundational example
`14p-function-calling-gemini-vertex-ai.py`.
- Added support in `OpenAIRealtimeBetaLLMService` for a slate of new features:
- The `'gpt-4o-transcribe'` input audio transcription model, along
with new `language` and `prompt` options specific to that model.
- The `input_audio_noise_reduction` session property.
```python
session_properties = SessionProperties(
    # ...
    input_audio_noise_reduction=InputAudioNoiseReduction(
        type="near_field"  # also supported: "far_field"
    ),
    # ...
)
```
- The `'semantic_vad'` `turn_detection` session property value, a more
sophisticated model for detecting when the user has stopped speaking.
- New `on_conversation_item_created` and `on_conversation_item_updated`
  events.
```python
@llm.event_handler("on_conversation_item_created")
async def on_conversation_item_created(llm, item_id, item):
    ...


@llm.event_handler("on_conversation_item_updated")
async def on_conversation_item_updated(llm, item_id, item):
    # `item` may not always be available here
    ...
```
- The `retrieve_conversation_item(item_id)` method for introspecting a
conversation item on the server.
```python
item = await llm.retrieve_conversation_item(item_id)
```
## Changed
- Updated `OpenAISTTService` to use `gpt-4o-transcribe` as the default
transcription model.
- Updated `OpenAITTSService` to use `gpt-4o-mini-tts` as the default TTS model.
- Function calls are now executed in tasks. This means that the pipeline will
not be blocked while the function call is being executed.
- ⚠️ `PipelineTask` will now be automatically cancelled if no bot activity is
happening in the pipeline. There are a few settings to configure this
behavior, see `PipelineTask` documentation for more details.
- All event handlers are now executed in separate tasks to avoid blocking the
  pipeline. Previously, if an event handler took a long time to execute, the
  pipeline would be blocked waiting for it to complete.
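The difference can be sketched with plain `asyncio` (illustrative names, not Pipecat internals): firing a handler in its own task lets the caller continue immediately instead of awaiting it inline.

```python
import asyncio


async def slow_handler(log):
    await asyncio.sleep(0.05)
    log.append("handler done")


async def emit(log):
    # Schedule the handler as its own task instead of awaiting it inline,
    # so the caller (the pipeline) continues immediately.
    task = asyncio.create_task(slow_handler(log))
    log.append("pipeline continues")
    await task  # only awaited here so the example finishes cleanly


log = []
asyncio.run(emit(log))
print(log)  # ['pipeline continues', 'handler done']
```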
- Updated `TranscriptProcessor` to support text output from
`OpenAIRealtimeBetaLLMService`.
- `OpenAIRealtimeBetaLLMService` and `GeminiMultimodalLiveLLMService` now push
a `TTSTextFrame`.
- Updated the default mode for `CartesiaTTSService` and
`CartesiaHttpTTSService` to `sonic-2`.
## Deprecated
- Passing a `start_callback` to `LLMService.register_function()` is now
  deprecated; simply move the code from the start callback into the function
  call.
- The `TTSService` parameter `text_filter` is now deprecated; use
  `text_filters` instead, which takes a list. This allows passing multiple
  filters that will be executed in order.
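Ordered filter execution amounts to function composition; a minimal sketch with plain callables (not the actual filter interface):

```python
def apply_text_filters(text, filters):
    # Filters run in order; each receives the previous filter's output.
    for text_filter in filters:
        text = text_filter(text)
    return text


filtered = apply_text_filters("  Hello WORLD  ", [str.strip, str.lower])
print(filtered)  # 'hello world'
```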
## Removed
- Removed deprecated `audio.resample_audio()`, use `create_default_resampler()`
instead.
- Removed deprecated `stt_service` parameter from `STTMuteFilter`.
- Removed deprecated RTVI processors, use an `RTVIObserver` instead.
- Removed deprecated `AWSTTSService`, use `PollyTTSService` instead.
- Removed deprecated field `tier` from `DailyTranscriptionSettings`, use `model`
instead.
- Removed deprecated `pipecat.vad` package, use `pipecat.audio.vad` instead.
## Fixed
- Fixed an assistant aggregator issue that could cause assistant text to be
split into multiple chunks during function calls.
- Fixed an assistant aggregator issue that was causing assistant text to not be
added to the context during function calls. This could lead to duplications.
- Fixed a `SegmentedSTTService` issue that was causing audio to be sent
  prematurely to the STT service. Instead of analyzing volume in this service,
  we now rely on VAD events, which take both VAD and volume into account.
- Fixed a `GeminiMultimodalLiveLLMService` issue that was causing messages to be
duplicated in the context when pushing `LLMMessagesAppendFrame` frames.
- Fixed an issue with `SegmentedSTTService`-based services
  (e.g. `GroqSTTService`) that was preventing audio from passing through
  downstream.
- Fixed a `CartesiaTTSService` and `RimeTTSService` issue that would consider
  text between spell-out tags an end of sentence.
- Fixed a `match_endofsentence` issue that caused floating point numbers to be
  considered an end of sentence.
- Fixed a `match_endofsentence` issue that caused emails to be considered an
  end of sentence.
- Fixed an issue where the RTVI message `disconnect-bot` was pushing an
  `EndFrame`, resulting in the pipeline not shutting down. It now pushes an
  `EndTaskFrame` upstream to shut down the pipeline.
- Fixed an issue with the `GoogleSTTService` where stream timeouts during
periods of inactivity were causing connection failures. The service now
properly detects timeout errors and handles reconnection gracefully,
ensuring continuous operation even after periods of silence or when using an
`STTMuteFilter`.
- Fixed an issue in `RimeTTSService` where the last line of text sent didn't
  generate any audio output.
- Fixed `OpenAIRealtimeBetaLLMService` by adding proper handling for:
- The `conversation.item.input_audio_transcription.delta` server message,
which was added server-side at some point and not handled client-side.
- Errors reported by the `response.done` server message.
## Other
- Added foundational example `07w-interruptible-fal.py`, showing `FalSTTService`.
- Added a new Ultravox example
`examples/foundational/07u-interruptible-ultravox.py`.
- Added new Neuphonic examples
`examples/foundational/07v-interruptible-neuphonic.py` and
`examples/foundational/07v-interruptible-neuphonic-http.py`.
- Added a new example `examples/foundational/36-user-email-gathering.py` to show
  how to gather user emails. The example uses Cartesia's `<spell></spell>`
  tags and Rime's `spell()` function to spell out the emails for confirmation.
- Updated the `34-audio-recording.py` example to include an STT processor.
- Added foundational example `35-voice-switching.py` showing how to use the new
  `PatternPairAggregator`. This example shows how to encode information for the
  LLM to instruct TTS voice changes, but the same approach can be used to
  encode any information into the LLM response that you want to parse and use
  in other parts of your application.
- Added a Pipecat Cloud deployment example to the `examples` directory.
- Removed foundational examples 28b and 28c as the `TranscriptProcessor` no
  longer has an LLM dependency. Renamed foundational example 28a to
  `28-transcript-processor.py`.