Pipecat-ai

Latest version: v0.0.41

0.0.41

Added

- Added `LivekitFrameSerializer` audio frame serializer (see the sketch below).
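
A minimal round-trip sketch, assuming the serializer exposes the usual
`serialize`/`deserialize` pair and lives under `pipecat.serializers.livekit`
(both are assumptions, not verified against this release):

```python
# Sketch only: module path and frame fields are assumptions based on
# pipecat's serializer layout; check your installed version.
from pipecat.frames.frames import AudioRawFrame
from pipecat.serializers.livekit import LivekitFrameSerializer

serializer = LivekitFrameSerializer()

# 20 ms of 16-bit mono silence at 16 kHz (320 samples * 2 bytes).
frame = AudioRawFrame(audio=b"\x00\x00" * 320, sample_rate=16000, num_channels=1)

payload = serializer.serialize(frame)       # bytes for the LiveKit transport
restored = serializer.deserialize(payload)  # back to an AudioRawFrame
```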

Fixed

- Fixed a `FastAPIWebsocketOutputTransport` variable name clash with a subclass.

- Fixed an `AnthropicLLMService` issue with empty arguments in function calling.

Other

- Fixed `studypal` example errors.

0.0.40

Added

- VAD parameters can now be dynamically updated using the
`VADParamsUpdateFrame` (see the sketch after this list).

- `ErrorFrame` now has a `fatal` field to indicate the bot should exit if a
fatal error is pushed upstream (false by default). A new `FatalErrorFrame`
that sets this flag to true has been added.

- `AnthropicLLMService` now supports function calling and includes initial
support for prompt caching.
(see https://www.anthropic.com/news/prompt-caching)

- `ElevenLabsTTSService` can now specify ElevenLabs input parameters such as
`output_format`.

- `TwilioFrameSerializer` can now specify the desired Twilio and Pipecat sample
rates.

- Added new `on_participant_updated` event to `DailyTransport`.

- Added `DailyRESTHelper.delete_room_by_name()` and
`DailyRESTHelper.delete_room_by_url()`.

- Added LLM and TTS usage metrics. These are enabled when
`PipelineParams.enable_usage_metrics` is True (see the sketch after this list).

- `AudioRawFrame`s are now pushed downstream from the base output
transport. This allows capturing the exact words the bot says by adding an STT
service at the end of the pipeline.

- Added new `GStreamerPipelineSource`. This processor can generate image or
audio frames from a GStreamer pipeline (e.g. reading an MP4 file, an RTP
stream, or anything else supported by GStreamer).

- Added `TransportParams.audio_out_is_live`. This flag is False by default; set
it to indicate that audio should not be synchronized with sporadic images.

- Added new `BotStartedSpeakingFrame` and `BotStoppedSpeakingFrame` control
frames. These frames are pushed upstream and they should wrap
`BotSpeakingFrame`.

- Transports now allow you to register event handlers without decorators (see
the second sketch after this list).
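
As referenced in the VAD and usage-metrics items above, a hedged sketch of
enabling usage metrics and updating VAD parameters at runtime. The module
paths and the `params` field name are assumptions based on pipecat's usual
layout:

```python
# Hedged sketch; paths and field names assumed, not verified for this release.
from pipecat.frames.frames import VADParamsUpdateFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.vad.vad_analyzer import VADParams


async def run_bot(processors):
    pipeline = Pipeline(processors)

    # Enable the new LLM and TTS usage metrics from this release.
    task = PipelineTask(pipeline, PipelineParams(enable_usage_metrics=True))

    # While the pipeline is running, VAD parameters can be updated
    # dynamically, e.g. shortening the silence that ends a user turn.
    await task.queue_frame(VADParamsUpdateFrame(params=VADParams(stop_secs=0.4)))
```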
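
And a sketch of the non-decorator handler registration from the last item,
using the new `on_participant_updated` event. Here `add_event_handler` is
assumed to be the method underlying the `event_handler` decorator, and the
room URL is hypothetical:

```python
# Sketch: registering a DailyTransport event handler without a decorator.
from pipecat.transports.services.daily import DailyParams, DailyTransport

transport = DailyTransport(
    "https://example.daily.co/room",  # hypothetical room URL
    None,                             # token
    "bot",
    DailyParams(audio_out_enabled=True),
)


async def on_participant_updated(transport, participant):
    print(f"Participant updated: {participant}")


# Assumed equivalent of @transport.event_handler("on_participant_updated").
transport.add_event_handler("on_participant_updated", on_participant_updated)
```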

Changed

- Support RTVI message protocol 0.1. This includes new messages, support for
message responses, and support for actions, configuration, webhooks, and
more.
(see https://docs.rtvi.ai/)

- `SileroVAD` dependency is now imported via pip's `silero-vad` package.

- `ElevenLabsTTSService` now uses the `eleven_turbo_v2_5` model by default.

- `BotSpeakingFrame` is now a control frame.

- `StartFrame` is now a control frame similar to `EndFrame`.

- `DeepgramTTSService` is now more customizable: you can adjust the encoding
and sample rate (see the sketch after this list).
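
A sketch of the `DeepgramTTSService` knobs mentioned above; the `encoding`
and `sample_rate` parameter names are assumptions, so check the constructor
in your installed version:

```python
import os

from pipecat.services.deepgram import DeepgramTTSService

# Sketch: parameter names assumed, not verified against this release.
tts = DeepgramTTSService(
    api_key=os.getenv("DEEPGRAM_API_KEY"),
    voice="aura-asteria-en",
    encoding="linear16",   # raw 16-bit PCM
    sample_rate=16000,
)
```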

Fixed

- `TTSStartFrame` and `TTSStopFrame` are now sent when TTS actually starts and
stops. This makes it possible to know when the bot starts and stops speaking
even with asynchronous services (like Cartesia).

- Fixed `AzureSTTService` transcription frame timestamps.

- Fixed an issue with `DailyRESTHelper.create_room()` expirations which would
cause this function to stop working after the initial expiration elapsed.

- Improved `EndFrame` and `CancelFrame` handling. An `EndFrame` should end
things gracefully, while a `CancelFrame` should cancel all running tasks as
soon as possible (see the sketch after this list).

- Fixed an issue in `AIService` that would cause a yielded `None` value to be
processed.

- RTVI's `bot-ready` message is now sent when the RTVI pipeline is ready and
the first participant joins.

- Fixed a `BaseInputTransport` issue that was causing incoming system frames to
be queued instead of being pushed immediately.

- Fixed a `BaseInputTransport` issue where incoming start/stop interruption
frames would not cancel tasks and be processed properly.
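
The `EndFrame`/`CancelFrame` distinction above, sketched against
`PipelineTask` (the `queue_frame` and `cancel` calls are assumed from
pipecat's task API):

```python
from pipecat.frames.frames import EndFrame


async def shutdown(task, graceful: bool):
    if graceful:
        # EndFrame drains the pipeline: processors finish their queued work.
        await task.queue_frame(EndFrame())
    else:
        # CancelFrame semantics: stop all running tasks as soon as possible.
        await task.cancel()
```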

Other

- Added `studypal` example (thanks to the Cartesia folks!).

- Most examples now use Cartesia.

- Added examples `foundational/19a-tools-anthropic.py`,
`foundational/19b-tools-video-anthropic.py` and
`foundational/19a-tools-togetherai.py`.

- Added examples `foundational/18-gstreamer-filesrc.py` and
`foundational/18a-gstreamer-videotestsrc.py` that show how to use
`GStreamerPipelineSource`.

- Removed `requests` library usage.

- Cleaned up examples to use `DailyRESTHelper`.

0.0.39

Fixed

- Fixed a regression introduced in 0.0.38 that would cause Daily transcription
to stop the Pipeline.

0.0.38

Added

- Added `force_reload`, `skip_validation` and `trust_repo` to `SileroVAD` and
`SileroVADAnalyzer`. This allows caching the model and controlling GitHub
repo validation (see the sketch after this list).

- Added `send_initial_empty_metrics` flag to `PipelineParams` to request
initial empty metrics (zero values). True by default.
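
A hedged sketch of the new Silero flags; they presumably forward to
`torch.hub.load`, and their exact placement on the constructor is an
assumption:

```python
from pipecat.vad.silero import SileroVADAnalyzer

# Sketch: keyword placement assumed; flags mirror torch.hub.load's options.
vad = SileroVADAnalyzer(
    force_reload=False,    # reuse the cached model instead of re-downloading
    skip_validation=True,  # skip GitHub repo validation
    trust_repo=True,       # suppress torch.hub's trust prompt
)
```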

Fixed

- Fixed the initial metrics format. It was using the wrong keys (`name`/`time`
instead of `processor`/`value`).

- STT services now use the ISO 8601 time format for transcription frames.

- Fixed an issue that would cause the Daily transport to report a stop
transcription error when none had actually occurred.

0.0.37

Added

- Added `RTVIProcessor` which implements the RTVI-AI standard.
See https://github.com/rtvi-ai

- Added `BotInterruptionFrame` which allows interrupting the bot while talking.

- Added `LLMMessagesAppendFrame` which allows appending messages to the current
LLM context.

- Added `LLMMessagesUpdateFrame` which allows changing the LLM context for the
one provided in this new frame.

- Added `LLMModelUpdateFrame` which allows updating the LLM model.

- Added `TTSSpeakFrame` which makes the bot say some text. This text will not
be part of the LLM context (see the sketch after this list).

- Added `TTSVoiceUpdateFrame` which allows updating the TTS voice.
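
A sketch tying a few of these frames together on a running `PipelineTask`;
the field names (`messages`, the `TTSSpeakFrame` text argument) and the
`queue_frames` call are assumptions:

```python
from pipecat.frames.frames import (
    BotInterruptionFrame,
    LLMMessagesAppendFrame,
    TTSSpeakFrame,
)


async def nudge_bot(task):
    # Interrupt the bot mid-utterance, speak a canned line that stays out
    # of the LLM context, then append a message for the next LLM turn.
    await task.queue_frames([
        BotInterruptionFrame(),
        TTSSpeakFrame("One moment, let me look that up."),
        LLMMessagesAppendFrame(messages=[
            {"role": "system", "content": "The user asked for order status."}
        ]),
    ])
```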

Removed

- Removed the `LLMResponseStartFrame` and `LLMResponseEndFrame` frames. These
were originally added to properly handle interruptions for the
`LLMAssistantContextAggregator`, but `LLMContextAggregator` is now based on
`LLMResponseAggregator`, which handles interruptions properly by simply
processing the `StartInterruptionFrame`, so these extra frames are no longer
needed.

Fixed

- Fixed an issue with `StatelessTextTransformer` where it was pushing a string
instead of a `TextFrame`.

- `TTSService` end-of-sentence detection has been improved. It now handles
acronyms, numbers, times and more.

- Fixed an issue in `TTSService` that would not properly flush the current
aggregated sentence if an `LLMFullResponseEndFrame` was found.

Performance

- `CartesiaTTSService` now uses websockets, which improves speed. It also
leverages the new Cartesia contexts, which maintain generated audio prosody
when multiple inputs are sent, significantly improving audio quality.

0.0.36

Added

- Added `GladiaSTTService`.
See https://docs.gladia.io/chapters/speech-to-text-api/pages/live-speech-recognition

- Added `XTTSService`. This is a local Text-To-Speech service.
See https://github.com/coqui-ai/TTS

- Added `UserIdleProcessor`. This processor can be used to wait for any
interaction with the user. If the user doesn't say anything within a given
timeout, a provided callback is called (see the sketch after this list).

- Added `IdleFrameProcessor`. This processor can be used to wait for frames
within a given timeout. If no frame is received within the timeout, a
provided callback is called.

- Added new frame `BotSpeakingFrame`. This frame will be continuously pushed
upstream while the bot is talking.

- It is now possible to specify a Silero VAD version when using `SileroVADAnalyzer`
or `SileroVAD`.

- Added `AsyncFrameProcessor` and `AsyncAIService`. Some services like
`DeepgramSTTService` need to process things asynchronously. For example,
audio is sent to Deepgram but transcriptions are not returned immediately. In
these cases we still require all frames (except system frames) to be pushed
downstream from a single task. That's what `AsyncFrameProcessor` is for: it
creates a task, and all frames should be pushed from that task. So, whenever
a new Deepgram transcription is ready, it will also be pushed from this
internal task.

- The `MetricsFrame` now includes processing metrics if metrics are enabled.
The processing metrics indicate the time a processor needs to generate all
its output. Note that not all processors generate these metrics.
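
A minimal sketch of the `UserIdleProcessor` described above; the callback
signature is assumed (the `17-detect-user-idle.py` example mentioned below
shows the real usage):

```python
from pipecat.frames.frames import TextFrame
from pipecat.processors.user_idle_processor import UserIdleProcessor


async def user_idle_callback(processor):
    # Called when the user hasn't interacted within the timeout.
    await processor.push_frame(TextFrame("Are you still there?"))


# Fires the callback after 10 seconds without any user interaction.
user_idle = UserIdleProcessor(callback=user_idle_callback, timeout=10.0)
```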

Changed

- `WhisperSTTService` model can now also be a string.

- Added missing `*` keyword-only separators in services.

Fixed

- `WebsocketServerTransport` no longer tries to send frames if the serializer
returns `None`.

- Fixed an issue where exceptions that occurred inside frame processors were
being swallowed and not displayed.

- Fixed an issue in `FastAPIWebsocketTransport` where it would still try to send
data to the websocket after being closed.

Other

- Added Fly.io deployment example in `examples/deployment/flyio-example`.

- Added new `17-detect-user-idle.py` example that shows how to use the new
`UserIdleProcessor`.
