Optimizations:
- Refactors `StreamingConversation` into a pipeline of producer-consumer workers: transcription, agent response, and synthesis are now decoupled into their own async tasks (see the sketch after this list). Shoutout to jnak for helping us out with the refactor. Upshots:
  - The LLM call no longer blocks the processing of new transcripts
  - Audio playback runs concurrently with both response generation and synthesis: while each sentence is playing, the next response is already being generated and synthesized. For synthesizers with latencies > 1s, this removes the delay between the sentences of a response.
  - Resource management: synthesizers no longer need a dedicated thread, so e.g. a single telephony server can now support twice as many concurrent phone calls
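To make the pattern concrete, here is a minimal sketch of such a pipeline (illustrative only, not the actual `StreamingConversation` internals): each stage is its own `asyncio` task connected by queues, so a slow stage never blocks the stages upstream of it.

```python
import asyncio

async def agent_worker(transcripts: asyncio.Queue, responses: asyncio.Queue):
    # Consumes transcripts and produces responses; a pending LLM call here
    # no longer blocks the transcriber from enqueueing new transcripts.
    while True:
        transcript = await transcripts.get()
        await responses.put(f"response to {transcript!r}")

async def synthesis_worker(responses: asyncio.Queue, audio_chunks: asyncio.Queue):
    # Consumes responses and produces synthesized audio.
    while True:
        response = await responses.get()
        await audio_chunks.put(f"audio for {response!r}")

async def playback_worker(audio_chunks: asyncio.Queue):
    # Plays each chunk while the upstream stages work on the next sentence.
    while True:
        chunk = await audio_chunks.get()
        print("playing:", chunk)

async def main():
    transcripts, responses, audio_chunks = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(agent_worker(transcripts, responses)),
        asyncio.create_task(synthesis_worker(responses, audio_chunks)),
        asyncio.create_task(playback_worker(audio_chunks)),
    ]
    for utterance in ["hello", "what can you do?"]:
        await transcripts.put(utterance)
    await asyncio.sleep(0.1)  # let the toy pipeline drain
    for worker in workers:
        worker.cancel()

asyncio.run(main())
```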
Contribution / Code cleanliness:
- Simple tests that assert `StreamingConversation` works across all supported Python versions: run them locally with `make test`
- Typechecking with `mypy`: run it locally with `make typecheck`
Features:
- Adds support for the ElevenLabs `optimize_streaming_latency` parameter (example after this list)
- Adds the Twilio `to` and `from` numbers to the `CallConfig` stored in the `ConfigManager` (h/t Nikhil-Kulkarni) (example after this list)
- Buffers the audio sent to AssemblyAI (solves https://github.com/vocodedev/vocode-react-sdk/issues/6) (h/t m-ods)
- Adds an option to record Twilio calls (h/t shahafabileah) (example after this list)
- Adds a `mute_during_speech` parameter to Transcribers to prevent speaker output from feeding back into the microphone; see the note in https://github.com/vocodedev/vocode-python/issues/16 (example after this list)
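A minimal sketch of the new ElevenLabs option, assuming `optimize_streaming_latency` is a field on `ElevenLabsSynthesizerConfig` (the import path and helper constructor reflect a typical vocode setup; check them against your version):

```python
from vocode.streaming.models.synthesizer import ElevenLabsSynthesizerConfig

# Sketch only: assumes optimize_streaming_latency is a field on
# ElevenLabsSynthesizerConfig. ElevenLabs accepts levels 0-4, where higher
# values trade some quality for lower time-to-first-byte.
synthesizer_config = ElevenLabsSynthesizerConfig.from_telephone_output_device(
    api_key="YOUR_ELEVEN_LABS_API_KEY",
    voice_id="YOUR_VOICE_ID",
    optimize_streaming_latency=3,
)
```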
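Reading the Twilio numbers off a stored call configuration might look like the following; the field names `to_phone` and `from_phone` and the import path are assumptions here, not confirmed API:

```python
from vocode.streaming.telephony.config_manager.base_config_manager import (
    BaseConfigManager,
)

# Sketch only: assumes the CallConfig stored by the ConfigManager exposes the
# Twilio numbers as to_phone / from_phone (field names are assumptions).
async def log_call_numbers(config_manager: BaseConfigManager, conversation_id: str) -> None:
    call_config = await config_manager.get_config(conversation_id)
    if call_config is not None:
        print(f"call from {call_config.from_phone} to {call_config.to_phone}")
```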
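Enabling call recording, assuming the new option is a `record` boolean on `TwilioConfig`:

```python
from vocode.streaming.models.telephony import TwilioConfig

# Sketch only: assumes the recording option is a `record` boolean on
# TwilioConfig; recordings then appear in your Twilio console as usual.
twilio_config = TwilioConfig(
    account_sid="YOUR_TWILIO_ACCOUNT_SID",
    auth_token="YOUR_TWILIO_AUTH_TOKEN",
    record=True,
)
```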
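And muting the transcriber while the agent speaks, assuming `mute_during_speech` is a field on the shared `TranscriberConfig` base (the Deepgram config shown is just one example of a transcriber config):

```python
from vocode.streaming.models.transcriber import DeepgramTranscriberConfig

# Sketch only: assumes mute_during_speech is a field on the shared
# TranscriberConfig base, so any transcriber config can set it. When True,
# microphone audio is dropped while the bot is speaking, so the speaker
# output is not transcribed as user input.
transcriber_config = DeepgramTranscriberConfig.from_telephone_input_device(
    mute_during_speech=True,
)
```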