Turnvoice

Latest version: v0.0.65


0.0.65

- added `--faster` parameter to select faster_whisper instead of stable_whisper for timestamp transcription (stable_whisper uses a lot of resources, especially on longer videos)
- added `--model` parameter to select the transcription model; can be 'tiny', 'tiny.en', 'base', 'base.en', 'small', 'small.en', 'medium', 'medium.en', 'large-v1', 'large-v2', 'large-v3', or 'large'
- updated to Coqui TTS v0.22.0 which enables access to 58 free [predefined speaker voices](https://github.com/KoljaB/TurnVoice#coqui-engine)
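
The two new parameters can be combined when driving the tool from a script. The following is a minimal sketch, not part of TurnVoice itself: `build_turnvoice_command` and `WHISPER_MODELS` are hypothetical names, and it assumes `--faster` is a plain switch that takes no value. It only assembles the argument list and does not run anything.

```python
import shlex

# Model names listed for --model in the 0.0.65 notes above.
WHISPER_MODELS = (
    "tiny", "tiny.en", "base", "base.en", "small", "small.en",
    "medium", "medium.en", "large-v1", "large-v2", "large-v3", "large",
)


def build_turnvoice_command(url, model=None, faster=False):
    """Assemble an argument list for the turnvoice CLI (hypothetical helper).

    Assumption: --faster is a switch without a value.
    """
    if model is not None and model not in WHISPER_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    cmd = ["turnvoice", url]
    if faster:
        cmd.append("--faster")
    if model is not None:
        cmd += ["--model", model]
    return cmd


if __name__ == "__main__":
    cmd = build_turnvoice_command(
        "https://www.youtube.com/watch?v=2N3PsXPdkmM",
        model="small.en",
        faster=True,
    )
    # Print a shell-quoted form; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Rejecting unknown model names up front avoids starting a long download or transcription run with a mistyped value.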

0.0.60

- switched from Deezer's Spleeter to Facebook's Demucs, for these reasons:
  - better vocal splitting quality
  - more reliable handling of files longer than 10 minutes
- crossfade algorithm to switch between original and vocal-stripped audio more seamlessly
- use of the stable_whisper timestamp refinement technique to achieve higher timestamp detection precision
- new JavaScript [Renderscript-Editor](https://github.com/KoljaB/TurnVoice/blob/main/README.md#renderscript-editor) to fine-tune speaking timings, text and speaker assignment

![Editor](https://i.ibb.co/cYSVksS/Renderscript-Editor-small.png)
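
The crossfade mentioned in the 0.0.60 notes can be illustrated with a simple linear fade. This is a sketch under stated assumptions, not TurnVoice's actual algorithm: the function name `crossfade`, the linear ramp, and the representation of audio as equal-length lists of float samples are all chosen here for illustration.

```python
def crossfade(original, stripped, start, length):
    """Switch from `original` to `stripped` with a linear fade.

    The fade begins at sample index `start` and lasts `length` samples.
    Illustrative only: the real implementation may use a different curve.
    """
    if len(original) != len(stripped):
        raise ValueError("tracks must have equal length")
    if length <= 0:
        raise ValueError("length must be positive")
    out = []
    for i, (a, b) in enumerate(zip(original, stripped)):
        # t is the weight of the vocal-stripped track:
        # 0.0 before the fade, rising linearly to 1.0 at its end.
        t = min(1.0, max(0.0, (i - start) / length))
        out.append((1.0 - t) * a + t * b)
    return out


if __name__ == "__main__":
    original = [1.0] * 6   # stand-in for the original audio
    stripped = [0.0] * 6   # stand-in for the vocal-stripped audio
    print(crossfade(original, stripped, start=1, length=4))
```

Blending over a short window instead of cutting hard between the two tracks avoids an audible click at the switch point.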

0.0.50

- added `--prepare` to write a full script including text, speakers and timestamps

```bash
turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM --prepare
```

- added `--render` to read back such a script and generate the final video from it:

```bash
turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM --render "downloads\my_video_name\full_script.txt"
```

- improved audio quality output

0.0.45

- added `--prompt` to change the speaking style

Example:

```bash
turnvoice https://www.youtube.com/watch?v=K89dChsgznw --prompt "speaking style of captain jack sparrow"
```

0.0.41

- now using deep-translator instead of NLLB-200-600M, so the [CC-BY-NC License](https://huggingface.co/facebook/nllb-200-distilled-600M) no longer applies and there is no need to download, load and unload a heavyweight translation model

<sup>*(Deep-translator seems good to use for free. I think there is a way better and more general solution which I roughly have in mind. Some problems to solve yet but I guess I can make a quite significant upgrade to this in the coming days)*</sup>

0.0.40

- added ElevenLabs, Azure, OpenAI TTS and System TTS as selectable synthesis engines
- added option to process a local video instead of a YouTube video
- added option to replace multiple speaker voices at once (submit more than one voice)
- added option to submit custom speaker timefiles (in the format of the generated speaker1.txt, speaker2.txt etc. timefiles) to fine-tune multi-speaker rendering
