Turnvoice

Latest version: v0.0.65

Page 2 of 2

0.0.30

- added lots of stuff to the algorithm:
  - we unload the transcription model completely from the GPU after the first main transcription
  - we then load the synthesis model into freshly cleaned VRAM and let it take as much VRAM as it wants, because synthesis is our bottleneck
  - after the first synthesis we lazy load the transcription model AGAIN
  - we can then transcribe the synthesis and verify it by measuring text distance (with Levenshtein and Jaro-Winkler)
  - and we can detect if the model generates hallucinations using the transcription word timestamps
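
The verification step in the list above can be sketched roughly as follows. This is a minimal pure-Python illustration of comparing the text we wanted to synthesize against the transcription of the synthesized audio, using Levenshtein and Jaro-Winkler. The function names, the normalization and the thresholds (`min_lev`, `min_jw`) are assumptions made for this example, not Turnvoice's actual code or settings.

```python
import re


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaro(a: str, b: str) -> float:
    """Jaro similarity in [0, 1]."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_matched = [False] * len(a)
    b_matched = [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_matched[j] and b[j] == ca:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_chars = [c for c, m in zip(a, a_matched) if m]
    b_chars = [c for c, m in zip(b, b_matched) if m]
    transpositions = sum(x != y for x, y in zip(a_chars, b_chars)) // 2
    return (matches / len(a) + matches / len(b)
            + (matches - transpositions) / matches) / 3


def jaro_winkler(a: str, b: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(a, b)
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (assumed preprocessing)."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def is_faithful(expected: str, transcribed: str,
                min_lev: float = 0.8, min_jw: float = 0.85) -> bool:
    """True if the transcription is close enough to the expected text.

    Both thresholds are illustrative values, not the project's.
    """
    e, t = normalize(expected), normalize(transcribed)
    if not e and not t:
        return True
    lev_ratio = 1 - levenshtein(e, t) / max(len(e), len(t))
    return lev_ratio >= min_lev and jaro_winkler(e, t) >= min_jw
```

Requiring both metrics to pass is one possible design choice: Levenshtein punishes extra or missing words anywhere in the text, while Jaro-Winkler is more forgiving about small reorderings but weights the beginning of the sentence more heavily.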

So with this we have
=> a massive speed gain (x5)
=> way lower VRAM usage (because the huge transcription gets removed from VRAM, also we unload the translation model if used)
=> way more solid synthesis via verification (reducing hallucinations and strange artifacts by retrying synthesis)
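
The "retry until it verifies" idea could look something like the sketch below. The `synthesize`, `transcribe` and `similarity` callables are hypothetical stand-ins passed in by the caller (no real engine API is implied), and `min_similarity` and `max_attempts` are made-up defaults.

```python
from typing import Any, Callable, Tuple


def synthesize_with_retries(
    text: str,
    synthesize: Callable[[str], Any],         # hypothetical: text -> audio
    transcribe: Callable[[Any], str],         # hypothetical: audio -> text
    similarity: Callable[[str, str], float],  # hypothetical: score in [0, 1]
    min_similarity: float = 0.9,              # illustrative threshold
    max_attempts: int = 3,                    # illustrative retry budget
) -> Tuple[Any, float, int]:
    """Synthesize, transcribe the result, and retry while it does not verify.

    Returns (best_audio, best_score, attempts_used). If no attempt reaches
    the threshold, the best-scoring attempt is returned instead of failing.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    best_audio, best_score = None, -1.0
    attempts = 0
    for attempts in range(1, max_attempts + 1):
        audio = synthesize(text)
        score = similarity(text, transcribe(audio))
        if score > best_score:
            best_audio, best_score = audio, score
        if score >= min_similarity:
            break  # good enough, stop retrying
    return best_audio, best_score, attempts
```

Keeping the best attempt rather than raising on failure is an assumption here; it means a sentence that never verifies still produces some audio instead of leaving a gap.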

We can now voiceturn a 20 min video on a GPU with 8 GB VRAM in ~33 min

- added fades at the start and end of the synthesis since it gets trimmed, so we don't clip
- the finished video now starts automatically after rendering
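
To illustrate the fades: once audio is cut at an arbitrary point, the first and last sample can sit far from zero, which is heard as a click. A short linear fade in and out avoids that. The sketch below works on a plain list of float samples; the function name and the 10 ms default are assumptions for this example, not the values Turnvoice uses.

```python
from typing import List, Sequence


def apply_fades(samples: Sequence[float], sample_rate: int,
                fade_seconds: float = 0.01) -> List[float]:
    """Return a copy of `samples` with a linear fade-in and fade-out.

    The fade length is capped at half the clip so the two ramps never overlap.
    """
    n = min(int(sample_rate * fade_seconds), len(samples) // 2)
    faded = list(samples)
    for i in range(n):
        gain = i / n            # ramps from 0.0 up towards 1.0
        faded[i] *= gain        # fade in from the start
        faded[-1 - i] *= gain   # fade out towards the end
    return faded
```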

0.0.22

- can translate now
- cleaner cli (takes IDs, and -u is not needed anymore - this was my facebook "the" moment)
    turnvoice RK91Ji6GCZ8

0.0.20

- improved sync

We now trim silence out of the synthesized audio before starting the voice speed matching algorithm. The Coqui engine inserts ~0.3s (varies with speed) of silent audio at the end of the synthesis. That used to throw off the transcription timestamps a bit, and this upgrade is a good step towards better synced results.
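
A minimal sketch of the trimming idea, assuming the audio is available as a list of float samples in [-1, 1]. The amplitude `threshold` and the function names are illustrative assumptions; the actual implementation may detect silence differently.

```python
from typing import List, Sequence


def trim_silence(samples: Sequence[float], threshold: float = 0.01) -> List[float]:
    """Drop leading and trailing samples whose absolute amplitude is below `threshold`."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return list(samples[start:end])


def duration_seconds(samples: Sequence[float], sample_rate: int) -> float:
    """Clip length in seconds, i.e. the value a speed-matching step would compare."""
    return len(samples) / sample_rate
```

With a trailing ~0.3s of silence left in, `duration_seconds` would overstate how long the spoken part is, so any speed factor derived from it would be slightly off. Trimming first means the measured duration reflects only the audible speech.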

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.