Faster-whisper

Latest version: v1.1.0

0.7.0

Improve word-level timestamps heuristics

Some recent improvements from openai-whisper are ported to faster-whisper:

* Squash long words at window and sentence boundaries (https://github.com/openai/whisper/commit/255887f219e6b632bc1a6aac1caf28eecfca1bac)
* Improve timestamp heuristics (https://github.com/openai/whisper/commit/f572f2161ba831bae131364c3bffdead7af6d210)

Support download of user converted models from the Hugging Face Hub

The `WhisperModel` constructor now accepts any repository ID as an argument, for example:

```python
model = WhisperModel("username/whisper-large-v2-ct2")
```

The utility function `download_model` has been updated similarly.

Other changes

* Accept an iterable of token IDs for the argument `initial_prompt` (useful to include timestamp tokens in the prompt)
* Avoid computing higher temperatures when `no_speech_threshold` is met (same as https://github.com/openai/whisper/commit/e334ff141d5444fbf6904edaaf408e5b0b416fe8)
* Fix truncated output when using a prefix without disabling timestamps
* Update the minimum required CTranslate2 version to 3.17.0 to include the latest fixes
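The first change above can be sketched as follows; the token IDs are illustrative placeholders, not real Whisper vocabulary entries, and the commented-out call assumes a loaded `model`:

```python
# `initial_prompt` now accepts an iterable of token IDs instead of a string,
# which makes it possible to include special tokens (e.g. timestamp tokens)
# in the prompt. The IDs below are placeholders, not actual vocabulary entries.
prompt_tokens = [50364, 2425, 11, 995]

# The transcribe call is commented out because it requires a loaded model:
# segments, info = model.transcribe(audio_file, initial_prompt=prompt_tokens)

print(all(isinstance(token_id, int) for token_id in prompt_tokens))  # → True
```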

0.6.0

Extend `TranscriptionInfo` with additional properties

* `all_language_probs`: the probability of each language (only set when `language=None`)
* `vad_options`: the VAD options that were used for this transcription
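A minimal sketch of how `all_language_probs` might be consumed; the sample pairs below are made-up data standing in for real model output:

```python
# `all_language_probs` is a list of (language, probability) pairs, populated
# only when language detection runs (language=None). The values below are
# illustrative sample data, not real model output.
sample_probs = [("de", 0.05), ("en", 0.91), ("nl", 0.01), ("fr", 0.03)]

# Rank the candidate languages by probability and keep the top three.
top3 = sorted(sample_probs, key=lambda pair: pair[1], reverse=True)[:3]
print(top3)  # → [('en', 0.91), ('de', 0.05), ('fr', 0.03)]
```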

Improve robustness against temporary connection issues with the Hugging Face Hub

When the model is loaded by name, as in `WhisperModel("large-v2")`, a request is made to the Hugging Face Hub to check whether some files should be downloaded.

It can happen that this request raises an exception: the Hugging Face Hub is down, the internet connection is temporarily lost, etc. These exceptions are now caught, and the library falls back to loading the model directly from the local cache if it exists.

Other changes

* Enable the `onnxruntime` dependency for Python 3.11 as the latest version now provides binary wheels for Python 3.11
* Fix occasional `IndexError` on empty segments when using `word_timestamps=True`
* Export `__version__` at the module level
* Include missing requirement files in the released source distribution

0.5.1

Fix `download_root` to correctly set the cache directory where the models are downloaded.

0.5.0

Improved logging

Some information is now logged at the `INFO` and `DEBUG` levels. The logging level can be configured like this:

```python
import logging

logging.basicConfig()
logging.getLogger("faster_whisper").setLevel(logging.DEBUG)
```

More control over model downloads

New arguments were added to the `WhisperModel` constructor to better control how the models are downloaded:

* `download_root` to specify where the model should be downloaded.
* `local_files_only` to skip the download and directly return the path to the cached model, if it exists.
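A sketch of the two new arguments in use; the constructor call itself is commented out because it would otherwise contact the Hub or require a cached model:

```python
# Sketch of the two new `WhisperModel` constructor arguments.
download_options = dict(
    download_root="./models",   # directory where model files are stored
    local_files_only=True,      # do not download; resolve from the cache only
)
# model = WhisperModel("large-v2", **download_options)

print(sorted(download_options))  # → ['download_root', 'local_files_only']
```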

Other changes

* Improve the default VAD behavior to prevent some words from being assigned to the incorrect speech chunk in the original audio
* Fix incorrect application of option `condition_on_previous_text=False` (note that the bug still exists in openai/whisper v20230314)
* Fix segment timestamps that are sometimes inconsistent with the word timestamps after VAD
* Extend the `Segment` structure with additional properties to match openai/whisper
* Rename `AudioInfo` to `TranscriptionInfo` and add a new property `options` to summarize the transcription options that were used

0.4.1

Fix some `IndexError` exceptions:

* when VAD is enabled and a predicted timestamp is after the last speech chunk
* when word timestamps are enabled and the model predicts a token sequence that is decoded to invalid Unicode characters

0.4.0

Integration of Silero VAD

The [Silero VAD](https://github.com/snakers4/silero-vad) model is integrated to ignore parts of the audio without speech:

```python
model.transcribe(..., vad_filter=True)
```

The default behavior is conservative and only removes silence longer than 2 seconds. See the README for how to customize the VAD parameters.
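As a sketch, the VAD options can be collected in a dictionary before transcription; the `vad_parameters` argument and the `min_silence_duration_ms` name follow the project README of later versions and should be treated as assumptions here:

```python
# Lower the silence threshold from the conservative 2 s default to 0.5 s.
# The parameter name follows the project README; treat it as an assumption.
vad_parameters = dict(min_silence_duration_ms=500)

# The call is commented out because it requires a loaded model:
# model.transcribe(audio_file, vad_filter=True, vad_parameters=vad_parameters)

print(vad_parameters["min_silence_duration_ms"])  # → 500
```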

**Note:** the Silero model is executed with `onnxruntime` which is currently not released for Python 3.11. The dependency is excluded for this Python version and so the VAD features cannot be used.

Speaker diarization using stereo channels

The function `decode_audio` has a new argument `split_stereo` to split stereo audio into separate left and right channels:

```python
left, right = decode_audio(audio_file, split_stereo=True)

model.transcribe(left)
model.transcribe(right)
```

Other changes

* Add `Segment` attributes `avg_log_prob` and `no_speech_prob` (same definition as openai/whisper)
* Ignore audio frames raising an `av.error.InvalidDataError` exception during decoding
* Fix option `prefix` to be passed only to the first 30-second window
* Extend `suppress_tokens` with some special tokens that should always be suppressed (unless `suppress_tokens is None`)
* Raise a more helpful error message when the selected model size is invalid
* Disable the progress bar when the model to download is already in the cache
