Silero-vad

Latest version: v0.0.3

4.0

New V4 VAD Released


- Improved [quality](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics#silero-vad-vs-old-silero-vad)
- Improved [performance](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics#silero-vad-performance-metrics)
- Both 8k and 16k sampling rates are now supported by the ONNX model
- Batching is now supported by the ONNX model
- Added the `audio_forward` method for one-line processing of one or several audio inputs without postprocessing
- Hotfix applied: the wrong model had been uploaded
- Minor hotfix regarding the PyTorch version
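
The notes above do not say how `audio_forward` splits its input, so the following is only a minimal sketch of the general idea, assuming fixed-size windows with zero-padding of the last one. `split_into_windows` is a hypothetical helper written for illustration and is not part of silero-vad.

```python
from typing import List, Sequence


def split_into_windows(samples: Sequence[float], window_size: int) -> List[List[float]]:
    """Split audio into equal windows, zero-padding the final one.

    Hypothetical helper for illustration; not the silero-vad implementation.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    windows = []
    for start in range(0, len(samples), window_size):
        window = list(samples[start:start + window_size])
        # Pad the last window with silence so every window has the same length.
        window.extend([0.0] * (window_size - len(window)))
        windows.append(window)
    return windows


audio = [0.1] * 10  # toy signal of 10 samples
windows = split_into_windows(audio, window_size=4)
print(len(windows))   # 3 windows: 4 + 4 + (2 real + 2 padded)
print(windows[-1])    # [0.1, 0.1, 0.0, 0.0]
```

Equal-length windows are what makes batching straightforward, since they can be stacked into a single tensor and passed to the model in one call.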

3.1

We were finally able to port the model to ONNX:

- Compact model (~100k params);
- Neither the PyTorch nor the ONNX model is quantized;
- Same quality as the latest and best PyTorch release;
- Only 16 kHz is available for now (ONNX has issues with if-statements and/or tracing vs. scripting, and reports cryptic errors);
- In our tests, ONNX is 2-3x faster than PyTorch on short audios (chunks); the gap narrows with larger batches or long audios;
- Audio examples and non-core models moved out of the repo to save space;

3.0

Main changes

- One VAD to rule them all! New model includes the functionality of the previous ones with [improved quality](https://github.com/snakers4/silero-vad/wiki/Quality-Metrics) and [speed](https://github.com/snakers4/silero-vad/wiki/Performance-Metrics)!
- Flexible sampling rate, `8000 Hz` and `16000 Hz` are supported;
- Flexible chunk size, minimum chunk size is just 30 milliseconds!
- 100k parameters;
- GPU and batching are supported;
- Radically simplified examples;
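
To make the chunk-size figures concrete, here is a small sketch that converts a chunk duration in milliseconds into a number of samples for the two supported sampling rates. `chunk_samples` is a hypothetical helper, not a silero-vad function.

```python
SUPPORTED_SAMPLING_RATES = (8000, 16000)  # Hz, as listed in the release notes


def chunk_samples(chunk_ms: int, sampling_rate: int) -> int:
    """Number of samples in a chunk of `chunk_ms` milliseconds (hypothetical helper)."""
    if sampling_rate not in SUPPORTED_SAMPLING_RATES:
        raise ValueError(f"unsupported sampling rate: {sampling_rate}")
    return sampling_rate * chunk_ms // 1000


for rate in SUPPORTED_SAMPLING_RATES:
    # The minimum chunk of 30 ms is 240 samples at 8000 Hz and 480 at 16000 Hz.
    print(rate, chunk_samples(30, rate))
```

By the same arithmetic, a window of 1536 samples at 16000 Hz covers 96 milliseconds.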

Migration

Please see the new [examples](https://github.com/snakers4/silero-vad/wiki/Examples-and-Dependencies#examples).

The new `get_speech_timestamps` is a simplified, unified replacement for the deprecated `get_speech_ts` and `get_speech_ts_adaptive` methods.


    speech_timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
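
As a sketch of how the result might be post-processed, the snippet below assumes that the timestamps come back as a list of dicts with `start` and `end` keys measured in samples. That shape is an assumption, not something stated in these notes. `timestamps_to_seconds` is a hypothetical helper and the sample values are invented for illustration.

```python
from typing import Dict, List


def timestamps_to_seconds(
    timestamps: List[Dict[str, int]], sampling_rate: int = 16000
) -> List[Dict[str, float]]:
    """Convert sample offsets to seconds (hypothetical helper)."""
    return [
        {
            "start": round(ts["start"] / sampling_rate, 3),
            "end": round(ts["end"] / sampling_rate, 3),
        }
        for ts in timestamps
    ]


# Invented example data in the assumed format (offsets in samples).
example = [{"start": 1600, "end": 24000}, {"start": 32000, "end": 48000}]
in_seconds = timestamps_to_seconds(example)
print(in_seconds)  # [{'start': 0.1, 'end': 1.5}, {'start': 2.0, 'end': 3.0}]
```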


The new `VADIterator` class serves as an example for streaming tasks, replacing the deprecated `VADiterator` and `VADiteratorAdaptive`.


    vad_iterator = VADIterator(model)
    window_size_samples = 1536  # number of samples in a single audio chunk

    for i in range(0, len(wav), window_size_samples):
        speech_dict = vad_iterator(wav[i: i + window_size_samples], return_seconds=True)
        if speech_dict:
            print(speech_dict, end=' ')
    vad_iterator.reset_states()  # reset model states after each audio
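
The loop above can be tried without the real model using a simplified stand-in. The sketch below is not the silero-vad implementation: `ToyStreamingVAD` and `mean_abs` are hypothetical names, the "speech probability" is just mean absolute amplitude, and the real `VADIterator` most likely applies extra logic (such as minimum silence duration) that is omitted here.

```python
from typing import Callable, Dict, Optional, Sequence


class ToyStreamingVAD:
    """Simplified stand-in for a streaming VAD wrapper (hypothetical)."""

    def __init__(self, speech_prob: Callable[[Sequence[float]], float],
                 threshold: float = 0.5, sampling_rate: int = 16000):
        self.speech_prob = speech_prob
        self.threshold = threshold
        self.sampling_rate = sampling_rate
        self.reset_states()

    def reset_states(self) -> None:
        self.triggered = False  # True while inside a speech segment
        self.position = 0       # number of samples consumed so far

    def __call__(self, chunk: Sequence[float]) -> Optional[Dict[str, float]]:
        chunk_start = self.position
        self.position += len(chunk)
        is_speech = self.speech_prob(chunk) >= self.threshold
        if is_speech and not self.triggered:
            self.triggered = True
            return {"start": chunk_start / self.sampling_rate}
        if not is_speech and self.triggered:
            self.triggered = False
            return {"end": chunk_start / self.sampling_rate}
        return None  # no state change in this chunk


def mean_abs(chunk: Sequence[float]) -> float:
    """Toy 'speech probability': mean absolute amplitude of the chunk."""
    return sum(abs(x) for x in chunk) / len(chunk) if chunk else 0.0


# 0.2 s of silence, 0.4 s of "speech", 0.2 s of silence at 16 kHz.
wav = [0.0] * 3200 + [0.9] * 6400 + [0.0] * 3200
vad = ToyStreamingVAD(mean_abs)
events = []
for i in range(0, len(wav), 1600):
    event = vad(wav[i:i + 1600])
    if event:
        events.append(event)
vad.reset_states()  # clear state before processing the next audio
print(events)  # [{'start': 0.2}, {'end': 0.6}]
```

Resetting the state after each audio matters because the iterator carries its position and trigger flag from one chunk to the next.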



v2.0-legacy
This is a technical tag, so that users who do not want to use the newer models can simply check out this tag.
