Lhotse

Latest version: v1.29.0

0.2

New features:
- `K2SpeechRecognitionIterableDataset` that supports more efficient batching (#116)
- Support for `torchaudio.sox_effects` data augmentation alongside `WavAugment` (#124)

Breaking changes:
- the data augmentation APIs in Lhotse now expect an `augment_fn` argument instead of `augmenter`; it must be a callable with a signature like `def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray` (#124)
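
As an illustration of the signature above, here is a minimal sketch of a function that could be passed as `augment_fn`. The fixed-gain augmentation and the `GAIN_DB` constant are hypothetical choices made for this example and are not part of Lhotse; only the signature comes from the release note.

```python
import numpy as np

# Hypothetical constant for this example: attenuate the signal by 6 dB.
GAIN_DB = -6.0


def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Scale the waveform by a fixed gain.

    `sampling_rate` is unused here, but it is part of the expected
    signature, so rate-dependent augmentations can rely on it.
    """
    gain = 10.0 ** (GAIN_DB / 20.0)
    # Keep the input dtype so downstream code sees the same array type.
    return (samples * gain).astype(samples.dtype)


# Example input: one channel, one second of audio at 16 kHz.
samples = np.ones((1, 16000), dtype=np.float32)
augmented = augment_fn(samples, sampling_rate=16000)
```

Any callable with this shape should fit, as long as it takes the sample array and the sampling rate and returns the augmented samples.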

New corpora:
- Mobvoi Hotwords (#109)

Enhancements:
- progress bars for corpus downloads and feature extraction (#131)
- re-using cached LibriSpeech manifests for faster data preparation (#133)
- `LilcomFilesWriter` and `NumpyFilesWriter` use sub-directories for storage to reduce the filesystem load (#134)
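
The sub-directory idea in the last item can be sketched as follows. This is not Lhotse's implementation: the helper `sharded_path`, the three-character prefix, and the `.bin` extension are all assumptions made for this example. It only shows why spreading files over sub-directories keeps any single directory from growing too large.

```python
import tempfile
from pathlib import Path


def sharded_path(root: Path, key: str, prefix_len: int = 3) -> Path:
    """Return a storage path for `key` inside a sub-directory named
    after the first `prefix_len` characters of the key (hypothetical scheme)."""
    subdir = root / key[:prefix_len]
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / f"{key}.bin"


with tempfile.TemporaryDirectory() as tmp:
    path = sharded_path(Path(tmp), "abc123-utt-0001")
    path.write_bytes(b"\x00\x01")  # stand-in for a serialized feature array
    relative = path.relative_to(tmp).as_posix()
```

With this scheme, files whose keys share a prefix land in the same sub-directory, so the number of entries per directory stays bounded by how the keys are distributed.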

Several bug fixes and improved testing.

0.1

> “The journey of a thousand miles begins with one step.” – Lao Tzu

The first official release of Lhotse! It provides a solid base to build speech research and applications upon, by treating speech and audio data as a first-class citizen in the ML world.

Lhotse is going to continue to evolve, and some API changes might still happen.

Highlights:
- audio-specific data model with Recording, Supervision, Features, and Cut manifests
- integration with PyTorch for task-specific Dataset classes and Torchaudio for feature extraction
- built-in data preparation for 8 speech corpora, including LibriSpeech, Switchboard, AMI, and TED-LIUM v3
- intuitive interfaces that work well with interactive environments such as Jupyter notebooks for data visualisation
- on-the-fly or pre-computed feature extraction and data augmentation
