Lhotse

Latest version: v1.31.0

0.3

New features:
- `CutSet.subset` and `CutSet.filter_supervisions` (145, thanks janvainer)
- An official Colab notebook (156)
- Python 3.6 support (158)
- Support for feature normalization aka CMVN (159, 160)
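For readers unfamiliar with CMVN (cepstral mean and variance normalization), the idea can be sketched in plain NumPy. This is only an illustration of the technique, not Lhotse's implementation; the function name `apply_cmvn`, the `eps` parameter, and the `(num_frames, num_features)` layout are assumptions made for the example.

```python
import numpy as np


def apply_cmvn(features: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize a (num_frames, num_features) matrix per feature dimension.

    Hypothetical helper: subtracts the per-dimension mean and divides by
    the per-dimension standard deviation computed over all frames.
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    # eps guards against division by zero for constant dimensions.
    return (features - mean) / (std + eps)


# Synthetic "features": 200 frames of 40-dimensional values.
rng = np.random.default_rng(0)
feats = rng.normal(loc=5.0, scale=3.0, size=(200, 40))
normed = apply_cmvn(feats)
```

After normalization, each feature dimension has approximately zero mean and unit variance across the frames of the input.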

New corpora:
- National Speech Corpus (Singaporean English) (148)
- IARPA BABEL (25 languages) (157)

Bugfixes:
- populate `recording_id` for `Cut` when using `Cut.compute_and_store_features` (147, thanks freewym)

Other:
- Set default duration limit factor to 1 for K2 Iterable Dataset (148)
- Fix for MixedCut plots (156)

0.2

New features:
- `K2SpeechRecognitionIterableDataset` that supports more efficient batching 116
- Support for `torchaudio.sox_effects` data augmentation alongside `WavAugment` 124

Breaking changes:
- the data augmentation APIs in Lhotse now expect an `augment_fn` argument instead of `augmenter`; `augment_fn` is a callable with a signature like: `def augment_fn(samples: np.ndarray, sampling_rate: int) -> np.ndarray` 124
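A function matching that signature might look like the sketch below. The function `add_gaussian_noise` and the noise level are hypothetical and chosen only for illustration; the sketch does not call any Lhotse API, it just shows a callable with the expected shape of inputs and outputs.

```python
import numpy as np


def add_gaussian_noise(samples: np.ndarray, sampling_rate: int) -> np.ndarray:
    """Hypothetical augmentation with the `augment_fn` signature.

    Adds low-level Gaussian noise to the waveform. `sampling_rate` is part
    of the expected signature but is not needed by this particular effect.
    """
    rng = np.random.default_rng(0)
    noise = rng.normal(loc=0.0, scale=0.005, size=samples.shape)
    # Preserve the input dtype so downstream code sees the same array type.
    return (samples + noise).astype(samples.dtype)


# One second of silence at 16 kHz, shaped (channels, num_samples).
sampling_rate = 16000
samples = np.zeros((1, sampling_rate), dtype=np.float32)
augmented = add_gaussian_noise(samples, sampling_rate)
```

Any callable that accepts the samples and the sampling rate and returns a NumPy array of samples should fit the same pattern.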

New corpora:
- Mobvoi Hotwords 109

Enhancements:
- progress bars for corpus downloads and feature extraction 131
- re-using cached LibriSpeech manifests for faster data preparation 133
- `LilcomFilesWriter` and `NumpyFilesWriter` use sub-directories for storage to reduce the filesystem load 134
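The sub-directory idea behind the storage change can be sketched with the standard library alone. This is not the code of `LilcomFilesWriter` or `NumpyFilesWriter`; the helper `sharded_path`, the three-character prefix, and the `.bin` extension are assumptions used to illustrate how spreading files across sub-directories keeps any single directory small.

```python
import tempfile
from pathlib import Path


def sharded_path(root: Path, key: str, prefix_len: int = 3) -> Path:
    """Hypothetical helper: place each file in a sub-directory named after
    the first `prefix_len` characters of its key."""
    subdir = root / key[:prefix_len]
    subdir.mkdir(parents=True, exist_ok=True)
    return subdir / f"{key}.bin"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    path = sharded_path(root, "abc123-utt-0001")
    path.write_bytes(b"\x00\x01")
    # Path of the stored file relative to the storage root.
    relative = path.relative_to(root).as_posix()
```

With this layout, files whose keys share a prefix land in the same sub-directory, so no single directory has to hold every stored array.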

Several bug fixes and improved testing.

0.1

> “The journey of a thousand miles begins with one step.” – Lao Tzu

The first official release of Lhotse! It provides a solid base to build speech research and applications upon, by treating speech and audio data as a first-class citizen in the ML world.

Lhotse is going to continue to evolve, and some API changes might still happen.

Highlights:
- audio-specific data model with Recording, Supervision, Features, and Cut manifests
- integration with PyTorch for task-specific Dataset classes and Torchaudio for feature extraction
- built-in data preparation for 8 speech corpora, including LibriSpeech, Switchboard, AMI, and TED-LIUM v3
- intuitive interfaces that work well with interactive environments such as Jupyter notebooks for data visualisation
- on-the-fly or pre-computed feature extraction and data augmentation
