Lhotse


0.8

Breaking changes
- Lhotse `CutSampler` classes now return mini-batch `CutSet`s instead of a list of string cut IDs (Lhotse Dataset classes are adjusted correspondingly) (345)
- Cut refactoring (`Cut` is now an abstract base class for all cut types; what was previously called `Cut` is now called `MonoCut`) (328)
- CLI: `lhotse obtain` is now `lhotse download` (329)

Corpora
- TIMIT (324 thanks luomingshuang)
- Fisher English (374 thanks videodanchik)
- Fisher Spanish (376 thanks videodanchik)
- yesno (380 thanks csukuangfj)
- improvements to GigaSpeech recipe (329 334 337 381 thanks jimbozhang)
- including word alignments in LibriSpeech recipe (379)

New features

CutSampler improvements (PyTorch data API)

- `ZipSampler` for batches constructed from different cut sources (344 347 363 thanks for fixes janvainer)
- `drop_last` option and `get_report()` method for cut samplers (357)
- `find_pessimistic_batches` utility to help fail fast with GPU OOM (358)
- streaming variant of shuffling for lazy CutSets in samplers (359)
- a bucketing method with equal cumulative bucket duration for `BucketingSampler` (365)
- approximate proportional sampling in `BucketingSampler` (372)
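
A minimal sketch of how the sampler additions above can combine. The manifest paths are hypothetical, and the exact constructor arguments (`max_frames`, `num_buckets`) and the way `ZipSampler` merges batches are assumptions rather than guaranteed API for this release.

```python
from lhotse import CutSet
from lhotse.dataset import BucketingSampler, ZipSampler

# Hypothetical manifests prepared earlier.
cuts_a = CutSet.from_jsonl('cuts_a.jsonl')
cuts_b = CutSet.from_jsonl('cuts_b.jsonl')

# Bucketing reduces padding; drop_last discards the final, incomplete batch.
sampler_a = BucketingSampler(cuts_a, max_frames=40000, num_buckets=30, drop_last=True)
sampler_b = BucketingSampler(cuts_b, max_frames=40000, num_buckets=30, drop_last=True)

# ZipSampler draws from both sources to construct each mini-batch.
sampler = ZipSampler(sampler_a, sampler_b)

for batch_cuts in sampler:
    ...  # each batch is a mini-batch CutSet (see the breaking change above)

# Inspect sampling statistics gathered during the epoch.
print(sampler_a.get_report())
```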

I/O improvements

- chunked OPUS file reads (339)
- chunked sphere file reads (367 thanks videodanchik)
- faster `OnTheFlyFeatures` (padding audio instead of features) (352)
- `ChunkedLilcomHdf5Writer` (and reader) for efficient chunk reads of lilcom-compressed arrays (334)
- a global cache for re-using smart_open connection sessions (improves performance for repeated smart_open calls e.g., to S3) (335, thanks oplatek)
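
A hedged sketch of using the new chunked writer for feature storage; it assumes `ChunkedLilcomHdf5Writer` is importable from the top-level package and can be passed via `storage_type`, analogous to the `LilcomHdf5Storage` usage shown in the 0.4 notes below.

```python
from lhotse import CutSet, Fbank, ChunkedLilcomHdf5Writer

cuts = CutSet.from_jsonl('cuts.jsonl')  # hypothetical manifest

# Chunked lilcom-compressed HDF5 storage lets readers load only the chunks
# overlapping the requested time span instead of the whole feature matrix.
cuts = cuts.compute_and_store_features(
    extractor=Fbank(),
    storage_path='feats',
    storage_type=ChunkedLilcomHdf5Writer,
    num_jobs=4,
)
```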

Data augmentation

- tempo perturbation (375 thanks janvainer)
- volume perturbation (382 thanks videodanchik)
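
The new perturbations mirror the lazy speed perturbation introduced in 0.4; a minimal sketch, assuming an existing `cuts: CutSet` and `CutSet`-level `perturb_tempo` / `perturb_volume` methods analogous to `perturb_speed`:

```python
# Tempo perturbation changes duration without altering pitch;
# volume perturbation scales the waveform amplitude.
cuts_tp = cuts.perturb_tempo(1.1)
cuts_vp = cuts.perturb_volume(2.0)
```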

Others
- `CutSet.trim_to_supervisions` has new arguments for including actual acoustic context around the supervisions (330 331)
- `SupervisionSegment` is now mutable (and all Lhotse manifests will remain mutable) (333)
- `.shuffle()` method for Lhotse `*Set` classes (341)
- `lhotse fix` CLI (360)
- `lhotse install-sph2pipe` for handling LDC corpora compressed with `shorten` (auto-registers sph2pipe so no further actions are needed) (370)
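
A short sketch of the manifest-level additions above, given an existing `cuts: CutSet`; the `trim_to_supervisions` keyword shown here is illustrative rather than the exact argument list.

```python
# Shuffling works on any Lhotse *Set class.
cuts = cuts.shuffle()

# Cut out supervision regions while keeping some acoustic context around them
# (keyword name assumed -- check help(CutSet.trim_to_supervisions)).
cuts_trimmed = cuts.trim_to_supervisions(min_duration=2.0)
```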


General improvements
- refreshed docs (327 328 330)
- improvements to downloading corpora (340)
- experimental dataloader that allows two levels of parallelism (343, might be abandoned in favor of other alternatives)
- auto-detection of the torchaudio version compatible with the installed PyTorch (348)
- improvements to Kaldi data dir import/export (351 354)
- fixed cut ordering in `CutSet.subset(cut_ids=...)` (353)
- improvements to storing cuts as recordings (355)
- refactored `lhotse.dataset.sampling` file into a directory module (366)
- improvements to CLI (369 371 thanks songmeixu)
- improvements to setup (377 383 thanks songmeixu)
- Colab notebook with ESPnet + Lhotse example (384)
- improvements to Lhotse versioning (385)

0.7

New corpora

- GigaSpeech (283, thanks jimbozhang)
- Dihard 3 (287, thanks desh2608)
- GALE Arabic and Mandarin (296, thanks desh2608)
- CMU and CSLU Kids (297, thanks desh2608)
- MTedX (301, thanks m-wiesner)
- LibriTTS (306)

New features

- Reading huge manifests lazily with Apache Arrow (documentation and examples are coming) (286, 288, 289, 290, 292, 294)
- Sequential JSONL writer storing manifests on disk as they are created (302)
- Support for alignments in `SupervisionSegment` (304, 310, 313, thanks desh2608)
- PyTorch Kaldi-compatible feature extractors that support GPU, batching and autograd (307, thanks jesus-villalba)
- Reading, writing, and uploading features to URLs (e.g. S3 or GCP) (312)
- Store waveforms of cuts as audio recordings to disk (316, thanks entn-at)
- Support for importing Kaldi's feats.scp and reading features directly from scp/ark (318)
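
To illustrate the Kaldi interoperability items, a minimal sketch; the data dir path is hypothetical, and the exact signature and return value of `load_kaldi_data_dir` in this release are assumptions.

```python
from lhotse.kaldi import load_kaldi_data_dir

# Import a Kaldi data dir; when feats.scp is present, features can be read
# directly from the existing scp/ark files instead of being re-extracted.
manifests = load_kaldi_data_dir('data/train_clean_100', sampling_rate=16000)
```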

General improvements

- add multi-threaded processing of AIShell data (259, thanks pingfengluo)
- tracking dev versions (291, thanks oplatek)
- Explicitly set UTF-8 encoding when reading README.md in setup.py (293, thanks entn-at)
- Auto-add link to source code in docs (295)
- `cut.resample()` (299)
- fixing flaky tests (300)
- fix AMI CLI mode (303, thanks desh2608)
- handle zero energy error in audio mixing (305)
- update Kaldi related docs (308)
- add a missing SpecAugment parameter (309)
- fixing edge cases for audio transforms (311)
- Add `drop_last` option in `*Set.split()` (315)
- Support h5py file modes in feature writers (317)
- don't use Kaldi's reco2dur and fix some errors in bin/lhotse (318, thanks shanguanma)
- Fix a bug in the cut's number-of-samples computation (322, thanks dophist)
- use whitespace for field-splitting when parsing Kaldi files (323, thanks dophist)

0.6

New corpora

- CMU Arctic (225)
- L2 Arctic (227, 251)
- VCTK (228, 253, 254)
- CallHome English (278)
- CallHome Egyptian (208)
- Multilingual LibriSpeech (282 -- preparation can be quite slow, but we plan to improve the speed in future releases)

Features

PyTorch Dataset API
- Lhotse's samplers are now fully deterministic, have `len()` that returns the number of batches, and return a consistent number of batches in all distributed workers. (213, 222, 223, 224, 255, 267, thanks janvainer)
- On-the-fly feature extraction in PyTorch datasets (229)
- visualisations of ASR batches with multiple transforms applied (234)
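
On-the-fly extraction is exposed in later releases as `OnTheFlyFeatures` (mentioned in the 0.8 notes above); a sketch of that usage, where `batch_cuts` stands for a mini-batch CutSet produced by a sampler and the call signature is an assumption:

```python
from lhotse import Fbank
from lhotse.dataset import OnTheFlyFeatures

# Compute features for a mini-batch CutSet inside the Dataset,
# instead of reading precomputed features from disk.
extractor = OnTheFlyFeatures(Fbank())
feats, feat_lens = extractor(batch_cuts)  # padded (B, T, F) tensor + per-cut frame counts
```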

Features and transforms
- Add LibrosaFbank, consistent with various TTS applications (252, thanks janvainer)
- SpecAugment (246)
- option to pad cuts from left/right/both directions (216)
- Randomized smoothing augmentation (272, 273, 274)
- Randomized extra padding (281)
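
A small illustration of the new padding directions, given an existing `cut`; the keyword name `direction` is assumed from later releases.

```python
# Pad a cut to 10 seconds, placing the padding on the left or on both sides
# instead of the default right side.
cut_left = cut.pad(duration=10.0, direction='left')
cut_both = cut.pad(duration=10.0, direction='both')
```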

I/O and serialization
- [experimental] Downloading audio from HTTP/S3/GCP/Azure URLs upon request (233)
- Use HDF5 as the default storage backend for features (237)
- Add JSONL support (262)
- Support for auto-magically determined serializers (CLI + Python API) (264)
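
A brief sketch of the serialization items, given an existing `cuts: CutSet`; `load_manifest` stands for the auto-detecting reader, and its availability at the package top level in this release is an assumption.

```python
from lhotse import load_manifest

# JSONL keeps one manifest item per line, enabling sequential/streaming I/O.
cuts.to_jsonl('cuts.jsonl')

# The manifest type and format (json / jsonl / yaml) are inferred automatically.
cuts = load_manifest('cuts.jsonl')
```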

Removed features
- Removed WavAugment support (use torchaudio.sox_effects instead) (232)

General improvements
- Add tolerance to validate_recordings_and_supervisions (208, thanks janvainer)
- Fix incorrect truncation in cut mixing for data augmentation (214)
- Return lengths from feature and token collations (211, thanks janvainer)
- Refactor Standardize to GlobalMVN (230, thanks janvainer)
- Fix rare error in randomized Recording's resampling test (239)
- Fix concatenate cuts omitting the longest cut when duration_factor > 1 (240)
- Fix CutMix not adding enough noise in long cuts (241)
- Add max_cuts keyword to global stats computation in GlobalMVN (245)
- Improved error message for mixing audio (248)
- Add a check for matching sampling rates when mixing cuts (247)
- Fix - make VCTK CLI discoverable (250)
- Fix trim_to_supervisions and CLI (249)
- Fix float rounding issues in segment finding (265)
- Update the LibriSpeech (full) and AMI examples (268, thanks jimbozhang)
- Fix test for sphere files (269, thanks csukuangfj)
- More informative error message for incorrect channels in load_audio() (270)

0.5

New features:

Major overhaul of support for PyTorch Dataset API (194 197 202)

Lhotse now implements a number of PyTorch datasets and samplers. The core features are:
- familiar API (map-style datasets and cut samplers that work with standard `DataLoader`)
- dynamic batch size, chosen based on constraints such as `max_frames`
- bucketing or cut concatenation as strategies for avoiding too much padding
- optional noise padding (using `CutMix` transform)
- our samplers work with DDP training out-of-the-box (no need for `DistributedSampler`)
- More details available at: https://lhotse.readthedocs.io/en/latest/datasets.html

Example code:
```python
from torch.utils.data import DataLoader
from lhotse import CutSet
from lhotse.dataset import SpeechRecognitionDataset, SingleCutSampler

cuts = CutSet(...)
dset = SpeechRecognitionDataset(cuts)
sampler = SingleCutSampler(cuts, max_frames=50000)
# Dataset performs batching by itself, so we have to indicate that
# to the DataLoader with batch_size=None
dloader = DataLoader(dset, sampler=sampler, batch_size=None, num_workers=1)
for batch in dloader:
    ...  # process data
```


Lazy (on-the-fly) resampling on Recording/RecordingSet (185)

The resampling is performed at the moment of reading the audio samples from disk. It automatically adjusts the duration/num_samples in the data manifest.

```python
recording = recording.resample(22050)
recording_set = recording_set.resample(8000)
```


New corpora:

- **AMI recipe extension to all microphone settings and official scenarios (154 - kudos to desh2608)**

General improvements:

- `CutSet.subset()` got `first` and `last` arguments (like Kaldi's `subset_data_dir.sh`) and a CLI mode (188)
- `CutSet.from_manifest()` creates deterministic Cut IDs by default (186)
- Padding cuts with arbitrary user specified values (now also works with custom feature extractors) (187)
- Improved code coverage measurements (now excludes test code and recipe code) (191 192)
- Improved support for sampling rates other than 8k and 16k (190 195)
- Documentation build fixes (196)
- Fixes in NSC recipe (199)
- Fixes in ASR dataset validation (204)
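
A quick example of the new `CutSet.subset()` arguments mentioned above, given an existing `cuts: CutSet`:

```python
# Deterministically select the first 1000 and the last 500 cuts.
cuts_head = cuts.subset(first=1000)
cuts_tail = cuts.subset(last=500)
```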

0.4

New features:

- Lazy time-domain speed perturbation of Recording/Cut that also adjusts supervision segments (167)

```python
cuts_sp = cuts.perturb_speed(0.9)
```


- Manifest validation (`lhotse.validate()`) (175)

```python
lhotse.validate(cuts)
```


- Parallel feature extraction API lifting (176)

```python
# As simple as:
cuts = cuts.compute_and_store_features(lhotse.Fbank(), 'path/to/feats', num_jobs=20)
```


- Support for using HDF5 storage with parallel feature extraction (176)

```python
# Modify the above with:
cuts = cuts.compute_and_store_features(lhotse.Fbank(), 'path/to/feats', num_jobs=20, storage_type=lhotse.LilcomHdf5Storage)
```

- `CutSet` mixing for noise data augmentation (180)

```python
# Can be performed after feature extraction for dynamic feature-domain mixing!
cuts = cuts.mix(noise_cuts, snr=[10, 30], mix_prob=0.5)
```


- On-the-fly noise data augmentation for K2 ASR (180)

New corpora:

- Aishell (170, thanks fanlu)
- Musan (174)

General improvements:
- LibriSpeech recipe API lifting and major preparation speedup (163)
- Stop using deprecated torchaudio.info (164)
- CutSet `map()` and `modify_ids()` methods (165)
- Parallelism: Executor concept documentation (152)
- Single/multi channel audio/features collation methods for a batch of Cuts (173)
- Cache data manifests for Mobvoi (168, thanks freewym)
- High-level workflow illustrations in docs (178)
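
A small sketch of the new `CutSet` methods, given an existing `cuts: CutSet`; the callback signatures below are assumptions.

```python
# Apply a transformation to every cut in the set.
cuts_padded = cuts.map(lambda cut: cut.pad(duration=10.0))

# Rewrite cut IDs, e.g. to add a corpus prefix.
cuts_renamed = cuts.modify_ids(lambda cut_id: f'librispeech-{cut_id}')
```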

0.3

New features:
- `CutSet.subset` and `CutSet.filter_supervisions` (145, thanks janvainer)
- An official Colab notebook (156)
- Python 3.6 support (158)
- Support for feature normalization aka CMVN (159, 160)
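
A brief example of the new filtering method, given an existing `cuts: CutSet`; the predicate is assumed to receive a `SupervisionSegment`:

```python
# Keep only supervisions that are at least 0.5 seconds long within each cut.
cuts = cuts.filter_supervisions(lambda sup: sup.duration >= 0.5)
```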

New corpora:
- National Speech Corpus (Singaporean English) (148)
- IARPA BABEL (25 languages) (157)

Bugfixes:
- populate recording_id for Cut when using Cut.compute_and_store_features (147, thanks freewym)

Other:
- Set default duration limit factor to 1 for K2 Iterable Dataset (148)
- Fix for MixedCut plots (156)
