Lhotse

Latest version: v1.31.0

1.8

Breaking changes

- Python 3.6 is no longer supported as of Lhotse v1.8. If you need Python 3.6, stay on Lhotse 1.7 or earlier.
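
For projects that pin Lhotse 1.8 or later, an early interpreter check can fail faster than an install error. The following is a minimal sketch, not part of Lhotse: the helper name `supports_lhotse_1_8` and the constant `MIN_PYTHON` are hypothetical, and the 3.7 floor is an assumption inferred from the note above, which only says that 3.6 was dropped.

```python
import sys

# Assumed minimum interpreter version for Lhotse 1.8+ (the release note
# only states that 3.6 is no longer supported).
MIN_PYTHON = (3, 7)


def supports_lhotse_1_8(version_info=sys.version_info):
    """Return True if the (major, minor) version is at least MIN_PYTHON."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if not supports_lhotse_1_8():
        raise SystemExit(
            "Python %d.%d is too old for Lhotse 1.8+; "
            "use Lhotse 1.7 or earlier instead." % sys.version_info[:2]
        )
```

Passing the version tuple as a parameter keeps the check easy to exercise without switching interpreters.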

Highlights

- New experimental module, `workflows`, integrates optional third-party packages that help corpus creators automate data curation. As of release 1.8, it supports OpenAI Whisper for automatic transcription and segmentation, and torchaudio Wav2Vec2/HuBERT ASR bundles for forced alignment.

![ctxG6RI](https://user-images.githubusercontent.com/15930688/193278018-85bc7f82-e879-44de-9123-b97e826d3f4f.png)


What's Changed
* Fix read and write in piped CLI by desh2608 in https://github.com/lhotse-speech/lhotse/pull/807
* Default behavior of CutSet.mix by ZuoyunZheng in https://github.com/lhotse-speech/lhotse/pull/809
* Adding more info about resampling options by RuABraun in https://github.com/lhotse-speech/lhotse/pull/815
* Add `pad_silence` option to `extend_by` by desh2608 in https://github.com/lhotse-speech/lhotse/pull/816
* Message when calling len() on LazyFilter by desh2608 in https://github.com/lhotse-speech/lhotse/pull/817
* Refactor cut and retain `git blame` history by desh2608 in https://github.com/lhotse-speech/lhotse/pull/820
* Audio backend refactoring and a workaround for FLAC reading from/writing to in-memory buffers by pzelasko in https://github.com/lhotse-speech/lhotse/pull/814
* Experimental Lhotse feature: corpus creation tools (``workflows``), starting with OpenAI Whisper support by pzelasko in https://github.com/lhotse-speech/lhotse/pull/824
* Drop support for Python 3.6 by pzelasko in https://github.com/lhotse-speech/lhotse/pull/829
* [workflow] Word-level forced alignment with pretrained models from Torchaudio by pzelasko in https://github.com/lhotse-speech/lhotse/pull/827

New Contributors
* ZuoyunZheng made their first contribution in https://github.com/lhotse-speech/lhotse/pull/809

**Full Changelog**: https://github.com/lhotse-speech/lhotse/compare/v1.7...v1.8

1.7

What's Changed
* add test data to bvcc by oplatek in https://github.com/lhotse-speech/lhotse/pull/797
* Add reverb with fast RIR generator by desh2608 in https://github.com/lhotse-speech/lhotse/pull/799
* Support `snip_edges=True` in `online_inference` of Kaldi feature extractors by pzelasko in https://github.com/lhotse-speech/lhotse/pull/802
* Remove warning about Lhotse not being stable from README.md by pzelasko in https://github.com/lhotse-speech/lhotse/pull/804
* Update the documentation related to optional packages by pzelasko in https://github.com/lhotse-speech/lhotse/pull/805


**Full Changelog**: https://github.com/lhotse-speech/lhotse/compare/v1.6...v1.7

1.6

What's Changed
* Feature/fix 754 voxceleb download by mikuchar in https://github.com/lhotse-speech/lhotse/pull/776
* Support Kaldi data directories without segments file. by MartinKocour in https://github.com/lhotse-speech/lhotse/pull/789
* Add normalization for text of multi_cn recipes: thchs_30, tal_csasr, tal_asr, aishell, aishell2, etc. by shanguanma in https://github.com/lhotse-speech/lhotse/pull/760
* Improve support for custom Recordings by pzelasko in https://github.com/lhotse-speech/lhotse/pull/791
* Add `Cut.has(field)` method to query Cuts for custom attributes by pzelasko in https://github.com/lhotse-speech/lhotse/pull/792
* Add normalization for aishell2 recipe by shanguanma in https://github.com/lhotse-speech/lhotse/pull/790
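
The `Cut.has(field)` entry above describes querying a cut for a custom attribute. The sketch below illustrates that idea with a stand-in class and is not Lhotse's implementation: `ToyCut`, its `custom` dictionary, and the lookup logic are all assumptions made for illustration, and the real method may consult more than custom fields.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ToyCut:
    """Hypothetical stand-in for a cut that carries optional custom attributes."""

    id: str
    custom: Optional[Dict[str, Any]] = None

    def has(self, field: str) -> bool:
        # Report whether a custom attribute with this name is attached.
        return self.custom is not None and field in self.custom


cuts = [
    ToyCut(id="cut-1", custom={"snr": 12.5}),
    ToyCut(id="cut-2"),
]

# Keep only the cuts that carry the attribute before reading it.
with_snr = [c for c in cuts if c.has("snr")]
```

A presence check like this avoids wrapping every attribute access in exception handling when only some cuts carry the field.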

New Contributors
* mikuchar made their first contribution in https://github.com/lhotse-speech/lhotse/pull/776
* MartinKocour made their first contribution in https://github.com/lhotse-speech/lhotse/pull/789

**Full Changelog**: https://github.com/lhotse-speech/lhotse/compare/1.5...v1.6

1.5

What's Changed

* Describe more information about cuts by pzelasko in https://github.com/lhotse-speech/lhotse/pull/772
* Change vctk.py to adapt the vctk dataset downloaded from edinburgh url by luomingshuang in https://github.com/lhotse-speech/lhotse/pull/775
* Fix restoring sampler state with `world_size>1` by pzelasko in https://github.com/lhotse-speech/lhotse/pull/773
* Revert 738 to use aidatatang as the prefix for aidatatang_200zh. by csukuangfj in https://github.com/lhotse-speech/lhotse/pull/782
* use tolerance when checking duration mismatch by shaynemei in https://github.com/lhotse-speech/lhotse/pull/781
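
The last entry above concerns comparing durations with a tolerance rather than requiring exact equality. A minimal sketch of that general technique follows; the function name `durations_match` and the default tolerance of 0.025 seconds are hypothetical and are not taken from the Lhotse source.

```python
def durations_match(expected: float, actual: float, tolerance: float = 0.025) -> bool:
    """Return True if two durations (in seconds) differ by at most `tolerance`.

    The default tolerance is an illustrative value, not Lhotse's setting.
    """
    return abs(expected - actual) <= tolerance


# A manifest duration and a duration measured from the audio file rarely agree
# to the last digit because of rounding to whole samples.
manifest_duration = 10.0
measured_duration = 10.01
ok = durations_match(manifest_duration, measured_duration)
```

An absolute tolerance suits durations because the rounding error comes from sample counts and does not grow with the length of the recording.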

New Contributors

* shaynemei made their first contribution in https://github.com/lhotse-speech/lhotse/pull/781

**Full Changelog**: https://github.com/lhotse-speech/lhotse/compare/v1.4...v1.5

1.4

What's Changed
* Fix lambda warnings from lazy manifests + leverage `dill` if installed for pickling lambdas by pzelasko in https://github.com/lhotse-speech/lhotse/pull/748
* `multi_cn` recipes: `aishell2`, `magicdata`, `primewords`, `stcmds`, `tal_asr`, `tal_csasr`, `thchs_30` by shanguanma in https://github.com/lhotse-speech/lhotse/pull/738
* Deprecate `strict`, `proportional_sampling`, and `bucket_method` arguments by pzelasko in https://github.com/lhotse-speech/lhotse/pull/756
* Fix `lhotse cut simple` CLI by pzelasko in https://github.com/lhotse-speech/lhotse/pull/759
* Fix issues with eager CutSet creation from lazy manifests by pzelasko in https://github.com/lhotse-speech/lhotse/pull/763
* DailyTalk recipe by pzelasko in https://github.com/lhotse-speech/lhotse/pull/767
* add aishell2 dev test by yuekaizhang in https://github.com/lhotse-speech/lhotse/pull/766
* Enable GlobalMVN computation with on-the-fly feature extraction by pzelasko in https://github.com/lhotse-speech/lhotse/pull/769
* Add support for Python 3.10 and PyTorch 1.12 by pzelasko in https://github.com/lhotse-speech/lhotse/pull/764

New Contributors
* yuekaizhang made their first contribution in https://github.com/lhotse-speech/lhotse/pull/766

**Full Changelog**: https://github.com/lhotse-speech/lhotse/compare/v1.3...1.4

1.3

What's Changed
* Fix plotting MixedCut audio tracks by pzelasko in https://github.com/lhotse-speech/lhotse/pull/723
* [continued] Fixes Bucketing sampler equal duration method that drops cuts by m-wiesner in https://github.com/lhotse-speech/lhotse/pull/724
* feature extraction will read RecordingSet from a file, not just json. by RuABraun in https://github.com/lhotse-speech/lhotse/pull/728
* Use `lilcom_chunky` as default in CLI by pzelasko in https://github.com/lhotse-speech/lhotse/pull/729
* Set CLI torch number of threads to 1 by pzelasko in https://github.com/lhotse-speech/lhotse/pull/732
* Update wenet_speech.py by fanlu in https://github.com/lhotse-speech/lhotse/pull/731
* Fix heroico regex strings by jtrmal in https://github.com/lhotse-speech/lhotse/pull/734
* Update mgb2.py by AmirHussein96 in https://github.com/lhotse-speech/lhotse/pull/725
* Remove file handle caching from LilcomChunkyReader by pzelasko in https://github.com/lhotse-speech/lhotse/pull/737
* Make `h5py` an optional dependency by pzelasko in https://github.com/lhotse-speech/lhotse/pull/741
* Assert `CutSet.mix()` argument `cuts` is not a lazy manifest by pzelasko in https://github.com/lhotse-speech/lhotse/pull/742
* `CutSet`: more methods are lazy + two simplified common use-cases `attach_tensor` and `load_audio` by pzelasko in https://github.com/lhotse-speech/lhotse/pull/744
* Collections: support reading from/writing to "-" (including webdataset) by pzelasko in https://github.com/lhotse-speech/lhotse/pull/745
* fix CommonVoice prepare by mohsen-goodarzi in https://github.com/lhotse-speech/lhotse/pull/743

New Contributors
* RuABraun made their first contribution in https://github.com/lhotse-speech/lhotse/pull/728
* mohsen-goodarzi made their first contribution in https://github.com/lhotse-speech/lhotse/pull/743

**Full Changelog**: https://github.com/lhotse-speech/lhotse/compare/v1.2...v1.3
