Torchaudio

Latest version: v2.6.0

0.9

Unit: msec

Examples
* Add text preprocessing utilities for TTS pipeline (1639)

* Replace simple_ctc with Python greedy decoder (1558)

* Add an inference example for WaveRNN (1637)

* Refactor coding style for WaveRNN example (1663)

* Add style checks on example files on CI (1667)

* Add Tacotron2 training script (1642)

* Add an inference example for Tacotron2 (1654)

* Fix Tacotron2 inference example (1716)

* Fix WaveRNN training example (1740)

* Training recipe for ConvTasNet on Libri2Mix dataset (1757)

Build
* Update skipIfNoCuda decorator and force GPU tests in GPU CIs (1559)

* Temporarily pin nightly version on Linux/macOS CPU unittest (1598)

* Temporarily pin nightly version on Linux GPU unittest (1606)

* Revert CI hot fix (1614)

* Expose USE_CUDA in build (1609)

* Pin MKL to 2021.2.0 (1655)

* Simplify extension initialization (1649)

* Synchronize extension initialization mechanism with fbcode (1682)

* Ensure we’re propagating BUILD_VERSION (1697)

* Guard Kaldi’s version generation (1715)

* Update sphinx to 3.5.4 (1685)

* Default to BUILD_SOX=1 in non-Windows systems (1725)

* Add CUDA install step to Win Packaging jobs (1732)

* setup.py should parse TORCH_CUDA_ARCH_LIST (1733)

* Simplify the extension initialization process (1734)

* Fix CUDA build logic for _torchaudio.so (1737)

* Enable Linux wheel/conda GPU package builds (1730)

* Increase no_output_timeout to 20m for WinConda (1738)

* Build torchaudio for 11.3 as well (1747)

* Upload wheels to respective folders (1751)

* Extract PyBind11 feature implementations (1739)

* Update the way to access libsox global config (1755)

* Fix ROCM build error (1729)

* Fix compile warnings (1762)

* Migrate CircleCI docker image (1767)

* Split extension into custom impl and Python wrapper libraries (1752)

* Put libtorchaudio in lib directory (1773)

* Update win gpu image from previous to stable (1786)

* Set libtorch audio suffix as pyd on Windows (1788)

* Fix build on Windows with CUDA (1787)

* Enable audio windows cuda tests (1777)

* Set release and base PyTorch version (1816)

* Exclude prototype if it is in release (1870)

* Log prototype exclusion (1882)

* Update prototype exclusion (1885)

* Remove alpha from version number (1901)

Testing
* Migrate resample tests from kaldi to functional (1520)

* Add autograd gradcheck test for RNN transducer loss (1532)

* Fix HF wav2vec2 test (1585)

* Update unit test CUDA to 10.2 (1605)

* Fix CircleCI unittest environment

* Remove skipIfRocm from test_fileobj_flac in soundfile.save_test (1626)

* MFCC test refactor (1618)

* Refactor RNNT Loss Unit Tests (1630)

* Reduce sample rate to avoid test time out (1640)

* Refactor text preprocessing tests in Tacotron2 example (1635)

* Move test initialization logic to dedicated directory (1680)

* Update pitch shift batch consistency test (1700)

* Refactor scripting in test (1727)

* Update the version of fairseq used for testing (1745)

* Put output tensor on proper device in get_whitenoise (1744)

* Refactor batch consistency test in transforms (1772)

* Tweak test name by appending factory function name (1780)

* Enable audio windows cuda tests (1777)

* Skip hubert_asr_xlarge TS test on Windows (1800)

* Skip hubert_xlarge TS test on Windows (1807)

Others
* Remove unused files (1588)

* Remove residuals for removed modules (1599)

* Remove torchscript bc test references (1623)

* Remove torchaudio._internal.fft module (1631)

Misc
* Rename master branch to main (1649)

* Fix Python spacing (1670)

* Lint fix (1726)

* Add .gitattributes (1731)

* Style fixes (1766)

* Update reference from master to main elsewhere (1784)

Bug Fixes
* Fix models import (1664)

* Fix HF model integration (1781)

Documentation
* README Updates

* Update README (1544)

* Remove NumPy dependency from README (1582)

* Fix typos and sentence structure in README.md (1633)

* Update and move convention section to CONTRIBUTING.md (1635)

* Remove unnecessary README (1728)

* Add link to TTS colab example to README (1748)

* Fix typo in source separation README (1774)

* Docstring Changes

* Set removal version of pseudo complex support (1553)

* Update docs (1584)

* Add return type in doc for RNNT loss (1591)

* Improve RNNT loss docstrings (1642)

* Add documentation for CMUDict’s property (1683)

* Refactor lfilter docs (1698)

* Standardize optional types in docstrings (1746)

* Fix return type of wav2vec2 model (1790)

* Add equations to MVDR docstring (1789)

* Standardize tensor shapes format in docs (1838)

* Add license to pre-trained model doc (1836)

* Update Tacotron2 docs (1840)

* Fix PitchShift docstring (1866)

* Update descriptions of lengths parameters (1890)

* Standardization and minor fixes (1892)

* Update models/pipelines doc (1894)

* Docs formatting

* Remove override CSS (1554)

* Add prototype.tacotron2 page to docs (1695)

* Add doc for InverseSpectrogram (1706)

* Add sections to transforms docs (1720)

* Add edit_distance to documentation with a new category Metric (1743)

* Fix model subsections (1775)

* List all the pre-trained models on right bar (1828)

* Put pretrained weights to subsection (1879)

* Examples (see 1564)

* Add example code for Resample (1644)

* Fix examples in transforms (1646)

* Add example for ComplexNorm (1658)

* Add example for MuLawEncoding (1586)

* Add example for Spectrogram (1566)

* Add example for GriffinLim (1671)

* Add example for MuLawDecoding (1684)

* Add example for Fade transform (1719)

* Update RNNT loss docs and add example (1835)

* Add SpecAugment figure/citation (1887)

* Add filter bank figures (1891)

0.9.0

- Lots of performance improvements (filtering, resampling, spectral operations).
- Popular wav2vec2.0 model architecture.
- Improved autograd support.

[Beta] Wav2Vec2.0 Model

This release includes model architectures from the [wav2vec2.0](https://arxiv.org/abs/2006.11477) paper, with utility functions that allow importing pretrained model parameters published on <code>[fairseq](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec)</code> and [Hugging Face Hub](https://huggingface.co/models?filter=wav2vec2). Now you can easily run speech recognition with torchaudio. These model architectures also support TorchScript, and you can deploy them with ONNX or in non-Python environments, such as C++, Android and iOS. Please check out our [C++](https://github.com/pytorch/audio/tree/master/examples/libtorchaudio), [Android](https://github.com/pytorch/android-demo-app/tree/master/SpeechRecognition) and [iOS](https://github.com/pytorch/ios-demo-app/tree/master/SpeechRecognition) examples. The following snippets illustrate how to create a deployable model.

```python
# Import fine-tuned model from Hugging Face Hub
from transformers import Wav2Vec2ForCTC
from torchaudio.models.wav2vec2.utils import import_huggingface_model

original = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
imported = import_huggingface_model(original)
```

```python
# Import fine-tuned model from fairseq
import fairseq
from torchaudio.models.wav2vec2.utils import import_fairseq_model

# load_model_ensemble_and_task returns (models, config, task); only the models are needed
original, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["wav2vec_small_960h.pt"], arg_overrides={'data': "<data_dir>"})
imported = import_fairseq_model(original[0].w2v_encoder)
```

```python
# Build uninitialized model and load state dict
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torchaudio.models import wav2vec2_base

# `imported` is the model produced by one of the import snippets above
model = wav2vec2_base(num_out=32)
model.load_state_dict(imported.state_dict())

# Quantize / script / optimize for mobile
quantized_model = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)
scripted_model = torch.jit.script(quantized_model)
optimized_model = optimize_for_mobile(scripted_model)
optimized_model.save("model_for_deployment.pt")
```
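
Running speech recognition with such a model also needs a step that turns per-frame label scores into text. The release notes do not show that step, so the following is only a minimal sketch of greedy CTC decoding in plain Python. The function name, the label set, and the choice of index 0 as the blank token are assumptions made for illustration, not part of torchaudio's API.

```python
from typing import List, Sequence


def greedy_ctc_decode(
    emission: Sequence[Sequence[float]], labels: Sequence[str], blank: int = 0
) -> str:
    """Greedy CTC decoding: best label per frame, collapse repeats, drop blanks."""
    # Pick the highest-scoring label index for every frame.
    best_path = [max(range(len(frame)), key=frame.__getitem__) for frame in emission]
    decoded: List[str] = []
    previous = None
    for index in best_path:
        # Emit a label only when it differs from the previous frame and is not blank.
        if index != previous and index != blank:
            decoded.append(labels[index])
        previous = index
    return "".join(decoded)


# Hypothetical three-label vocabulary; "-" plays the role of the CTC blank.
labels = ["-", "a", "b"]
emission = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.0, 0.1, 0.9],
]
transcript = greedy_ctc_decode(emission, labels)
```

The blank frame between the two runs of "a" is what lets the decoder keep a doubled letter instead of collapsing it.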

Filtering Improvement

The internal implementation of `lfilter` has been updated to support autograd on both CPU and CUDA. Additionally, the performance on CPU is significantly improved. These improvements also apply to `biquad` variants.
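
For readers unfamiliar with what an IIR filter such as `lfilter` computes, here is a plain-Python reference of the direct-form difference equation. It is a sketch for illustration only and not torchaudio's implementation; the function name and argument order are assumptions. The feedback term makes every output sample depend on earlier outputs, so the loop is inherently sequential.

```python
from typing import List, Sequence


def lfilter_reference(
    x: Sequence[float], a: Sequence[float], b: Sequence[float]
) -> List[float]:
    """Filter `x` with the difference equation

    a[0] * y[n] = sum_k b[k] * x[n - k] - sum_{k >= 1} a[k] * y[n - k]
    """
    if not a or a[0] == 0:
        raise ValueError("a[0] must be non-zero")
    y: List[float] = []
    for n in range(len(x)):
        # Feed-forward part: weighted sum of current and past inputs.
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        # Feedback part: weighted sum of past outputs.
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# Impulse response of y[n] = x[n] + 0.5 * y[n - 1]
impulse_response = lfilter_reference([1.0, 0.0, 0.0, 0.0], a=[1.0, -0.5], b=[1.0])
```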

The following table illustrates the performance improvements compared against the previous releases. `lfilter` was applied on `float32` tensors with one channel and different numbers of frames.

<table>
<tr>
<td>torchaudio version
</td>
<td>256
</td>
<td>512
</td>
<td>1024
</td>
</tr>
<tr>
<td>0.9
</td>
</tr>
</table>

0.8.1

Highlights

This release depends on pytorch 1.8.1.

Bug Fixes

* Added back support for 24-bit signed LPCM wav via sox_io backend. (1389)

0.8.0

Highlights

This release supports Python 3.9.

I/O Improvements

Continuing from the previous release, torchaudio improves the audio I/O mechanism. In this release, we have four major updates.

1. Backend migration.
We have migrated the default backend for audio I/O. The new default backend is “sox_io” (for Linux/macOS). The interface of the “soundfile” backend has also been changed to align with that of “sox_io”. Following the change of default backends, the legacy backend and interface have been marked as deprecated. They are still accessible, though using them is strongly discouraged. For details on the migration, please refer to 903.

1. File-like object support.
We have added file-like object support to I/O functions and sox_effects. You can perform the `info`, `load`, `save` and `apply_effects_file` operations on file-like objects.
```python
import io
import tarfile

import boto3
import requests
import torchaudio

# Query audio metadata over HTTP
# Will only fetch the first few kB
with requests.get(URL, stream=True) as response:
    metadata = torchaudio.info(response.raw)

# Load audio from tar file
# No need to extract TAR file.
with tarfile.open(TAR_PATH, mode='r') as tarfile_:
    fileobj = tarfile_.extractfile(SAMPLE_TAR_ITEM)
    waveform, sample_rate = torchaudio.load(fileobj)

# Saving to Bytes buffer
# Using BytesIO, you can perform in-memory encoding/decoding.
buffer_ = io.BytesIO()
torchaudio.save(buffer_, waveform, sample_rate, format="wav")

# Apply effects (lowpass filter / resampling) while loading audio from S3
client = boto3.client('s3')
response = client.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
waveform, sample_rate = torchaudio.sox_effects.apply_effects_file(
    response['Body'], [["lowpass", "-1", "300"], ["rate", "8000"]])
```

1. [Beta] Codec Application.
Built upon the file-like object support, we added the `functional.apply_codec` function, which can degrade audio data by applying audio codecs supported by the “sox_io” backend, entirely in memory.
```python
import torchaudio.functional as F

# Apply MP3 codec
degraded = F.apply_codec(
    waveform, sample_rate, format="mp3", compression=-9)
# Apply GSM codec
degraded = F.apply_codec(waveform, sample_rate, format="gsm")
```

1. Encoding options.
We have added encoding options to the save function of the new backends. Now you can change the format and encoding with the `format`, `encoding` and `bits_per_sample` options.
```python
# Save without any encoding option.
# The function will pick the encoding that fits the provided data.
# For a Tensor of float32 type, that is 32-bit floating-point PCM.
torchaudio.save("data.wav", waveform, sample_rate)

# Save as 16-bit signed integer Linear PCM
# The resulting file occupies half the storage but loses precision
torchaudio.save(
    "data.wav", waveform, sample_rate, encoding="PCM_S", bits_per_sample=16)
```

1. More format support to "sox_io"’s save function.
We have added support for GSM, HTK, AMB, and AMR-NB formats to "sox_io"’s save function.
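
The file-like object and bit-depth ideas above can be tried without torchaudio installed. The sketch below uses Python's standard `wave` module as a stand-in: it encodes a few samples into an in-memory buffer, reads the metadata back from a file-like object, and compares 16-bit and 32-bit payload sizes. The helper names are hypothetical, and the 32-bit case here is integer PCM, not the floating-point PCM that torchaudio picks for `float32` tensors.

```python
import io
import struct
import wave


def encode_wav(samples, sample_rate, sample_width):
    """Encode integer mono samples as PCM WAV bytes, entirely in memory."""
    fmt = {2: "<h", 4: "<i"}[sample_width]  # 16-bit or 32-bit signed PCM
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(b"".join(struct.pack(fmt, s) for s in samples))
    return buffer.getvalue()


def read_metadata(fileobj):
    """Read (sample rate, frame count, bits per sample) from a file-like object."""
    with wave.open(fileobj, "rb") as reader:
        return reader.getframerate(), reader.getnframes(), reader.getsampwidth() * 8


samples = [0, 1000, -1000, 32767]
wav16 = encode_wav(samples, sample_rate=8000, sample_width=2)
wav32 = encode_wav(samples, sample_rate=8000, sample_width=4)

# Any object with read/seek works as input, here a BytesIO wrapping the bytes.
metadata = read_metadata(io.BytesIO(wav16))
extra_bytes = len(wav32) - len(wav16)
```

The headers are the same size in both encodings, so the difference in total length comes only from the sample payload, which doubles when the sample width goes from 2 to 4 bytes.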

Switch to CMake-based build

Previously, torchaudio used CMake to build third-party dependencies. Now torchaudio also uses CMake to build its C++ extension. This opens the door to integrating torchaudio in non-Python environments (such as C++ applications and mobile). We will work on adding example applications and mobile integrations in upcoming releases.

Backwards Incompatible Changes

* Removed deprecated transform and target_transform arguments from VCTK and YESNO datasets. (1120) If you were relying on the previous behavior, we recommend that you apply the transforms in the collate function.
* Removed torchaudio.datasets.utils.walk_files (1111) and replaced it with Path and glob. (1069, 1101). If you relied on the function, we recommend that you use glob instead.
* Removed torchaudio.data.utils.unicode_csv_reader. (1086) If you relied on the function, we recommend that you replace it with csv.reader.
* Disabled CommonVoice download as users are required to sign a user agreement. Please download and extract the dataset manually, and replace the root argument by the subfolder for the version and language of interest, see 1082 for more details. (1018, 1079, 1080, 1082)
* Removed legacy sox effects (977, 1001). Please migrate to apply_effects_file or apply_effects_tensor.
* Switched the default backend to the ones with new interfaces (978). If you were relying on the previous behavior, you can return to the previous behavior by following instructions in 975 for one more release.
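
For code that relied on the removed `walk_files` and `unicode_csv_reader` helpers, the recommended standard-library replacements can be sketched as follows. The function names `find_files` and `read_rows` are hypothetical and only illustrate the `pathlib` glob and `csv.reader` approach; they are not torchaudio functions.

```python
import csv
from pathlib import Path
from typing import Iterator, List


def find_files(root: str, suffix: str) -> List[Path]:
    """Return all files under `root` whose name ends with `suffix`, sorted."""
    return sorted(p for p in Path(root).rglob(f"*{suffix}") if p.is_file())


def read_rows(path: Path, delimiter: str = "\t") -> Iterator[List[str]]:
    """Yield rows from a UTF-8 delimited text file using csv.reader."""
    # newline="" is what the csv module documentation recommends for file objects.
    with open(path, newline="", encoding="utf-8") as handle:
        yield from csv.reader(handle, delimiter=delimiter)
```

A typical use would be `find_files(dataset_root, ".wav")` to list audio files and `read_rows(metadata_path)` to iterate over a metadata table, where both paths are supplied by the caller.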

New Features

* Added GSM, HTK, AMB, AMR-NB and AMR-WB format support to “sox_io” backend. (1276, 1291, 1277, 1275, 1066)
* Added encoding options (format, bits_per_sample and encoding) to save function. (1226, 1177, 1129, 1104)
* Added new attributes (bits_per_sample and encoding) to the info function return type (AudioMetaData) (1177, 1206, 1324)
* Added format override to libsox-based file input. (load, info, sox_effects.apply_effects_file) (1104)
* Added file-like object support in “sox_io”, and “soundfile” backend and sox_effects.apply_effects_file. (1115)
* [Beta] Added the Kaldi Pitch feature. (1243, 1260)
* [Beta] Added the SpectralCentroid transform. (1167, 1216, 1316)
* [Beta] Added codec transformation apply_codec. (1200)

Improvements

* Exposed normalization method to Mel transforms. (1212)
* Exposed additional STFT arguments to Spectrogram (892) and to MelSpectrogram (1211).
* Added support for pathlib.Path to apply_effects_file (1048) and to CMUARCTIC (1025), YESNO (1015), COMMONVOICE (1027), VCTK and LJSPEECH (1028), GTZAN (1032), SPEECHCOMMANDS (1039), TEDLIUM (1045), LIBRITTS and LIBRISPEECH (1046).
* Added SpeechCommands train/valid/test split. (966, 1012)

Internals

* Replaced if-elseif-else with switch in sox C++ code. (1270)
* Refactored C++ interface for sox_io's get_info_file (1232) and get_encodinginfo (1233).
* Add explicit functional import in init. (1228)
* Refactored YESNO dataset (1127), LJSPEECH dataset (1143).
* Removed Python 2.7 reference from setup.py. (1182)
* Merged flake8 configurations into single .flake8 file. (1172, 1214)
* Updated calls to torch.stft to use return_complex=True. (1096, 1013)
* Cleaned up handling of optional args in C++ with c10::optional. (1043)
* Removed unused imports in sox effects. (1052)
* Introduced functional submodule to organize functionals. (1003)
* [Testing] Refactored MelSpectrogram librosa compatibility test to decouple from other tests. (1267)
* [Testing] Moved batch tests for functionals. (1254)
* [Testing] Refactored tests for backend (1239) and for functionals (1237).
* [Testing] Removed dependency on pytest from testing (1157, 1188)
* [Testing] Refactored unit tests for VCTK (1134), SPEECHCOMMANDS (1136), LIBRISPEECH (1140), TEDLIUM (1135), LJSPEECH (1138), LIBRITTS (1139), CMUARCTIC (1147), GTZAN (1148), COMMONVOICE and YESNO (1133).
* [Testing] Removed dependency on COMMONVOICE dataset from tests. (1132)
* [Build] Fixed Python 3.9 support (1242)
* [Build] Switched to cmake for build. (1187, 1246, 1249)
* [Build] Restructured C++ code to allow per file registration of custom ops. (1221)
* [Build] Added logging to sox/CMakeLists.txt. (1190)
* [Build] Disabled C++11 ABI when necessary for libtorch compatibility. (880)
* [Build] Reorganized libsox source and build directory to accommodate additional third party code. (1161, 1176)
* [Build] Refactored sox source files and moved into dedicated subfolder. (1106)
* [Build] Enabled custom clean function for python setup.py clean. (1142)
* [CI] Documented undocumented parameters. Added CI check. (1248)
* [CI] Fixed sphinx warnings in documentation. Turned warnings into errors. (1247)
* [CI] Print CPU info before running unit test. (1218)
* [CI] Fixed clang-format job and fixed newly detected formatting issues. (981, 1198, 1222)
* [CI] Updated unit test base Docker image. (1193)
* [CI] Disabled CCI cache which is now known to be flaky. (1189)
* [CI] Disabled torchscript BC test which is known to fail. (1192)
* [CI] Stripped version suffix for pytorch. (1185)
* [CI] Ran smoke test with CPU package for pytorch due to known issue with CUDA 11. (1105)
* [CI] Added missing empty line at the end of config.yml. (1020)
* [CI] Added automatic documentation build and push to branch in CI. (1006, 1034, 1041, 1049, 1091, 1093, 1098, 1100, 1121)
* [CI] Ran GPU test for all pull requests and fixed current setup. (998, 1014, 1191)
* [CI] Skipped tests that are known to fail on macOS Python 3.6/3.7. (999)
* [CI] Changed the order of installation and aligned with Windows. (987)
* [CI] Fixed documentation rendering by using Sphinx 2.4.4. (974)
* [Doc] Added subcategories to functional documentation. (1325)
* [Doc] Added a version selector in documentation. (1273)
* [Doc] Updated compilation recommendation in README. (1263)
* [Doc] Added CONTRIBUTING.md. (1241)
* [Doc] Added instructions to install parametrized package. (1164)
* [Doc] Fixed the return type for load functions. (1122)
* [Doc] Added missing modules and minor fixes. (1022, 1056, 1117)
* [Doc] Fixed spelling and links in README. (1029, 1037, 1062, 1110, 1261)
* [Doc] Grouped filtering functionals in documentation page. (1005, 1004)
* [Doc] Updated the compatibility matrix with torchaudio 0.7 (979)
* [Doc] Added description of prototype/beta/stable features. (968)

Bug Fixes

* Fixed amplitude_to_DB clamping behaviour on batches. (1113)
* Disabled audio devices in sox builds which could interfere in the build process when detected. (1153)
* Fixed COMMONVOICE for French where the audio file extension was missing on load. (1126)
* Disabled OpenMP support for libsox which can produce errors when used in DataLoader. (1026)
* Fixed noise_down_time argument in VAD by properly propagating it. (1017)
* Removed print-freq option to compute validation loss at each epoch in wav2letter pipeline. (997)
* Migrated from torch.rfft to torch.fft.rfft and cfloat following change in pytorch. (941)
* Fixed interactive ASR demo to align with the latest version of FAIRSeq. (996)
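
To see why clamping behaviour on batches matters for a decibel conversion, consider the sketch below. It assumes that the `amplitude_to_DB` fix concerns the `top_db` floor being computed per item and not once for the whole batch; the release note does not spell this out. The code is plain Python with hypothetical function names and is not torchaudio's implementation.

```python
import math
from typing import List, Sequence


def power_to_db(power: Sequence[float], top_db: float, amin: float = 1e-10) -> List[float]:
    """Convert one item's power values to dB, clamped to `top_db` below its own peak."""
    db = [10.0 * math.log10(max(value, amin)) for value in power]
    floor = max(db) - top_db  # the floor is relative to this item's peak only
    return [max(value, floor) for value in db]


def batch_power_to_db(batch: Sequence[Sequence[float]], top_db: float) -> List[List[float]]:
    """Apply the conversion independently to every item of a batch."""
    return [power_to_db(item, top_db) for item in batch]


loud = [1.0, 1e-6]     # peak at 0 dB
quiet = [1e-4, 1e-10]  # peak at -40 dB
per_item = batch_power_to_db([loud, quiet], top_db=40.0)
```

With a per-item floor, the quiet item keeps its 40 dB of dynamic range. A single floor derived from the loudest item in the batch would sit at -40 dB and flatten the quiet item to a constant, so its result would depend on what else is in the batch.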

Deprecations

* The normalized argument is unused and will be removed from griffinlim. (1036)
* The previous sox and soundfile backends remain available for one release; see 903 for details. (975)

Performance

* Added C++ lfilter core loop for faster iteration on CPU. (1244)
* Leveraged julius resampling implementation to make resampling faster. (1087)

0.7.2

Highlights

This release introduces support for Python 3.9. There is no 0.7.1 release, and the following changes are compared to 0.7.0.

Improvements

* Add python 3.9 support (1061)

Bug Fixes

* Temporarily disable OpenMP support for libsox (1054)

Deprecations

* Disallow `download=True` in CommonVoice (1076)

0.7.0

Highlights

Example Pipelines

torchaudio is expanding its support for models and [end-to-end applications](https://github.com/pytorch/audio/tree/master/examples). Please file an issue on [github](https://github.com/pytorch/audio/issues/new?template=questions-help-support.md) to provide feedback on them.

* **Speech Recognition:** Building on the addition of the Wav2Letter model for speech recognition in the last release, we added a training example pipeline for speech recognition that uses the LibriSpeech dataset.
* **Text-to-Speech:** With the goal of supporting text-to-speech applications, we added a vocoder based on the WaveRNN model. WaveRNN model is based on the implementation from [this repository](https://github.com/fatchord/WaveRNN). The original implementation was introduced in "Efficient Neural Audio Synthesis". We provide an example training pipeline in the example folder that uses the LibriTTS dataset added to torchaudio in this release.
* **Source Separation:** We also support source separation with the addition of the ConvTasNet model, based on the paper "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." An example training pipeline is provided with the wsj0-mix dataset.

I/O Improvements

As you are likely aware from the last release, we are in the process of making `sox_io` the new default backend. It ships with new features such as TorchScript support and performance improvements. If you want to benefit from these features now, we encourage you to migrate. For more information see issue 903.

Backwards Incompatible Changes

* Switched all %-based string formatting to `str.format` to adopt changes in PyTorch, leading to improved error messages for TorchScript (850)
* Split `sox_utils.list_formats()` for read and write (811)
* Made directory traversal order alphabetical and breadth-first, consistent across operating systems (814)
* Changed GTZAN so that it only traverses filenames belonging to the dataset (791)
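
An alphabetical, breadth-first directory traversal of the kind described in this list can be sketched with the standard library as shown below. This is an illustration under assumptions, not torchaudio's traversal code, and the function name is hypothetical. Sorting each directory's entries by name is what makes the order independent of how the operating system lists them.

```python
from collections import deque
from pathlib import Path
from typing import Iterator


def walk_sorted_breadth_first(root: str) -> Iterator[Path]:
    """Yield files level by level, with each directory's entries in name order."""
    queue = deque([Path(root)])
    while queue:
        directory = queue.popleft()
        entries = sorted(directory.iterdir(), key=lambda p: p.name)
        # Yield this directory's files first, in alphabetical order...
        for entry in entries:
            if entry.is_file():
                yield entry
        # ...then queue its subdirectories so deeper levels are visited later.
        queue.extend(entry for entry in entries if entry.is_dir())
```

Files in the root directory come out first, followed by the files of each first-level subdirectory in name order, then those of the second level, and so on.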

New Features

* Added ConvTasNet model (920, 933) with pipeline (894)
* Added canonical pipeline with wav2letter (632)
* The WaveRNN model (705, 797, 801, 810, 836) is available with a canonical pipeline (749, 802, 831, 863)
* Added all 3 releases of tedlium dataset (882, 934, 945, 895)
* Added `VCTK_092` dataset (812)
* Added LibriTTS (790, 820)
* Added SPHERE support to `sox_io` backend (871)
* Added torchscript sox effects (760)
* Added a flag to change the interface of `soundfile` backend to the one identical to `sox_io` backend. (922)

Improvements

* Added `soundfile` compatibility backend. (922)
* Improved the speed of `torchaudio.compliance.kaldi.fbank` (947)
* Improved the speed of phaser (660)
* Added warning when a Mel filter is all zero (914)
* Added `pathlib.Path` support to `sox_io` backend (907)
* Simplified C++ registration with TORCH_LIBRARY (840)
* Merged sox effect and `sox_io` C++ implementation (779)

Internal

* CI: Added test to validate torchscript backward compatibility (838)
* CI: Used mocked datasets to test CMUArctic (829), CommonVoice (827), Speech Commands (824), LJSpeech (826), LibriSpeech (825), YESNO (792, 832)
* CI: Made *nix unit test fail if C++ extension is not available (847, 849)
* CI: Separated I/O in testing. (813, 773, 783)
* CI: Added smoke tests to `sox_io` and `sox_effects` (806)
* CI: Tested utilities have been refactored (805, 808, 809, 817, 822, 831)
* Doc: Added how to run tests (843)
* Doc: Added 0.6.0 to version matrix in README (833)

Bug Fixes

* Fixed device in interactive ASR example (900)
* Fixed incorrect extension parsing (885)
* Fixed dither with `noise_shaping = True` (865)
* Run unit test with non-editable installation (845), and set `zip_safe = False` to disable egg installation (842)
* Sorted GTZAN dataset and use on-the-fly data in GTZAN test (819)

Deprecations

* Removed `istft` wrapper in favor of [torch.istft](https://pytorch.org/docs/master/generated/torch.istft.html#torch.istft). (841)
* Deprecated `SoxEffect` and `SoxEffectsChain` (787)
* I/O: Deprecated `sox` backend. (904)
* I/O: Deprecated the current interface of `soundfile`. (922)
* I/O: Deprecated `load_wav` functions. (905)
