Torchaudio

Latest version: v2.5.1


0.9.0

- Performance improvements in filtering, resampling and spectral operations.
- Added the popular wav2vec2.0 model architecture.
- Improved autograd support.


[Beta] Wav2Vec2.0 Model

This release includes model architectures from the [wav2vec2.0](https://arxiv.org/abs/2006.11477) paper, with utility functions that allow importing pretrained model parameters published on [fairseq](https://github.com/pytorch/fairseq/tree/master/examples/wav2vec) and [Hugging Face Hub](https://huggingface.co/models?filter=wav2vec2). Now you can easily run speech recognition with torchaudio. These model architectures also support TorchScript, and you can deploy them with ONNX or in non-Python environments, such as C++, Android and iOS. Please check out our [C++](https://github.com/pytorch/audio/tree/master/examples/libtorchaudio), [Android](https://github.com/pytorch/android-demo-app/tree/master/SpeechRecognition) and [iOS](https://github.com/pytorch/ios-demo-app/tree/master/SpeechRecognition) examples. The following snippets illustrate how to create a deployable model.


```python
# Import fine-tuned model from Hugging Face Hub
from transformers import Wav2Vec2ForCTC
from torchaudio.models.wav2vec2.utils import import_huggingface_model

original = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")
imported = import_huggingface_model(original)
```


```python
# Import fine-tuned model from fairseq
import fairseq
from torchaudio.models.wav2vec2.utils import import_fairseq_model

original, _, _ = fairseq.checkpoint_utils.load_model_ensemble_and_task(
    ["wav2vec_small_960h.pt"], arg_overrides={'data': "<data_dir>"})
imported = import_fairseq_model(original[0].w2v_encoder)
```


```python
# Build uninitialized model and load state dict
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile
from torchaudio.models import wav2vec2_base

# `imported` is the model produced by one of the import snippets above
model = wav2vec2_base(num_out=32)
model.load_state_dict(imported.state_dict())

# Quantize / script / optimize for mobile
quantized_model = torch.quantization.quantize_dynamic(
    model, qconfig_spec={torch.nn.Linear}, dtype=torch.qint8)
scripted_model = torch.jit.script(quantized_model)
optimized_model = optimize_for_mobile(scripted_model)
optimized_model.save("model_for_deployment.pt")
```


Filtering Improvement

The internal implementation of `lfilter` has been updated to support autograd on both CPU and CUDA. Additionally, the performance on CPU is significantly improved. These improvements also apply to `biquad` variants.
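For reference, `lfilter` evaluates the standard IIR difference equation `a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]`. The following is a minimal pure-Python sketch of that recurrence for a single channel. It assumes, as torchaudio's `lfilter` does, that `a_coeffs` and `b_coeffs` have the same length, that coefficients are normalized by `a[0]`, and that the output is optionally clamped to [-1, 1] after filtering. The name `lfilter_reference` is hypothetical; it does not reproduce torchaudio's C++ loop or its autograd support, but it can serve as a slow cross-check.

```python
from typing import List, Sequence


def lfilter_reference(
    x: Sequence[float],
    a_coeffs: Sequence[float],
    b_coeffs: Sequence[float],
    clamp: bool = True,
) -> List[float]:
    """Directly evaluate an IIR difference equation (hypothetical helper).

    a_coeffs: feedback (denominator) coefficients, lowest delay first.
    b_coeffs: feedforward (numerator) coefficients, lowest delay first.
    """
    if len(a_coeffs) != len(b_coeffs):
        raise ValueError("a_coeffs and b_coeffs must have the same length")
    a0 = a_coeffs[0]
    # Normalize so that the leading feedback coefficient becomes 1.
    a = [c / a0 for c in a_coeffs]
    b = [c / a0 for c in b_coeffs]
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(len(b)):
            if n - k < 0:
                break
            acc += b[k] * x[n - k]        # feedforward term
            if k > 0:
                acc -= a[k] * y[n - k]    # feedback term uses past outputs
        y[n] = acc
    if clamp:
        # Clamping happens after filtering, so it does not feed back.
        y = [max(-1.0, min(1.0, v)) for v in y]
    return y
```

For example, `lfilter_reference([1.0, 0.0, 0.0, 0.0], [1.0, -0.5], [1.0, 0.0])` returns the decaying impulse response `[1.0, 0.5, 0.25, 0.125]`.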

The following table illustrates the performance improvements compared against previous releases. `lfilter` was applied to `float32` tensors with one channel and different numbers of frames.


<table>
<tr>
<td>torchaudio version
</td>
<td><p style="text-align: right">256</p>
</td>
<td><p style="text-align: right">512</p>
</td>
<td><p style="text-align: right">1024</p>
</td>
</tr>
<tr>
<td>0.9
</td>
<td><p style="text-align: right"></p>
</td>
</tr>
</table>

0.8.1

Highlights

This release depends on pytorch 1.8.1.

Bug Fixes

* Added back support for 24-bit signed LPCM wav via sox_io backend. (1389)

0.8.0

Highlights

This release supports Python 3.9.

I/O Improvements

Continuing from the previous release, torchaudio improves the audio I/O mechanism. In this release, we have five major updates.

1. Backend migration.
We have migrated the default backend for audio I/O. The new default backend is “sox_io” (for Linux/macOS). The interface of the “soundfile” backend has also been changed to align with that of “sox_io”. Following the change of default backends, the legacy backends and interfaces have been marked as deprecated. They are still accessible, but their use is strongly discouraged. For details on the migration, please refer to 903.

1. File-like object support.
We have added file-like object support to the I/O functions and sox_effects. You can perform the `info`, `load`, `save` and `apply_effects_file` operations on file-like objects.
```python
import io
import tarfile

import boto3
import requests
import torchaudio

# URL, TAR_PATH, SAMPLE_TAR_ITEM, S3_BUCKET and S3_KEY are placeholders to define yourself.

# Query audio metadata over HTTP
# Will only fetch the first few kB
with requests.get(URL, stream=True) as response:
    metadata = torchaudio.info(response.raw)

# Load audio from tar file
# No need to extract TAR file.
with tarfile.open(TAR_PATH, mode='r') as tarfile_:
    fileobj = tarfile_.extractfile(SAMPLE_TAR_ITEM)
    waveform, sample_rate = torchaudio.load(fileobj)

# Saving to Bytes buffer
# Using BytesIO, you can perform in-memory encoding/decoding.
buffer_ = io.BytesIO()
torchaudio.save(buffer_, waveform, sample_rate, format="wav")

# Apply effects (lowpass filter / resampling) while loading audio from S3
client = boto3.client('s3')
response = client.get_object(Bucket=S3_BUCKET, Key=S3_KEY)
waveform, sample_rate = torchaudio.sox_effects.apply_effects_file(
    response['Body'], [["lowpass", "-1", "300"], ["rate", "8000"]])
```
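The standard library offers a small, dependency-free analogue of querying metadata from a file-like object: Python's `wave` module accepts a readable binary file object and parses the WAV header chunks it needs without decoding the audio data. The sketch below assumes uncompressed PCM WAV input only; `WavInfo`, `wav_info` and `make_silent_wav` are hypothetical helpers for illustration, not torchaudio APIs, and `torchaudio.info` covers many more formats.

```python
import io
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int
    num_channels: int
    num_frames: int
    bits_per_sample: int


def wav_info(fileobj) -> WavInfo:
    """Read metadata from a PCM WAV file-like object (hypothetical helper)."""
    # wave.open does not close a file object it did not open itself.
    with wave.open(fileobj, "rb") as wav:
        return WavInfo(
            sample_rate=wav.getframerate(),
            num_channels=wav.getnchannels(),
            num_frames=wav.getnframes(),
            bits_per_sample=8 * wav.getsampwidth(),
        )


def make_silent_wav(sample_rate: int, num_frames: int, num_channels: int = 1) -> io.BytesIO:
    """Encode silence as 16-bit PCM WAV into an in-memory buffer."""
    buffer_ = io.BytesIO()
    with wave.open(buffer_, "wb") as wav:
        wav.setnchannels(num_channels)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_frames * num_channels)
    buffer_.seek(0)  # rewind so the buffer can be read back
    return buffer_
```

Because `wav_info` only needs a binary file object, the same helper also accepts, for example, a member returned by `tarfile.extractfile`, as long as its content is a PCM WAV file.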

1. [Beta] Codec Application.
Built upon the file-like object support, we added the `functional.apply_codec` function, which degrades audio data by applying audio codecs supported by the “sox_io” backend, entirely in memory.
```python
import torchaudio.functional as F

# Apply MP3 codec
degraded = F.apply_codec(
    waveform, sample_rate, format="mp3", compression=-9)
# Apply GSM codec
degraded = F.apply_codec(waveform, sample_rate, format="gsm")
```
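`apply_codec` relies on the codecs available through the “sox_io” backend. To get a feel for what a lossy encode/decode round trip does to samples without any dependencies, the sketch below swaps in 8-bit μ-law companding with μ = 255 (the continuous companding curve, not the bit-exact segmented G.711 encoder) and measures the resulting error. The helper names are hypothetical, and this is not how `apply_codec` works internally.

```python
import math
from typing import List

MU = 255.0


def mu_law_encode(samples: List[float], mu: float = MU) -> List[int]:
    """Compress samples in [-1, 1] logarithmically and quantize to 8-bit codes (0..255)."""
    codes = []
    for x in samples:
        x = max(-1.0, min(1.0, x))  # out-of-range input is clipped
        compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
        codes.append(int(round((compressed + 1.0) / 2.0 * 255)))
    return codes


def mu_law_decode(codes: List[int], mu: float = MU) -> List[float]:
    """Map 8-bit codes back to samples in [-1, 1] by inverting the companding curve."""
    samples = []
    for q in codes:
        y = 2.0 * q / 255 - 1.0
        samples.append(math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y))
    return samples


def apply_mu_law_codec(samples: List[float]) -> List[float]:
    """Lossy encode/decode round trip, similar in spirit to apply_codec."""
    return mu_law_decode(mu_law_encode(samples))
```

The logarithmic curve spends more of the 256 codes on quiet samples, so their absolute error is smaller than that of loud samples.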

1. Encoding options.
We have added encoding options to the save function of the new backends. You can now change the format and encoding with the `format`, `encoding` and `bits_per_sample` options.
```python
import torchaudio

# `waveform` (a float32 Tensor) and `sample_rate` come from an earlier load.

# Save without any encoding option.
# The function will pick the encoding that fits the provided data.
# For a Tensor of float32 type, that is 32-bit floating-point PCM.
torchaudio.save("data.wav", waveform, sample_rate)

# Save as 16-bit signed integer Linear PCM
# The resulting file occupies half the storage but loses precision
torchaudio.save(
    "data.wav", waveform, sample_rate, encoding="PCM_S", bits_per_sample=16)
```
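The storage trade-off is simple arithmetic: 32-bit floating-point PCM uses 4 bytes per sample and 16-bit integer PCM uses 2, so the sample data halves in size (the WAV header adds a small fixed overhead), while each sample is rounded to one of 65,536 levels. The sketch below illustrates this with the standard `struct` module only; the helper names and the scale factor of 32767 are assumptions for illustration, not torchaudio's internal conversion.

```python
import struct
from typing import List

SCALE = 32767  # assumed float-to-int16 scale factor for this sketch


def encode_float32(samples: List[float]) -> bytes:
    """Pack samples as little-endian 32-bit floats (4 bytes each)."""
    return struct.pack(f"<{len(samples)}f", *samples)


def encode_pcm_s16(samples: List[float]) -> bytes:
    """Clip to [-1, 1], then quantize to little-endian signed 16-bit ints (2 bytes each)."""
    ints = [int(round(max(-1.0, min(1.0, x)) * SCALE)) for x in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def decode_pcm_s16(data: bytes) -> List[float]:
    """Unpack signed 16-bit ints back to floats in [-1, 1]."""
    count = len(data) // 2
    return [v / SCALE for v in struct.unpack(f"<{count}h", data)]
```

After the round trip through `encode_pcm_s16` and `decode_pcm_s16`, each in-range sample is off by at most half a quantization step, `0.5 / 32767`.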

1. More format support in the “sox_io” save function.
We have added support for GSM, HTK, AMB, and AMR-NB formats to the “sox_io” save function.

Switch to CMake-based build

torchaudio was already using CMake to build third-party dependencies; it now also uses CMake to build its C++ extension. This opens the door to integrating torchaudio into non-Python environments (such as C++ applications and mobile). We will work on adding example applications and mobile integrations in upcoming releases.

Backwards Incompatible Changes

* Removed deprecated transform and target_transform arguments from VCTK and YESNO datasets. (1120) If you were relying on the previous behavior, we recommend that you apply the transforms in the collate function.
* Removed torchaudio.datasets.utils.walk_files (1111) and replaced its uses with Path and glob. (1069, 1101). If you relied on the function, we recommend that you use glob instead.
* Removed torchaudio.data.utils.unicode_csv_reader. (1086) If you relied on the function, we recommend that you replace it with csv.reader.
* Disabled CommonVoice download as users are required to sign a user agreement. Please download and extract the dataset manually, and set the root argument to the subfolder for the version and language of interest; see 1082 for more details. (1018, 1079, 1080, 1082)
* Removed legacy sox effects (977, 1001). Please migrate to apply_effects_file or apply_effects_tensor.
* Switched the default backend to the ones with new interfaces (978). If you were relying on the previous behavior, you can return to the previous behavior by following instructions in 975 for one more release.
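For the removed helpers above, the standard library covers the same ground. The sketch below is one possible replacement, not code taken from torchaudio: `find_audio_files` plays the role of `walk_files` using `pathlib.Path.rglob`, `read_rows` calls `csv.reader` directly (Python 3 reads Unicode text natively), and `collate_with_transform` applies a per-sample transform inside a collate function and zero-pads to the longest item. All three names are hypothetical, and waveforms are plain lists here so the sketch runs without torch.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable, List, Sequence, Tuple


def find_audio_files(root: str, suffix: str = ".wav") -> List[Path]:
    """Recursively list files with the given suffix, sorted for a stable order."""
    return sorted(Path(root).rglob(f"*{suffix}"))


def read_rows(path: str, delimiter: str = ",") -> List[List[str]]:
    """Read a CSV/TSV file (use delimiter='\\t' for TSV)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter=delimiter))


def collate_with_transform(
    batch: Iterable[Tuple[Sequence[float], str]],
    transform: Callable[[Sequence[float]], List[float]],
) -> Tuple[List[List[float]], List[str]]:
    """Apply `transform` to each waveform, then zero-pad to the longest one."""
    transformed, labels = [], []
    for waveform, label in batch:
        transformed.append(list(transform(waveform)))
        labels.append(label)
    max_len = max((len(w) for w in transformed), default=0)
    padded = [w + [0.0] * (max_len - len(w)) for w in transformed]
    return padded, labels
```

With torch, the same collate function would typically be passed as `collate_fn` to a `DataLoader`, with tensors and `torch.nn.functional.pad` in place of lists.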

New Features

* Added GSM, HTK, AMB, AMR-NB and AMR-WB format support to “sox_io” backend. (1276, 1291, 1277, 1275, 1066)
* Added encoding options (format, bits_per_sample and encoding) to save function. (1226, 1177, 1129, 1104)
* Added new attributes (bits_per_sample and encoding) to the info function return type (AudioMetaData) (1177, 1206, 1324)
* Added format override to libsox-based file input. (load, info, sox_effects.apply_effects_file) (1104)
* Added file-like object support in the “sox_io” and “soundfile” backends and in sox_effects.apply_effects_file. (1115)
* [Beta] Added the Kaldi Pitch feature. (1243, 1260)
* [Beta] Added the SpectralCentroid transform. (1167, 1216, 1316)
* [Beta] Added codec transformation apply_codec. (1200)

Improvements

* Exposed normalization method to Mel transforms. (1212)
* Exposed additional STFT arguments to Spectrogram (892) and to MelSpectrogram (1211).
* Added support for pathlib.Path to apply_effects_file (1048) and to CMUARCTIC (1025), YESNO (1015), COMMONVOICE (1027), VCTK and LJSPEECH (1028), GTZAN (1032), SPEECHCOMMANDS (1039), TEDLIUM (1045), LIBRITTS and LIBRISPEECH (1046).
* Added SpeechCommands train/valid/test split. (966, 1012)

Internals

* Replaced if-elseif-else with switch in sox C++ code. (1270)
* Refactored C++ interface for sox_io's get_info_file (1232) and get_encodinginfo (1233).
* Added explicit functional import in init. (1228)
* Refactored YESNO dataset (1127), LJSPEECH dataset (1143).
* Removed Python 2.7 reference from setup.py. (1182)
* Merged flake8 configurations into single .flake8 file. (1172, 1214)
* Updated calls to torch.stft to use return_complex=True. (1096, 1013)
* Cleaned up handling of optional args in C++ with c10::optional. (1043)
* Removed unused imports in sox effects. (1052)
* Introduced functional submodule to organize functionals. (1003)
* [Testing] Refactored MelSpectrogram librosa compatibility test to decouple from other tests. (1267)
* [Testing] Moved batch tests for functionals. (1254)
* [Testing] Refactored tests for backend (1239) and for functionals (1237).
* [Testing] Removed dependency on pytest from testing (1157, 1188)
* [Testing] Refactored unit tests for VCTK (1134), SPEECHCOMMANDS (1136), LIBRISPEECH (1140), TEDLIUM (1135), LJSPEECH (1138), LIBRITTS (1139), CMUARCTIC (1147), GTZAN (1148), COMMONVOICE and YESNO (1133).
* [Testing] Removed dependency on COMMONVOICE dataset from tests. (1132)
* [Build] Fixed Python 3.9 support (1242)
* [Build] Switched to cmake for build. (1187, 1246, 1249)
* [Build] Restructured C++ code to allow per file registration of custom ops. (1221)
* [Build] Added logging to sox/CMakeLists.txt. (1190)
* [Build] Disabled C++11 ABI when necessary for libtorch compatibility. (880)
* [Build] Reorganized libsox source and build directory to accommodate additional third party code. (1161, 1176)
* [Build] Refactored sox source files and moved into dedicated subfolder. (1106)
* [Build] Enabled custom clean function for python setup.py clean. (1142)
* [CI] Documented undocumented parameters. Added CI check. (1248)
* [CI] Fixed sphinx warnings in documentation. Turned warnings into errors. (1247)
* [CI] Printed CPU info before running unit tests. (1218)
* [CI] Fixed clang-format job and fixed newly detected formatting issues. (981, 1198, 1222)
* [CI] Updated unit test base Docker image. (1193)
* [CI] Disabled CCI cache which is now known to be flaky. (1189)
* [CI] Disabled torchscript BC test which is known to fail. (1192)
* [CI] Stripped version suffix for pytorch. (1185)
* [CI] Ran smoke test with CPU package for pytorch due to known issue with CUDA 11. (1105)
* [CI] Added missing empty line at the end of config.yml. (1020)
* [CI] Added automatic documentation build and push to branch in CI. (1006, 1034, 1041, 1049, 1091, 1093, 1098, 1100, 1121)
* [CI] Ran GPU test for all pull requests and fixed current setup. (998, 1014, 1191)
* [CI] Skipped tests that are known to fail on macOS Python 3.6/3.7. (999)
* [CI] Changed the order of installation and aligned with Windows. (987)
* [CI] Fixed documentation rendering by using Sphinx 2.4.4. (974)
* [Doc] Added subcategories to functional documentation. (1325)
* [Doc] Added a version selector in documentation. (1273)
* [Doc] Updated compilation recommendation in README. (1263)
* [Doc] Added CONTRIBUTING.md. (1241)
* [Doc] Added instructions to install parametrized package. (1164)
* [Doc] Fixed the return type for load functions. (1122)
* [Doc] Added missing modules and minor fixes. (1022, 1056, 1117)
* [Doc] Fixed spelling and links in README. (1029, 1037, 1062, 1110, 1261)
* [Doc] Grouped filtering functionals in documentation page. (1005, 1004)
* [Doc] Updated the compatibility matrix with torchaudio 0.7 (979)
* [Doc] Added description of prototype/beta/stable features. (968)

Bug Fixes

* Fixed amplitude_to_DB clamping behaviour on batches. (1113)
* Disabled audio devices in sox builds, which could interfere with the build process when detected. (1153)
* Fixed COMMONVOICE for French where the audio file extension was missing on load. (1126)
* Disabled OpenMP support for libsox which can produce errors when used in DataLoader. (1026)
* Fixed noise_down_time argument in VAD by properly propagating it. (1017)
* Removed print-freq option to compute validation loss at each epoch in wav2letter pipeline. (997)
* Migrated from torch.rfft to torch.fft.rfft and cfloat following a change in pytorch. (941)
* Fixed the interactive ASR demo to align with the latest version of FAIRSeq. (996)

Deprecations

* The normalized argument is unused and will be removed from griffinlim. (1036)
* The previous sox and soundfile backend remain available for one release, see 903 for details. (975)

Performance

* Added C++ lfilter core loop for faster iteration on CPU. (1244)
* Leveraged julius resampling implementation to make resampling faster. (1087)

0.7.2

Highlights

This release introduces support for python 3.9. There is no 0.7.1 release, and the following changes are compared to 0.7.0.

Improvements

* Add python 3.9 support (1061)

Bug Fixes

* Temporarily disable OpenMP support for libsox (1054)

Deprecations

* Disallow `download=True` in CommonVoice (1076)

0.7.0

Highlights

Example Pipelines

torchaudio is expanding its support for models and [end-to-end applications](https://github.com/pytorch/audio/tree/master/examples). Please file an issue on [github](https://github.com/pytorch/audio/issues/new?template=questions-help-support.md) to provide feedback on them.

* **Speech Recognition:** Building on the addition of the Wav2Letter model for speech recognition in the last release, we added an example training pipeline for speech recognition that uses the LibriSpeech dataset.
* **Text-to-Speech:** With the goal of supporting text-to-speech applications, we added a vocoder based on the WaveRNN model. The WaveRNN model is based on the implementation from [this repository](https://github.com/fatchord/WaveRNN). The original implementation was introduced in "Efficient Neural Audio Synthesis". We provide an example training pipeline in the example folder that uses the LibriTTS dataset added to torchaudio in this release.
* **Source Separation:** We also support source separation with the addition of the ConvTasNet model, based on the paper "Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation." An example training pipeline is provided with the wsj0-mix dataset.

I/O Improvements

As you are likely already aware from the last release, we’re currently in the process of making `sox_io`, which ships with new features such as TorchScript support and performance improvements, the new default. If you want to benefit from these features now, we encourage you to migrate. For more information, see issue 903.

Backwards Incompatible Changes

* Switched all %-based string formatting to `str.format` to adopt changes in PyTorch, leading to improved error messages for TorchScript (850)
* Split `sox_utils.list_formats()` for read and write (811)
* Made directory traversal order alphabetical and breadth-first, consistent across operating systems (814)
* Changed GTZAN so that it only traverses filenames belonging to the dataset (791)
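The directory traversal change (814) makes dataset file discovery deterministic. One way to get an alphabetical, breadth-first order that does not depend on the order in which the operating system lists directory entries is to sort each directory's entries and walk level by level with a queue. This is a sketch of the idea with a hypothetical `walk_files_bfs` helper, not the torchaudio implementation.

```python
import os
from collections import deque
from typing import Iterator


def walk_files_bfs(root: str, suffix: str = "") -> Iterator[str]:
    """Yield file paths under `root` breadth-first, with entries sorted by name."""
    queue = deque([root])
    while queue:
        directory = queue.popleft()
        # os.listdir order is arbitrary and platform-dependent, so sort it.
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.isdir(path):
                queue.append(path)  # visit subdirectories after the current level
            elif name.endswith(suffix):
                yield path
```

Files directly under `root` come first, followed by the files of each subdirectory in alphabetical order, then the next level down.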

New Features

* Added ConvTasNet model (920, 933) with pipeline (894)
* Added canonical pipeline with wav2letter (632)
* The WaveRNN model (705, 797, 801, 810, 836) is available with a canonical pipeline (749, 802, 831, 863)
* Added all 3 releases of tedlium dataset (882, 934, 945, 895)
* Added `VCTK_092` dataset (812)
* Added LibriTTS (790, 820)
* Added SPHERE support to `sox_io` backend (871)
* Added torchscript sox effects (760)
* Added a flag to change the interface of `soundfile` backend to the one identical to `sox_io` backend. (922)

Improvements

* Added `soundfile` compatibility backend. (922)
* Improved the speed of `torchaudio.compliance.kaldi.fbank` (947)
* Improved the speed of phaser (660)
* Added warning when a Mel filter is all zero (914)
* Added `pathlib.Path` support to `sox_io` backend (907)
* Simplified C++ registration with TORCH_LIBRARY (840)
* Merged sox effect and `sox_io` C++ implementation (779)

Internal

* CI: Added test to validate torchscript backward compatibility (838)
* CI: Used mocked datasets to test CMUArctic (829), CommonVoice (827), Speech Commands (824), LJSpeech (826), LibriSpeech (825), YESNO (792, 832)
* CI: Made *nix unit test fail if C++ extension is not available (847, 849)
* CI: Separated I/O in testing. (813, 773, 783)
* CI: Added smoke tests to `sox_io` and `sox_effects` (806)
* CI: Refactored test utilities (805, 808, 809, 817, 822, 831)
* Doc: Added how to run tests (843)
* Doc: Added 0.6.0 to version matrix in README (833)

Bug Fixes

* Fixed device in interactive ASR example (900)
* Fixed incorrect extension parsing (885)
* Fixed dither with `noise_shaping = True` (865)
* Ran unit tests with a non-editable installation (845), and set `zip_safe = False` to disable egg installation (842)
* Sorted the GTZAN dataset and used on-the-fly data in the GTZAN test (819)

Deprecations

* Removed `istft` wrapper in favor of [torch.istft](https://pytorch.org/docs/master/generated/torch.istft.html#torch.istft). (841)
* Deprecated `SoxEffect` and `SoxEffectsChain` (787)
* I/O: Deprecated `sox` backend. (904)
* I/O: Deprecated the current interface of `soundfile`. (922)
* I/O: Deprecated `load_wav` functions. (905)

0.6.0

Highlights

torchaudio now includes a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for torchscript. torchaudio now also supports Windows, with the soundfile backend.

torchaudio now requires Python 3.6 or later.

Backwards Incompatible Changes

* We reorganized the C++ resources (630) and replaced C++ bindings for sox_effects init/list/shutdown with torch binding (748).
* We removed code specific to python 2 (691), and we no longer test against python 2 (575) and 3.5 (577)

New Features

* We now support Windows. (604, 637, 642, 655, 743)
* We now have a model module which includes wav2letter. (462, 722)
* We added the GTZAN and CMU datasets. (668, 710)
* We now have the contrast functional (551), cvm (540), dcshift (558), overdrive (569), vad (578, 599), phaser (587, 607, 702), flanger (651, 702), biquad (661).
* We added a new sox_io backend (718, 728, 734, 727, 763, 752, 731, 732, 726, 780) that is compatible with torchscript with a new AudioMetaData class (761).
* MelSpectrogram now has power and normalized parameters (633), and slaney normalization (589, 641).
* lfilter now has a clamp option. (600)
* Griffin-Lim can now have zero momentum. (601)
* sliding_window_cmn now supports batching. (570)
* Downloaded datasets now verify checksums. (499)

Improvements

* We added ogg/vorbis/opus support to binary distribution (750, 755).
* We replaced the use of torch.norm in spectrogram to improve performance (747).
* We now use fused operations in lfilter for faster computation. (517, 564)
* STFT is now called directly from torchaudio. (531)
* We redesigned the backend mechanism to support torchscript by restructuring the code (695, 696, 700, 706, 707, 698) and adding dynamic listing (697).
* torchaudio can be built along with sox, or can use external sox. (625, 669, 739)
* We redesigned the sox_effects module. (708)
* We added more details to compilation instructions. (667)
* We updated the README with instructions on changing the backend. (553)
* We now have a version compatibility matrix in README. (685)
* We now use cmake to build third party libraries (753).
* We now use CircleCI instead of travis (576, 584, 598, 603, 636, 738) and we test on GPU (586, 777).
* We run the test suite against nightlies. (538, 678)
* We redesigned our test suite: with new helper functions (514, 519, 521, 565, 616, 690, 692, 694), standard pytorch test utilities (513, 640, 643, 645, 646, 652, 650, 712), separated CPU and GPU tests (513, 528, 644), more descriptive names (532), clearer organization (539, 541, 542, 664, 672, 687, 703, 716, 732), standardized name (559), and backend aware (719). This is detailed in a new README for testing (566, 759).
* We now support typing, for datasets (511, 522), for backends (527), for init (526), and inline (530), with mypy configuration (524, 544, 590).

Bug Fixes

* We removed in place operations so that Griffin-Lim can be backpropagated through. (730)
* We fixed kaldi MFCC on GPU. (681)
* We removed multiple definitions of SoxEffect in C++. (635)
* We fixed the docstring of masking. (612)
* We replaced views by reshape for batching. (594)
* We fixed missing conda environment when testing in python 3.8. (582)
* We ensured that sox is not exposed on Windows. (579)
* We corrected the instructions to install nightlies. (547, 552)
* We fixed the seed of mask_along_iid. (529)
* We correctly report GPU tests as skipped instead of passed. (516)

Deprecations

* Since sox_effects is now automatically initialized and shutdown (572, 693), we are deprecating these functions (709).
* ISTFT is migrating to torch. (523)


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.