Torchaudio

Latest version: v2.6.0

2.1.0

Highlights

2.0.2

This is a minor release that is compatible with PyTorch 2.0.1 and includes bug fixes, improvements, and documentation updates. No new features are added.

Bug fix
* 3239 Properly set samples passed to encoder (3204)
* 3238 Fix virtual function issue with CTC decoder (3230)
* 3245 Fix path-like object support in FFmpeg dispatcher (3243, 3248)
* 3261 Use scaled_dot_product_attention in Wav2vec2/HuBERT's SelfAttention (3253)
* 3264 Use scaled_dot_product_attention in WavLM attention (3252, 3265)

**Full Changelog**: https://github.com/pytorch/audio/compare/v2.0.1...v2.0.2

2.0.1

Highlights

2.0

- Data augmentation operators, e.g. convolution, additive noise, speed perturbation
- WavLM and XLS-R models and pre-trained pipelines
- Backend dispatcher powering revised `info`, `load`, `save` functions
- Dropped support of Python 3.7
- Added Python 3.11 support

[Beta] Data augmentation operators
The release adds several data augmentation operators under `torchaudio.functional` and `torchaudio.transforms`:
- `torchaudio.functional.add_noise`
- `torchaudio.functional.convolve`
- `torchaudio.functional.deemphasis`
- `torchaudio.functional.fftconvolve`
- `torchaudio.functional.preemphasis`
- `torchaudio.functional.speed`
- `torchaudio.transforms.AddNoise`
- `torchaudio.transforms.Convolve`
- `torchaudio.transforms.Deemphasis`
- `torchaudio.transforms.FFTConvolve`
- `torchaudio.transforms.Preemphasis`
- `torchaudio.transforms.Speed`
- `torchaudio.transforms.SpeedPerturbation`

The operators can be used to synthetically diversify training data to improve the generalizability of downstream models.

For usage details, please refer to the documentation for [`torchaudio.functional`](https://pytorch.org/audio/2.0.0/functional.html) and [`torchaudio.transforms`](https://pytorch.org/audio/2.0.0/transforms.html), and tutorial [“Audio Data Augmentation”](https://pytorch.org/audio/2.0.0/tutorials/audio_data_augmentation_tutorial.html).
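To make the idea behind two of these operators concrete, here is a minimal, standard-library-only sketch of additive noise at a target signal-to-noise ratio and of first-order pre-emphasis. It illustrates the underlying arithmetic and is not torchaudio's implementation; the helper names `mix_at_snr` and `preemphasize` are hypothetical, and the default coefficient of 0.97 is an assumed, commonly used value.

```python
import math


def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal`, scaled so the mix has the requested SNR in dB.

    Hypothetical helper for illustration; operates on plain lists of floats.
    """
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(n * n for n in noise)
    if noise_energy == 0:
        raise ValueError("noise must not be all zeros")
    # Choose gain g so that 10 * log10(E_signal / (g^2 * E_noise)) == snr_db.
    gain = math.sqrt(signal_energy / (noise_energy * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(signal, noise)]


def preemphasize(signal, coeff=0.97):
    """First-order pre-emphasis: y[0] = x[0], y[n] = x[n] - coeff * x[n-1]."""
    return signal[:1] + [
        signal[i] - coeff * signal[i - 1] for i in range(1, len(signal))
    ]
```

Lowering `snr_db` makes the noise louder relative to the signal, which is the knob typically varied when diversifying training data.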

[Beta] WavLM and XLS-R models and pre-trained pipelines
The release adds two self-supervised learning models for speech and audio.
- [WavLM](https://ieeexplore.ieee.org/document/9814838) that is robust to noise and reverberation.
- [XLS-R](https://arxiv.org/abs/2111.09296) that is trained on cross-lingual datasets.

Besides the model architectures, torchaudio also supports corresponding pre-trained pipelines:
- `torchaudio.pipelines.WAVLM_BASE`
- `torchaudio.pipelines.WAVLM_BASE_PLUS`
- `torchaudio.pipelines.WAVLM_LARGE`
- `torchaudio.pipelines.WAV2VEC_XLSR_300M`
- `torchaudio.pipelines.WAV2VEC_XLSR_1B`
- `torchaudio.pipelines.WAV2VEC_XLSR_2B`

For usage details, please refer to [`factory function`](https://pytorch.org/audio/2.0.0/generated/torchaudio.models.Wav2Vec2Model.html#factory-functions) and [`pre-trained pipelines`](https://pytorch.org/audio/2.0.0/pipelines.html#id3) documentation.

Backend dispatcher
Release 2.0 introduces new versions of the I/O functions `torchaudio.info`, `torchaudio.load` and `torchaudio.save`, backed by a dispatcher that lets users select one of the FFmpeg, SoX, or SoundFile backends, subject to library availability. Users can enable the new logic in Release 2.0 by setting the environment variable `TORCHAUDIO_USE_BACKEND_DISPATCHER=1`; the new logic will be enabled by default in Release 2.1.

    import torchaudio

    # Fetch metadata using FFmpeg
    metadata = torchaudio.info("test.wav", backend="ffmpeg")

    # Load audio (with no backend parameter value provided,
    # the function prioritizes FFmpeg if it is available)
    waveform, rate = torchaudio.load("test.wav")

    # Write audio using SoX
    torchaudio.save("out.wav", waveform, rate, backend="sox")

Please see [the documentation for `torchaudio`](https://pytorch.org/audio/2.0.0/torchaudio.html#future-api) for more details.
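The environment variable can also be set from Python rather than from the shell. The sketch below assumes the variable is read when torchaudio is imported, so it sets the variable first and defers the import; `load_audio` is a hypothetical wrapper, not part of torchaudio.

```python
import os

# Opt in to the dispatcher. Assumption: torchaudio reads this variable at
# import time, so it has to be set before torchaudio is first imported.
os.environ["TORCHAUDIO_USE_BACKEND_DISPATCHER"] = "1"


def load_audio(path, backend=None):
    """Hypothetical wrapper: load audio, optionally forcing a backend."""
    import torchaudio  # deferred so the variable above is already set

    # Only pass `backend` when the caller asked for a specific one.
    kwargs = {} if backend is None else {"backend": backend}
    return torchaudio.load(path, **kwargs)
```

For example, `load_audio("test.wav", backend="ffmpeg")` would request the FFmpeg backend, provided FFmpeg is available.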

Backward-incompatible changes
- Dropped Python 3.7 support (3020)
Following the upstream PyTorch (https://github.com/pytorch/pytorch/pull/93155), the support for Python 3.7 has been dropped.

- Default to "precise" seek in `torchaudio.io.StreamReader.seek` (2737, 2841, 2915, 2916, 2970)
Previously, the `StreamReader.seek` method sought to the key frame closest to the given timestamp. A new `mode` option has been added that can switch the behavior to seeking to the frame of any type, including non-key frames, that is closest to the given timestamp. This behavior is now the default.

- Removed deprecated/unused/undocumented functions from datasets.utils (2926, 2927)
The following functions are removed from `datasets.utils`
- `stream_url`
- `download_url`
- `validate_file`
- `extract_archive`.
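The change to `StreamReader.seek` described above can be pictured with a toy model: given frame timestamps and key-frame flags, the old behavior considers only key frames, while the new default considers every frame. This is an illustration of the description in these notes, not of how FFmpeg or `StreamReader` actually seeks; `pick_frame` and its `key_frames_only` flag are hypothetical.

```python
def pick_frame(frames, timestamp, key_frames_only):
    """Return the (pts_seconds, is_key_frame) entry closest to `timestamp`.

    `frames` is a list of (pts_seconds, is_key_frame) tuples. When
    `key_frames_only` is True, only key frames are candidates.
    """
    candidates = [f for f in frames if f[1]] if key_frames_only else list(frames)
    if not candidates:
        raise ValueError("no candidate frames to seek to")
    return min(candidates, key=lambda frame: abs(frame[0] - timestamp))


# Key frames every 2 seconds, non-key frames in between.
frames = [(0.0, True), (0.5, False), (1.0, False), (1.5, False), (2.0, True)]
old_result = pick_frame(frames, 1.1, key_frames_only=True)
new_result = pick_frame(frames, 1.1, key_frames_only=False)
```

Code that relied on landing on a key frame may therefore see a different first frame after upgrading.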

Deprecations
Ops
- Deprecated 'onesided' init param for MelSpectrogram (2797, 2799)
`torchaudio.transforms.MelSpectrogram` assumes the `onesided` argument to be always `True`. The forward path fails if its value is `False`. Therefore this argument is deprecated. Users who pass this argument should remove it from their calls.

- Deprecated `"sinc_interpolation"` and `"kaiser_window"` option value in favor of `"sinc_interp_hann"` and `"sinc_interp_kaiser"` (2922)
The valid values of `resampling_method` argument of resampling operations (`torchaudio.transforms.Resample` and `torchaudio.functional.resample`) are changed. `"kaiser_window"` is now `"sinc_interp_kaiser"` and `"sinc_interpolation"` is `"sinc_interp_hann"`. The old values will continue to work, but users are encouraged to update their code.
For the reasoning behind this change, please refer to 2891.

- Deprecated sox initialization/shutdown public API functions (3010)
`torchaudio.sox_effects.init_sox_effects` and `torchaudio.sox_effects.shutdown_sox_effects` are deprecated. They were once required to use libsox-related features, but they have been called automatically since v0.6, and the initialization/shutdown mechanism has been moved elsewhere. These functions are now no-ops. Users can simply remove the calls to these functions.
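For code bases that store the `resampling_method` value in configuration, the rename described above can be applied with a small shim before the value reaches the resampling call. This is a sketch; `normalize_resampling_method` is a hypothetical helper, and only the two renames listed in these notes are mapped.

```python
import warnings

# Old option values and their replacements, as listed in these release notes.
_RENAMED_RESAMPLING_METHODS = {
    "sinc_interpolation": "sinc_interp_hann",
    "kaiser_window": "sinc_interp_kaiser",
}


def normalize_resampling_method(name):
    """Map a deprecated `resampling_method` value to its new name.

    Values that are not in the rename table are returned unchanged.
    """
    new_name = _RENAMED_RESAMPLING_METHODS.get(name)
    if new_name is None:
        return name
    warnings.warn(
        f'"{name}" is deprecated; use "{new_name}" instead',
        DeprecationWarning,
        stacklevel=2,
    )
    return new_name
```

The returned string can then be passed as `resampling_method` to `torchaudio.transforms.Resample` or `torchaudio.functional.resample`.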

Models
- Deprecated static binding of Flashlight-text based CTC decoder (3055, 3089)
Since v0.12, TorchAudio binary distributions have included the CTC decoder based on the flashlight-text project. In a future release, TorchAudio will switch to dynamic binding of the underlying CTC decoder implementation and stop shipping the core CTC decoder implementations. Users who would like to use the CTC decoder will need to install it separately from the upstream flashlight-text project. Other functionalities of TorchAudio will continue to work without flashlight-text.
**Note:** The API and numerical behavior do not change.
For more detail, please refer to 3088.

I/O
- Deprecated file-like object support in sox_io (3033)
In preparation for switching to dynamically bound libsox, file-like object support in the sox_io backend has been deprecated. It will be removed in the 2.1 release in favor of the dispatcher. This deprecation affects the following functionalities.
* I/O: `torchaudio.load`, `torchaudio.info` and `torchaudio.save`.
* Effects: `torchaudio.sox_effects.apply_effects_file` and `torchaudio.functional.apply_codec`.
For I/O, to continue using file-like objects, please use the new dispatcher mechanism.
For effects, replacement functions will be added in the next release.
- Deprecated the use of Tensor as a container for byte string in StreamReader (3086)
`torchaudio.io.StreamReader` supports decoding media from byte strings contained in 1D tensors of `torch.uint8` type. Using torch.Tensor type as a container for byte string is now deprecated. To pass byte strings, please wrap the string with `io.BytesIO`.
<table class="tg">
<thead>
<tr>
<th class="tg-0pky">Deprecated</th>
<th class="tg-0pky">Migration</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tg-dvpl"><code>data = b"..."</code><br><code>src = torch.frombuffer(data, dtype=torch.uint8)</code><br><code>StreamReader(src)</code></td>
<td class="tg-dvpl"><code>data = b"..."</code><br><code>src = io.BytesIO(data)</code><br><code>StreamReader(src)</code></td>
</tr>
</tbody>
</table>

Bug Fixes
Ops
- Fixed contiguous error when backpropagating through `torchaudio.functional.lfilter` (3080)

Pipelines
- Added layer normalization to wav2vec2 large+ pretrained models (2873)
In self-supervised learning models such as Wav2Vec 2.0, HuBERT, or WavLM, layer normalization should be applied to waveforms if the convolutional feature extraction module uses layer normalization and is trained on a large-scale dataset. After adding layer normalization to those affected models, the Word Error Rate is significantly reduced.
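As a rough picture of what applying layer normalization to a waveform means, the sketch below normalizes one utterance to zero mean and approximately unit variance. It uses plain Python lists rather than tensors and is not the code from 2873; `layer_norm_waveform` is a hypothetical name, and `eps=1e-5` is an assumed value matching the default of `torch.nn.functional.layer_norm`.

```python
import math


def layer_norm_waveform(samples, eps=1e-5):
    """Normalize a single waveform to zero mean and (almost) unit variance.

    Uses the biased (population) variance, as layer normalization does;
    `eps` guards against division by zero for silent input.
    """
    if not samples:
        raise ValueError("waveform must contain at least one sample")
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    scale = 1.0 / math.sqrt(variance + eps)
    return [(s - mean) * scale for s in samples]
```

Normalizing over the whole utterance makes the input independent of recording gain and offset.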

Without the change in 2873, the WER results are:
| Model | dev-clean | dev-other | test-clean | test-other |
|:------------------------------------------------------------------------------------------------|-----------:|-----------:|-----------:|-----------:|

1.37

1.9.1

No functional changes other than minor updates to CI rules.
