Torchaudio

Latest version: v2.6.0


2.2.2

This release is compatible with [PyTorch 2.2.2](https://github.com/pytorch/pytorch/releases/tag/v2.2.2) patch release. There are no new features added.

2.2.1

This release is compatible with [PyTorch 2.2.1](https://github.com/pytorch/pytorch/releases/tag/v2.2.1) patch release. There are no new features added.

2.2.0

New Features
- Add path-like object support to StreamReader/Writer https://github.com/pytorch/audio/pull/3608
- Introduce `torio` top-level module, dedicated to core I/O operations (https://github.com/pytorch/audio/pull/3676, https://github.com/pytorch/audio/pull/3680, https://github.com/pytorch/audio/pull/3681, https://github.com/pytorch/audio/pull/3682). Please refer to https://pytorch.org/audio/2.2.0/torio.html for details.

Bug Fixes
- https://github.com/pytorch/audio/pull/3685 Make `F.vad` return an empty tensor for zero-valued tensor input

Recipe Updates
- https://github.com/pytorch/audio/pull/3631 Fix inconsistent naming

2.1.2

This is a patch release, which is compatible with [PyTorch 2.1.2](https://github.com/pytorch/pytorch/releases/tag/v2.1.2). There are no new features added.

2.1.1

This is a minor release, which is compatible with PyTorch 2.1.1 and includes bug fixes, improvements and documentation updates.

Bug Fixes

* Cherry-pick 2.1.1: Fix WavLM bundles (3665)
* Cherry-pick 2.1.1: Add back compression level in i/o dispatcher backend (3666)

2.1

Please refer to https://pytorch.org/audio/2.1/installation.html#optional-dependencies for details of the new FFmpeg integration mechanism.
1. Update to libsox integration
TorchAudio now depends on libsox being installed separately. The sox I/O backend no longer supports file-like objects (these are supported by the FFmpeg and soundfile backends).
Please refer to https://pytorch.org/audio/2.1/installation.html#optional-dependencies for details.

New Features

I/O
- Support overwriting PTS in `torchaudio.io.StreamWriter` (3135)
- Include format information after filter `torchaudio.io.StreamReader.get_out_stream_info` (3155)
- Support CUDA frame in `torchaudio.io.StreamReader` filter graph (3183, 3479)
- Support YUV444P in GPU decoder (3199)
- Add additional filter graph processing to `torchaudio.io.StreamWriter` (3194)
- Cache and reuse HW device context in GPU decoder (3178)
- Cache and reuse HW device context in GPU encoder (3215)
- Support changing the number of channels in `torchaudio.io.StreamReader` (3216)
- Support encode spec change in `torchaudio.io.StreamWriter` (3207)
- Support encode options such as compression rate and bit rate (3179, 3203, 3224)
- Add `420p10le` support to `torchaudio.io.StreamReader` CPU decoder (3332)
- Support multiple FFmpeg versions (3464, 3476)
- Support writing opus and mp3 with soundfile (3554)
- Add switch to disable sox integration and ffmpeg integration at runtime (3500)

Ops
- Add `torchaudio.io.AudioEffector` (3163, 3372, 3374)
- Add `torchaudio.transforms.SpecAugment` (3309, 3314)
- Add `torchaudio.functional.forced_align` (3348, 3355, 3533, 3536, 3354, 3365, 3433, 3357)
- Add `torchaudio.functional.merge_tokens` (3535, 3614)
- Add `torchaudio.functional.frechet_distance` (3545)

Models
- Add `torchaudio.models.SquimObjective` for speech enhancement (3042, 3087, 3512)
- Add `torchaudio.models.SquimSubjective` for speech enhancement (3189)
- Add `torchaudio.models.decoder.CUCTCDecoder` (3096)

Pipelines
- Add `torchaudio.pipelines.SquimObjectiveBundle` for speech enhancement (3103)
- Add `torchaudio.pipelines.SquimSubjectiveBundle` for speech enhancement (3197)
- Add `torchaudio.pipelines.MMS_FA` Bundle for forced alignment (3521, 3538)

Tutorials
- Add tutorial for `torchaudio.io.AudioEffector` (3226)
- Add tutorials for CTC forced alignment API (3356, 3443, 3529, 3534, 3542, 3546, 3566)
- Add tutorial for `torchaudio.models.decoder.CUCTCDecoder` (3297)
- Add tutorial for real-time av-asr (3511)
- Add tutorial for TorchAudio-SQUIM pipelines (3279, 3313)
- Split HW acceleration tutorial into nvdec/nvenc tutorials (3483, 3478)

Recipe
- Add TCPGen context-biasing Conformer RNN-T (2890)
- Add AV-ASR recipe (3278, 3421, 3441, 3489, 3493, 3498, 3492, 3532)
- Add multi-channel DNN beamforming training recipe (3036)

Backward-incompatible changes

Third-party libraries

In this release, the following third-party libraries are removed from TorchAudio binary distributions. TorchAudio now searches for and links these libraries at runtime. Please install them to use the corresponding APIs.

SoX

`libsox` is used for various audio I/O and filtering operations.

Pre-built binaries are available via package managers, such as `conda`, `apt` and `brew`. Please refer to the respective documentation.

The APIs affected include:

- `torchaudio.load` ("sox" backend)
- `torchaudio.info` ("sox" backend)
- `torchaudio.save` ("sox" backend)
- `torchaudio.sox_effects.apply_effects_tensor`
- `torchaudio.sox_effects.apply_effects_file`
- `torchaudio.functional.apply_codec` (also deprecated, see below)

Changes related to the removal: 3232, 3246, 3497, 3035
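Because `libsox` is now located at runtime rather than bundled, it can help to check whether the system loader can see it at all. The sketch below is only an assumption-laden diagnostic: it uses the standard library's `ctypes.util.find_library`, which consults the platform's usual library search (for example `ldconfig` on Linux). It does not reproduce how TorchAudio itself finds the library, and `libsox_visible` is a hypothetical helper name.

```python
import ctypes.util


def libsox_visible() -> bool:
    """Return True if the platform library search finds a shared `libsox`.

    Hypothetical helper: this only queries the standard loader search paths
    and is not the mechanism TorchAudio uses internally.
    """
    return ctypes.util.find_library("sox") is not None


if __name__ == "__main__":
    if libsox_visible():
        print("libsox found")
    else:
        print("libsox not found; install it via conda, apt or brew")
```

A `False` result suggests the "sox" backend APIs listed above will not be usable until `libsox` is installed.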

Flashlight Text

`flashlight-text` is the core of the CTC decoder.

Pre-built packages are available on PyPI. Please refer to https://github.com/flashlight/text for details.

The APIs affected include:

- `torchaudio.models.decoder.CTCDecoder`

Changes related to the removal: 3232, 3246, 3236, 3339

Kaldi

A custom-built `libkaldi` was used to implement `torchaudio.functional.compute_kaldi_pitch`. This function, along with the libkaldi integration, is removed in this release. There is no replacement.

Changes related to the removal: 3368, 3403

I/O
- Switch to the backend dispatcher (3241)

To make I/O operations more flexible, TorchAudio introduced the backend dispatcher in v2.0, and users could opt in to use the dispatcher.
In this release, the backend dispatcher becomes the default mechanism for selecting the I/O backend.

You can pass the `backend` argument to `torchaudio.info`, `torchaudio.load` and `torchaudio.save` to select the I/O backend library on a per-call basis. (If it is omitted, an available backend is selected automatically.)

If you want to use the global backend mechanism, you can set the environment variable `TORCHAUDIO_USE_BACKEND_DISPATCHER=0`.
Please note, however, that the global backend mechanism is deprecated and will be removed in the next release.

Please see 2950 for details of the migration work.
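The per-call selection described above can be pictured as a small dispatcher: an explicit `backend` wins, otherwise the first available backend is used. With TorchAudio installed, the real call looks like `torchaudio.load(path, backend="soundfile")`. The pure-Python toy model below only illustrates the idea; the backend names come from the text, but the priority order, the availability table and the error behavior are assumptions, not TorchAudio's implementation.

```python
from typing import Callable, Dict, Optional

# Hypothetical loader type: takes a path, returns a description of who handled it.
Loader = Callable[[str], str]


def make_dispatcher(backends: Dict[str, Loader], available: Dict[str, bool]):
    """Build a toy `load` that selects a backend per call (assumed semantics)."""

    def load(path: str, backend: Optional[str] = None) -> str:
        if backend is not None:
            # Explicit per-call choice: use it or fail loudly.
            if not available.get(backend, False):
                raise RuntimeError(f"Backend {backend!r} is not available")
            return backends[backend](path)
        # No choice given: fall back to the first available backend.
        # Dict insertion order is used as the (assumed) priority order.
        for name, loader in backends.items():
            if available.get(name, False):
                return loader(path)
        raise RuntimeError("No audio I/O backend is available")

    return load


load = make_dispatcher(
    backends={
        "ffmpeg": lambda p: f"ffmpeg:{p}",
        "sox": lambda p: f"sox:{p}",
        "soundfile": lambda p: f"soundfile:{p}",
    },
    # Pretend FFmpeg is missing on this machine.
    available={"ffmpeg": False, "sox": True, "soundfile": True},
)

print(load("speech.wav"))                       # automatic selection
print(load("speech.wav", backend="soundfile"))  # explicit per-call selection
```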

- Remove Tensor binding from StreamReader (3093, 3272)

`torchaudio.io.StreamReader` used to accept a byte string wrapped in a 1D `torch.Tensor` object. This is no longer supported.
Please wrap the underlying data with `io.BytesIO` instead.
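Migrating usually means turning the raw encoded bytes into a seekable in-memory file. The sketch below shows only that wrapping step with the standard library. The payload is a hypothetical placeholder, and the `StreamReader` call is left as a comment because it requires TorchAudio with FFmpeg available.

```python
import io


def as_file_like(encoded: bytes) -> io.BytesIO:
    """Wrap raw encoded media bytes in a seekable in-memory file object."""
    return io.BytesIO(encoded)


# Hypothetical placeholder for the contents of an encoded media file.
payload = bytes([0x00, 0x01, 0x02, 0x03])
src = as_file_like(payload)

# With TorchAudio installed, `src` is then passed as the source:
#     reader = torchaudio.io.StreamReader(src)
# If the data is still held in a 1D uint8 CPU tensor `t`,
# `t.numpy().tobytes()` recovers the raw bytes first.
```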

- Make I/O optional arguments kw-only (3208, 3227)

The optional arguments of `add_[audio|video]_stream` methods of `torchaudio.io.StreamReader` and `torchaudio.io.StreamWriter` are now keyword-only arguments.
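To make the effect of keyword-only arguments concrete, here is a hypothetical stub. It is not the real `add_audio_stream` signature, and its parameter names are illustrative assumptions. Everything after the bare `*` must be passed by name, so a call that used to pass an optional argument positionally now raises `TypeError`.

```python
from typing import Optional


def add_audio_stream_stub(
    frames_per_chunk: int,
    *,
    stream_index: Optional[int] = None,
    decoder: Optional[str] = None,
) -> dict:
    """Hypothetical stand-in: arguments after `*` are keyword-only."""
    return {
        "frames_per_chunk": frames_per_chunk,
        "stream_index": stream_index,
        "decoder": decoder,
    }


# Keyword style keeps working.
config = add_audio_stream_stub(4096, stream_index=0)

# Passing the optional argument positionally is now rejected.
rejected = False
try:
    add_audio_stream_stub(4096, 0)
except TypeError:
    rejected = True
```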

- Drop the support of FFmpeg < 4.4 (3561, 3557)

Previously TorchAudio supported FFmpeg 4 (>=4.1, <=4.4). In this release, TorchAudio supports FFmpeg 4, 5 and 6 (>=4.4, <7). With this change, support for FFmpeg 4.1, 4.2 and 4.3 is dropped.
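The supported range stated above (>=4.4, <7) can be expressed as a tuple comparison on the major and minor version numbers. This is a minimal sketch that assumes plain numeric version strings such as `"6.1.1"`. Git-style tags like `"n6.1"` are not handled, and the function names are hypothetical.

```python
from typing import Tuple


def ffmpeg_major_minor(version: str) -> Tuple[int, int]:
    """Parse "X", "X.Y" or "X.Y.Z" into (X, Y); a missing minor counts as 0."""
    parts = [int(p) for p in version.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)
    return parts[0], parts[1]


def is_supported_ffmpeg(version: str) -> bool:
    """True if the version falls in the range stated above: >=4.4 and <7."""
    return (4, 4) <= ffmpeg_major_minor(version) < (7, 0)


for v in ["4.3", "4.4.2", "5.1", "6.1.1", "7.0"]:
    print(v, is_supported_ffmpeg(v))
```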

Ops
- Use named file in `torchaudio.functional.apply_codec` (3397)

In previous versions, TorchAudio shipped a custom-built `libsox` so that it could perform in-memory decoding and encoding.
Now, in-memory decoding and encoding are handled by the FFmpeg binding. With the switch to dynamic `libsox` linking, `torchaudio.functional.apply_codec` no longer processes audio in memory. Instead, it writes to a temporary file.
For in-memory processing, please use `torchaudio.io.AudioEffector`.
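The difference between in-memory processing and going through a named temporary file can be illustrated with the standard library's `wave` module as a stand-in codec. This sketch does not use TorchAudio, libsox or FFmpeg. It only shows the two I/O patterns, and the helper names and 16-bit mono format are assumptions.

```python
import io
import os
import tempfile
import wave


def encode_wav(pcm: bytes, sample_rate: int, dest) -> None:
    """Write 16-bit mono PCM as WAV; `dest` may be a path or a binary file object."""
    with wave.open(dest, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)


def roundtrip_in_memory(pcm: bytes, sample_rate: int) -> bytes:
    """Encode and decode entirely in memory (the AudioEffector-style pattern)."""
    buf = io.BytesIO()
    encode_wav(pcm, sample_rate, buf)
    buf.seek(0)
    with wave.open(buf, "rb") as r:
        return r.readframes(r.getnframes())


def roundtrip_temp_file(pcm: bytes, sample_rate: int) -> bytes:
    """Encode to a named temporary file, then decode it (the new apply_codec pattern)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "codec.wav")
        encode_wav(pcm, sample_rate, path)
        with wave.open(path, "rb") as r:
            return r.readframes(r.getnframes())


pcm = bytes(range(256)) * 4  # 1024 bytes = 512 16-bit samples
assert roundtrip_in_memory(pcm, 16000) == roundtrip_temp_file(pcm, 16000)
```

WAV is lossless, so both paths return the input unchanged. A lossy codec would alter the samples, but the choice between memory and disk would work the same way.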

- Switch to `lstsq` when solving InverseMelScale (3280)

Previously, `torchaudio.transforms.InverseMelScale` ran an SGD optimizer to find the inverse of the mel-scale transform. This approach has a number of issues, as listed in 2643.

This release switches to use `torch.linalg.lstsq`.
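If a mel spectrogram is computed as `mel = fb.T @ spec` with a filterbank `fb` of shape `(n_freqs, n_mels)`, inverting it is a linear least-squares problem rather than an iterative optimization. The NumPy sketch below mirrors that idea with `np.linalg.lstsq` instead of `torch.linalg.lstsq`. The random filterbank is a placeholder, not a real mel filterbank, and clipping negative values is an assumed post-processing step.

```python
import numpy as np


def inverse_mel_lstsq(fb: np.ndarray, mel: np.ndarray, clamp: bool = True) -> np.ndarray:
    """Estimate a linear spectrogram from a mel spectrogram by least squares.

    fb:  (n_freqs, n_mels) filterbank, with mel = fb.T @ spec
    mel: (n_mels, n_frames) mel spectrogram
    """
    # Solve fb.T @ spec = mel. With n_mels < n_freqs the system is
    # underdetermined and lstsq returns the minimum-norm solution.
    spec, *_ = np.linalg.lstsq(fb.T, mel, rcond=None)
    if clamp:
        # Magnitudes cannot be negative, so clip the unconstrained solution.
        spec = np.maximum(spec, 0.0)
    return spec


rng = np.random.default_rng(0)
n_freqs, n_mels, n_frames = 201, 64, 10
fb = rng.random((n_freqs, n_mels))          # placeholder filterbank
spec = rng.random((n_freqs, n_frames))      # placeholder linear spectrogram
mel = fb.T @ spec
estimate = inverse_mel_lstsq(fb, mel)
```

Because many linear spectrograms map to the same mel spectrogram, the estimate reproduces `mel` but is not guaranteed to equal the original `spec`.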

Models
- Improve RNN-T streaming decoding (3295, 3379)

The `infer` method of `torchaudio.models.RNNTBeamSearch` has been updated to accept a series of previous hypotheses.

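```python
import torchaudio
from torchaudio.models import RNNTBeamSearch

bundle = torchaudio.pipelines.EMFORMER_RNNT_BASE_LIBRISPEECH
decoder: RNNTBeamSearch = bundle.get_decoder()

state = None       # decoder state carried across chunks
hypothesis = None  # previous hypotheses carried across chunks
while streaming:
    ...  # obtain `features` and `length` for the current chunk
    hypo, state = decoder.infer(
        features,
        length,
        beam_width,
        state=state,
        hypothesis=hypothesis,
    )
    ...
    # Previously this had to be hypothesis = hypo[0]
    hypothesis = hypo
```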

Deprecations

Ops

- Update and deprecate `torchaudio.functional.apply_codec` function (3386)

Due to the removal of custom libsox binding, `torchaudio.functional.apply_codec` no longer supports in-memory processing. Please migrate to `torchaudio.io.AudioEffector`.

Please refer to the following pages for the detailed usage of `torchaudio.io.AudioEffector`:

- https://pytorch.org/audio/2.1/generated/torchaudio.io.AudioEffector.html
- https://pytorch.org/audio/stable/tutorials/effector_tutorial.html

Bug Fixes

Models
- Fix the negative sampling in ConformerWav2Vec2PretrainModel (3085)
- Fix extract_features method for WavLM models (3350)

Tutorials
- Fix backtracking in forced alignment tutorial (3440)
- Fix initialization of `get_trellis` in forced alignment tutorial (3172)

Build
- Fix MKL issue on Intel mac build (3307)

I/O
- Suppress warning when saving vorbis with sox backend (3359)
- Fix g722 encoding in `torchaudio.io.StreamWriter` (3373)
- Refactor arg mapping in ffmpeg save function (3387)
- Fix save INT16 sox backend (3524)
- Fix SoundfileBackend method decorators (3550)
- Fix PTS initialization when using NVIDIA encoder (3312)

Ops
- Add non-default CUDA device support to `lfilter` (3432)

Improvements
I/O
- Set "experimental" automatically when using native opus/vorbis encoder (3192)
- Improve the performance of NV12 frame conversion (3344)
- Improve the performance of YUV420P frame conversion (3342)
- Refactor backend implementations (3547, 3548, 3549)
- Raise an error if `torchaudio.io.StreamWriter` is not opened (3152)
- Warn if decoding YUV images with different plane size (3201)
- Expose AudioMetadata (3556)
- Refactor the internal of `torchaudio.io.StreamReader` (3157, 3170, 3186, 3184, 3188, 3320, 3296, 3328, 3419, 3209)
- Refactor the internal of `torchaudio.io.StreamWriter` (3205, 3319, 3296, 3328, 3426, 3428)
- Refactor the FFmpeg abstraction layer (3249, 3251)
- Migrate the binding of FFmpeg utils to PyBind11 (3228)
- Simplify sox namespace (3383)
- Use const reference in sox implementation (3389)
- Ensure StreamReader returns tensors with `requires_grad=False` (3467)
- Set the default threads to 1 in StreamWriter (3370)
- Remove ffmpeg fallback from sox_io backend (3516)

Ops
- Add arbitrary dim Tensor support to mask_along_axis{,_iid} (3289)
- Fix resampling to support dynamic input lengths for onnx exports. (3473)
- Optimize Torchaudio Vad (3382)

Documentation
- Build and use GPU-enabled FFmpeg in doc CI (3045)
- Misc tutorial update (3449)
- Update notes on FFmpeg version (3480)
- Update documentation about dependencies (3517)
- Update I/O and backend docs (3555)

Tutorials
- Update data augmentation tutorial (3375)
- Add more explanation about `n_fft` (3442)

Build
- Resolve some compilation warnings (3471)
- Use pre-built binaries for ffmpeg extension (3460)
- Add aarch64 workflow (3553)
- Add CUDA 12.1 builds (3284)
- Update CUDA to 12.1 U1 (3563)

Recipe
- Fix Adam and AdamW initializers in wav2letter example (3145)
- Update Librispeech RNNT recipe to support Lightning 2.0 (3336)
- Update HuBERT/SSL training recipes to support Lightning 2.x (3396)
- Add wav2vec2 loss function in self_supervised_learning training recipe (3090)
- Add Wav2Vec2DataModule in self_supervised_learning training recipe (3081)

Other
- Use FFmpeg6 in build doc (3475)
- Use FFmpeg6 in unit test (3570)
- Migrate `torch.norm` to `torch.linalg.vector_norm` (3522)
- Migrate `torch.nn.utils.weight_norm` to `nn.utils.parametrizations.weight_norm` (3523)
