</td>
</tr>
</table>
Unit: msec
Improved Windows Support
torchaudio implements some operations in C++ for reasons such as performance and integration with third-party libraries. Previously, this C++ module was available only on Linux and macOS. In this release, Windows packages also ship with the C++ module.
The C++ module in the Windows package includes the efficient filtering implementation mentioned above; however, the `"sox_io"` backend and `torchaudio.functional.compute_kaldi_pitch` are not included.
I/O Functions Migration
Since the 0.6 release, we have continuously improved I/O functionality. Specifically, in 0.8 the default backend was changed from `"sox"` to `"sox_io"`, and a similar API change was applied to the `"soundfile"` backend. The 0.9 release concludes this migration by removing the deprecated backends. For details, please refer to [903](https://github.com/pytorch/audio/issues/903).
Backward Incompatible Changes
I/O
* Deprecated backends and functions were removed (1311, 1329, 1362)
* Please see 903 for the migration.
* Added validation of the number of channels when saving GSM (1384)
* Please make sure that the signal has only one channel when saving to GSM.
Ops
* Removed deprecated `normalized` argument from `torchaudio.functional.griffinlim` (1369)
* This argument was never used. Please remove the argument from your call.
* Renamed `torchaudio.functional.sliding_window_cmn` arg for correctness (1347)
* The first argument is supposed to be a spectrogram. If you have used the keyword argument `waveform=...`, please change it to `specgram=...`.
* Changed `torchaudio.transforms.Resample` to precompute and cache the resampling kernel. (1499, 1514)
* To use the transform on devices other than CPU, please move the instantiated object to the target device.
```python
import torch
import torchaudio

resampler = torchaudio.transforms.Resample(orig_freq=8000, new_freq=44100)
resampler.to(torch.device("cuda"))
```
Dataset
* Removed deprecated arguments from CommonVoice (1534)
* `torchaudio` no longer supports programmatic download of the Common Voice dataset. Please remove the arguments from your code.
Deprecations
* Deprecated the use of pseudo complex type (1445, 1492)
* `torchaudio` is adopting the native complex type; the pseudo complex type and the related utility functions are now deprecated. Please refer to 1337 for the migration process.
* Deprecated `torchaudio.compliance.kaldi.resample_waveform` (1533)
* Please use `torchaudio.functional.resample`.
* `torchaudio.transforms.MelScale` now expects valid `n_stft` value (1515)
* Please provide a valid value to `n_stft`.
New Features
[Beta] Wav2Vec2.0
* Added wav2vec2.0 model (1529)
* Added wav2vec2.0 HuggingFace importer (1530)
* Added wav2vec2.0 fairseq importer (1531)
* Added speech recognition C++ example (1538)
* Please refer to the [C++ example](https://github.com/pytorch/audio/tree/master/examples/libtorchaudio/speech_recognition) for details.
Filtering
* Added C++ implementation of `torchaudio.functional.lfilter` (1319)
* Added autograd support to `torchaudio.functional.lfilter` (1310, 1441)
[Beta] Resampling
* Added `torchaudio.functional.resample` (1402)
* Added `rolloff` parameter (1488)
* Added kaiser window support to resampling (1509)
* Added kernel caching mechanism in `torchaudio.transforms.Resample` (1499, 1514, 1556)
* Skip resampling when sampling rate is not changed (1537)
Native Complex Tensor
* Added complex tensor support to `torchaudio.functional.phase_vocoder` and `torchaudio.transforms.TimeStretch` (1410)
* Added `return_complex` to `torchaudio.functional.spectrogram` and `torchaudio.transforms.Spectrogram` (1366, 1551)
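To illustrate the difference between the two representations, the sketch below converts between a pseudo complex tensor (real and imaginary parts stored in a trailing dimension of size 2) and a native complex tensor. It uses only standard PyTorch functions; it is not the migration procedure from issue 1337, and the tensor shape is an arbitrary placeholder.

```python
import torch

# Pseudo complex layout: (..., 2), where the last dimension holds (real, imag).
pseudo = torch.randn(1, 201, 50, 2)

# Native complex view of the same data: shape (1, 201, 50), dtype complex64.
native = torch.view_as_complex(pseudo)

# Converting back yields the original real-valued layout.
round_trip = torch.view_as_real(native)

print(native.dtype, native.shape, round_trip.shape)
```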
Improvements
I/O
* Added file path to I/O error messages (1523)
* Added `__str__` override to `AudioMetaData` for easy print (1339)
* Fixed uninitialized variable in `sox/utils.cpp` (1306)
* Replaced UB sox conversion macros with tensor op (1370)
* Removed `check_length` from `validate_input_file` (1312)
Ops
* Added warning for non-integer resampling frequencies (1490)
* Adopted native complex tensors in `torchaudio.functional.griffinlim` (1368)
* Prohibited scripting `torchaudio.transforms.MelScale` when `n_stft` is invalid (1505)
* Added input dimension check to VAD (1513)
* Added HTK-compatible option to Mel-scale conversion (593)
Models
* Added vanilla DeepSpeech model (1399)
Datasets
* Fixed checksum for the YESNO dataset (1405)
Misc
* Added missing transforms to `__all__` (1458)
* Removed `reference_cast` in `make_boxed_from_unboxed_functor` (1300)
* Removed unused normalized constant from `torchaudio.transforms.GriffinLim` (1433)
* Removed unused helper function (1396)
Examples
* Added libtorchaudio C++ example (1349)
* Refactored libtorchaudio example (1486)
* Replaced `librosa`'s Mel scale conversion with `torchaudio`’s in WaveRNN example (1444)
Build
* Updated `config.guess` to support source build in recent architectures (1484)
* Explicitly disabled wavpack when building SoX (1462)
* Added ROCm support to source build (1411)
* Added Windows C++ binary build (1345, 1371)
* Made kaldi selective in build (1342)
* Made sox selective (1338)
Testing
* Added autograd test for `torchaudio.functional.lfilter` and `biquad` variants (1400, 1438)
* Added autograd test for transforms (overview: 1414)
* `torchaudio.transforms.FrequencyMasking` (1498)
* `torchaudio.transforms.SlidingWindowCmn` (1482)
* `torchaudio.transforms.MelScale` (1467)
* `torchaudio.transforms.Vol` (1460)
* `torchaudio.transforms.TimeStretch` (1420)
* `torchaudio.transforms.AmplitudeToDB` (1447)
* `torchaudio.transforms.GriffinLim` (1421)
* `torchaudio.transforms.SpectralCentroid` (1425)
* `torchaudio.transforms.ComputeDeltas` (1422)
* `torchaudio.transforms.Fade` (1424)
* `torchaudio.transforms.Resample` (1416)
* `torchaudio.transforms.MFCC` (1415)
* `torchaudio.transforms.Spectrogram` / `MelSpectrogram` (1340)
* Added test for a batch of different items in the functional batch consistency test. (1315)
* Added test for validating `torchaudio.functional.lfilter` shape (1360)
* Added TorchScript test for `torchaudio.functional.resample` (1516)
* Added TorchScript test for `torchaudio.functional.phase_vocoder` (1379)
* Added steps to save and load the scripted object in TorchScript (1446)
* Added GPU support to functional tests (1475)
* Added GPU support to transform librosa compatibility test (1439)
* Added GPU support to functional librosa compatibility test (1436)
* Improved HTTP fetch test reliability (1512)
* Refactored functional batch consistency test (1341)
* Refactored test classes for complex (1491)
* Refactored sox_io load test (1394)
* Refactored Kaldi compatibility tests (1359)
* Refactored functional test (1435, 1463)
* Refactored transform tests (1356)
* Refactored librosa compatibility test (1350)
* Refactored sox compatibility test (1344)
* Refactored librosa compatibility test (1259)
* Removed the use of I/O functions in batch consistency test (1521)
* Removed skipIfNoSoxBackend (1390)
* Removed VAD from batch consistency tests (1451)
* Replaced deprecated `floor_divide` with `div` (1455)
* Replaced `torch.assert_allclose` with `assertEqual` (1387)
* Shortened `torchaudio.functional.lfilter` autograd tests input size (1443)
* Updated `torchaudio.transforms.InverseMelScale` comparison test (1437)
Bug Fixes
* Updated `torchaudio.transforms.TimeMasking` and `torchaudio.transforms.FrequencyMasking` to perform out-of-place masking (1481)
* Annotated `power` of `torchaudio.transforms.MelSpectrogram` as float only (1572)
Performance
* Adopted `torch.nn.functional.conv1d` in `torchaudio.functional.lfilter` (1318)
* Added C++ implementation of `torchaudio.functional.overdrive` (1299)
Documentation
* Updated docs (1550)
* Reformatted resample docs (1548)
* Updated resampling documentation (1519)
* Added the clarification that `sox_effects.apply_effects_tensor` is CPU-only (1459)
* Removed instructions on using external sox (1365, 1281)
* Added navigation with left/right arrow keys (1336)
* Fixed docstring of `sliding_window_cmn` (1383)
* Updated contributing guide (1372)
* Fixed broken links in contribution guide (1361)
* Added Windows build instructions (1440)
* Fixed typo (1471, 1397, 1293)
* Added WER to readme in wav2letter pipeline (1470)
* Fixed wav2letter usage example (1060)
* Added Google Analytics support (1466)