Torchaudio

Latest version: v2.5.1


0.3.0

Highlights
torchaudio as an extension of PyTorch
torchaudio has been redesigned to be an extension of PyTorch and part of the domain APIs (DAPI) ecosystem. Domain-specific libraries such as this one are kept separate in order to maintain a coherent environment for each of them. As such, torchaudio is an ML library that provides relevant signal processing functionality, but it is not a general signal processing library. The full rationale for this new standardization can be found in the [README.md](https://github.com/pytorch/audio/blob/v0.3.0/README.md#Conventions).

In light of these changes some transforms have been removed or have different argument names and conventions. See the section on backwards breaking changes for a migration guide.

We provide binaries via [pip](https://pypi.org/project/torchaudio/) and [conda](https://anaconda.org/pytorch/torchaudio). They require PyTorch 1.2.0 or newer. See https://pytorch.org/ for installation instructions.

Community

We would like to thank our contributors and the wider community for their significant contributions to this release. We are happy to see an active community around torchaudio and are eager to further grow and support it.

In particular we'd like to thank keunwoochoi, ksanjeevan, and all the other maintainers and contributors of [torchaudio-contrib](https://github.com/keunwoochoi/torchaudio-contrib) for their significant and valuable additions around standardization and the support of complex numbers (https://github.com/pytorch/audio/pull/131, https://github.com/pytorch/audio/issues/110, https://github.com/keunwoochoi/torchaudio-contrib/issues/61, https://github.com/keunwoochoi/torchaudio-contrib/issues/36).

Kaldi Compliance Interface
An implementation of basic transforms with a Kaldi-like interface.

We added the functions [spectrogram](https://pytorch.org/audio/compliance.kaldi.html#torchaudio.compliance.kaldi.spectrogram), [fbank](https://pytorch.org/audio/compliance.kaldi.html#torchaudio.compliance.kaldi.fbank), and [resample_waveform](https://pytorch.org/audio/compliance.kaldi.html#torchaudio.compliance.kaldi.resample_waveform) (https://github.com/pytorch/audio/pull/119, https://github.com/pytorch/audio/pull/127, and https://github.com/pytorch/audio/pull/134). For more details see the documentation on [torchaudio.compliance.kaldi](https://pytorch.org/audio/compliance.kaldi.html) which mirrors the arguments and outputs of [Kaldi features](https://github.com/kaldi-asr/kaldi/tree/master/src/featbin).

As an example, we can look at sinc interpolation resampling, which is similar to Kaldi’s implementation. In the figure below, the blue dots are the original signal and the red dots are the signal downsampled to half the original sampling frequency. The red dots approximately coincide with every other original sample.

![resampling](https://user-images.githubusercontent.com/10252970/61245365-4e877800-a71a-11e9-8171-5294253eae2c.png "Example of Resampling")

```python
specgram = torchaudio.compliance.kaldi.spectrogram(waveform, frame_length=...)
fbank = torchaudio.compliance.kaldi.fbank(waveform, num_mel_bins=...)
resampled_waveform = torchaudio.compliance.kaldi.resample_waveform(waveform, orig_freq=...)
```
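To make the resampling idea concrete, here is a minimal pure-Python sketch of windowed-sinc interpolation. It is not Kaldi's or torchaudio's implementation: the function name `sinc_resample`, the Hann-windowed kernel, and the `num_zeros` parameter are assumptions chosen for illustration only.

```python
import math

def sinc_resample(signal, orig_freq, new_freq, num_zeros=6):
    """Resample `signal` by Hann-windowed sinc interpolation (illustrative sketch)."""
    # Low-pass cutoff: the lower of the two Nyquist frequencies, to limit aliasing.
    cutoff = 0.5 * min(orig_freq, new_freq)
    # Kernel half-width in seconds, spanning `num_zeros` zero crossings of the sinc.
    half_width = num_zeros / (2.0 * cutoff)
    n_out = len(signal) * new_freq // orig_freq
    out = []
    for j in range(n_out):
        t = j / new_freq  # time of the output sample in seconds
        first = max(0, math.ceil((t - half_width) * orig_freq))
        last = min(len(signal) - 1, math.floor((t + half_width) * orig_freq))
        acc = 0.0
        for i in range(first, last + 1):
            dt = i / orig_freq - t
            x = 2.0 * cutoff * dt
            sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
            window = 0.5 * (1.0 + math.cos(math.pi * dt / half_width))
            # 2 * cutoff / orig_freq normalizes the filter to roughly unit gain.
            acc += signal[i] * sinc * window * 2.0 * cutoff / orig_freq
        out.append(acc)
    return out

# A 50 Hz sine sampled at 1000 Hz, downsampled to 500 Hz.
original = [math.sin(2 * math.pi * 50 * n / 1000) for n in range(200)]
downsampled = sinc_resample(original, 1000, 500)
```

Away from the signal edges, `downsampled[j]` stays close to `original[2 * j]`, which is the "every other element" behaviour described for the figure.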


Inverse short time Fourier transform
Constructing a signal from a spectrogram can be used in applications like source separation or to generate audio signals to listen to. More specifically, [torchaudio.functional.istft](https://pytorch.org/audio/functional.html#torchaudio.functional.istft) is the inverse of [torch.stft](https://pytorch.org/docs/stable/torch.html#torch.stft). It has the same parameters (plus an additional optional `length` parameter) and returns the least-squares estimate of the original signal.

```python
import torch
import torchaudio

torch.manual_seed(0)
n_fft = 5
waveform = torch.rand(2, 5)
stft = torch.stft(waveform, n_fft=n_fft)
approx_waveform = torchaudio.functional.istft(stft, n_fft=n_fft, length=waveform.size(1))
# approx_waveform is the least-squares estimate of waveform
```
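The least-squares inverse can be illustrated without PyTorch. The sketch below is not torchaudio's implementation. It assumes a naive two-sided DFT, a Hamming window, and no centering or padding, and all function names are hypothetical. The inverse overlap-adds the windowed inverse DFT of each frame and divides by the summed squared window.

```python
import cmath
import math

def hamming(n):
    """Periodic Hamming window; it has no zeros, so every covered sample gets weight."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / n) for i in range(n)]

def dft(frame, inverse=False):
    """Naive O(n^2) discrete Fourier transform (forward or inverse)."""
    n = len(frame)
    sign = 1 if inverse else -1
    out = [sum(frame[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out

def stft(signal, n_fft, hop, window):
    """Windowed DFT of each frame; frames start every `hop` samples, no padding."""
    starts = range(0, len(signal) - n_fft + 1, hop)
    return [dft([signal[s + i] * window[i] for i in range(n_fft)]) for s in starts]

def istft(frames, n_fft, hop, window, length):
    """Least-squares inverse: windowed overlap-add over the summed squared window."""
    num = [0.0] * length
    den = [0.0] * length
    for m, spectrum in enumerate(frames):
        frame = dft(spectrum, inverse=True)
        for i in range(n_fft):
            num[m * hop + i] += window[i] * frame[i].real
            den[m * hop + i] += window[i] ** 2
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]

signal = [math.sin(0.3 * n) for n in range(16)]
window = hamming(8)
approx = istft(stft(signal, 8, 2, window), 8, 2, window, len(signal))
```

Because the spectrogram here comes from an unmodified STFT, `approx` matches `signal` up to floating-point error. For a modified spectrogram the same formula gives the closest signal in the least-squares sense.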

0.2.0

Background
The goal of this release is to fix the current API, as future changes will break backward compatibility in order to improve the library as more thought is given to design, capabilities, and usability.

While this release is compatible with all currently known PyTorch versions (<=1.2.0), the available binaries only require PyTorch 1.1.0. Installation commands:

```bash
# Wheels for Python 2 are NOT supported
# Python 3.5
$ pip3 install http://download.pytorch.org/whl/torchaudio-0.2-cp35-cp35m-linux_x86_64.whl
# Python 3.6
$ pip3 install http://download.pytorch.org/whl/torchaudio-0.2-cp36-cp36m-linux_x86_64.whl
# Python 3.7
$ pip3 install http://download.pytorch.org/whl/torchaudio-0.2-cp37-cp37m-linux_x86_64.whl
```


What's new?
- Fixed broken tests and set up an automatic testing environment
- Read in Kaldi files (“.ark”, “.scp”)
- Separation of state and computation into [transforms.py](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/transforms.py) and [functional.py](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/functional.py)
- Loading and saving to file
- Datasets [VCTK](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/datasets/vctk.py) and [YESNO](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/datasets/yesno.py)
- SoxEffects and SoxEffectsChain in [torchaudio.sox_effects](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/sox_effects.py)

CI and Testing
Continuous integration (Travis CI) has been set up in https://github.com/pytorch/audio/pull/117. This means all the tests have been fixed and their status can be checked at https://travis-ci.org/pytorch/audio. The test files have to be run separately via [build_tools/travis/test_script.sh](https://github.com/pytorch/audio/blob/v0.2.0/build_tools/travis/test_script.sh) because closing sox after a test file is completed prevents it from being reopened. The testing framework is [pytest](https://docs.pytest.org/en/latest/).

```bash
# Run the whole test suite
$ build_tools/travis/test_script.sh
# Run an individual test
$ python -m pytest test/test_transforms.py
```
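One way to run each test file in its own process is sketched below. This is not the content of `test_script.sh`. The helper names `pytest_commands` and `run_separately` are hypothetical, and the sketch only assumes that pytest is installed for the current interpreter.

```python
import subprocess
import sys

def pytest_commands(test_files):
    """Build one pytest command per test file so each file gets a fresh process."""
    return [[sys.executable, "-m", "pytest", path] for path in sorted(test_files)]

def run_separately(test_files):
    """Run each test file in its own interpreter and return the files that failed."""
    failed = []
    for cmd in pytest_commands(test_files):
        result = subprocess.run(cmd)
        if result.returncode != 0:
            failed.append(cmd[-1])
    return failed
```

Because every file starts in a fresh interpreter, state that cannot be reinitialized within one process (such as the sox shutdown described above) does not leak from one test file to the next.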


Kaldi IO
[Kaldi IO](https://github.com/vesis84/kaldi-io-for-python) has been added as an optional dependency in https://github.com/pytorch/audio/pull/111. torchaudio provides a simple wrapper around this by converting the `np.ndarray` into `torch.Tensor`. Functions include: `read_vec_int_ark`, `read_vec_flt_scp`, `read_vec_flt_ark`, `read_mat_scp`, and `read_mat_ark`.

```python
>>> # read ark to a 'dictionary'
>>> d = { u:d for u,d in torchaudio.kaldi_io.read_vec_int_ark(file) }
```


Separation of State and Computation
In https://github.com/pytorch/audio/pull/105, the computations have been moved into functional.py. The reasoning behind this is that tracking state is a separate problem by itself and should be separate from computing a function. It also allows us to annotate the functional as weak scriptable, which in turn allows us to utilize the JIT and create efficient code. The functional itself might then also be used by other functionals, which is much easier and more efficient than having another Module create an instance of the class. This also makes it easier to implement performance improvements and create a generic API. If someone implements a function that adheres to the contract of your functional, it can be an immediate drop-in. This is important if we want to support different backends (e.g. move a functional entirely into C++).

```python
>>> torchaudio.transforms.Spectrogram(n_fft=...)(waveform)
>>> torchaudio.functional.spectrogram(waveform, …)
```
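The split between state and computation can be sketched in plain Python. The example below is modeled on the `SpectrogramToDB` and `spectrogram_to_DB` signatures listed in these notes, but it is an illustration rather than the library code: it works on a flat list of floats instead of a tensor, and the constants (`amin = 1e-10`, a reference value of 1.0) are assumptions.

```python
import math

def spectrogram_to_db(spec, multiplier, amin, db_multiplier, top_db=None):
    """Stateless computation: convert spectrogram values to decibels."""
    spec_db = [multiplier * math.log10(max(v, amin)) - multiplier * db_multiplier
               for v in spec]
    if top_db is not None:
        floor = max(spec_db) - top_db  # keep values within top_db of the peak
        spec_db = [max(v, floor) for v in spec_db]
    return spec_db

class SpectrogramToDB:
    """Stateful wrapper: stores configuration and delegates the math to the function."""
    def __init__(self, stype="power", top_db=None):
        self.multiplier = 10.0 if stype == "power" else 20.0
        self.amin = 1e-10  # assumed lower clamp
        self.db_multiplier = math.log10(max(self.amin, 1.0))  # assumed reference 1.0
        self.top_db = top_db

    def __call__(self, spec):
        return spectrogram_to_db(spec, self.multiplier, self.amin,
                                 self.db_multiplier, self.top_db)

to_db = SpectrogramToDB(stype="power", top_db=80.0)
decibels = to_db([1.0, 0.1, 1e-12])
```

The class only holds parameters, so any other function with the same argument contract could be swapped in for `spectrogram_to_db` without touching the wrapper.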

Loading and saving to file
Tensors can be read and written to various file formats (e.g. “mp3”, “wav”, etc.) through torchaudio.
```python
sound, sample_rate = torchaudio.load('input.wav')
torchaudio.save('output.wav', sound, sample_rate)
```

Transforms and functionals
Transforms
```python
# Signature summary; method bodies are omitted.
import torch

class Compose(object):
    def __init__(self, transforms): ...
    def __call__(self, audio): ...

class Scale(object):
    def __init__(self, factor=2**31): ...
    def __call__(self, tensor): ...

class PadTrim(object):
    def __init__(self, max_len, fill_value=0, channels_first=True): ...
    def __call__(self, tensor): ...

class DownmixMono(object):
    def __init__(self, channels_first=None): ...
    def __call__(self, tensor): ...

class LC2CL(object):
    def __call__(self, tensor): ...

def SPECTROGRAM(*args, **kwargs): ...

class Spectrogram(object):
    def __init__(self, n_fft=400, ws=None, hop=None,
                 pad=0, window=torch.hann_window,
                 power=2, normalize=False, wkwargs=None): ...
    def __call__(self, sig): ...

def F2M(*args, **kwargs): ...

class MelScale(object):
    def __init__(self, n_mels=128, sr=16000, f_max=None, f_min=0., n_stft=None): ...
    def __call__(self, spec_f): ...

class SpectrogramToDB(object):
    def __init__(self, stype="power", top_db=None): ...
    def __call__(self, spec): ...

class MFCC(object):
    def __init__(self, sr=16000, n_mfcc=40, dct_type=2, norm='ortho', log_mels=False,
                 melkwargs=None): ...
    def __call__(self, sig): ...

class MelSpectrogram(object):
    def __init__(self, sr=16000, n_fft=400, ws=None, hop=None, f_min=0., f_max=None,
                 pad=0, n_mels=128, window=torch.hann_window, wkwargs=None): ...
    def __call__(self, sig): ...

def MEL(*args, **kwargs): ...

class BLC2CBL(object):
    def __call__(self, tensor): ...

class MuLawEncoding(object):
    def __init__(self, quantization_channels=256): ...
    def __call__(self, x): ...

class MuLawExpanding(object):
    def __init__(self, quantization_channels=256): ...
    def __call__(self, x_mu): ...
```

Functional
```python
# Signature summary; function bodies are omitted.
def scale(tensor, factor):
    # type: (Tensor, int) -> Tensor
    ...

def pad_trim(tensor, ch_dim, max_len, len_dim, fill_value):
    # type: (Tensor, int, int, int, float) -> Tensor
    ...

def downmix_mono(tensor, ch_dim):
    # type: (Tensor, int) -> Tensor
    ...

def LC2CL(tensor):
    # type: (Tensor) -> Tensor
    ...

def spectrogram(sig, pad, window, n_fft, hop, ws, power, normalize):
    # type: (Tensor, int, Tensor, int, int, int, int, bool) -> Tensor
    ...

def create_fb_matrix(n_stft, f_min, f_max, n_mels):
    # type: (int, float, float, int) -> Tensor
    ...

def mel_scale(spec_f, f_min, f_max, n_mels, fb=None):
    # type: (Tensor, float, float, int, Optional[Tensor]) -> Tuple[Tensor, Tensor]
    ...

def spectrogram_to_DB(spec, multiplier, amin, db_multiplier, top_db=None):
    # type: (Tensor, float, float, float, Optional[float]) -> Tensor
    ...

def create_dct(n_mfcc, n_mels, norm):
    # type: (int, int, string) -> Tensor
    ...

def MFCC(sig, mel_spect, log_mels, s2db, dct_mat):
    # type: (Tensor, MelSpectrogram, bool, SpectrogramToDB, Tensor) -> Tensor
    ...

def BLC2CBL(tensor):
    # type: (Tensor) -> Tensor
    ...

def mu_law_encoding(x, qc):
    # type: (Tensor, int) -> Tensor
    ...

def mu_law_expanding(x_mu, qc):
    # type: (Tensor, int) -> Tensor
    ...
```
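To show what a pair of these functionals computes, here is a scalar sketch of mu-law companding that follows the standard mu-law formula. It operates on single Python floats rather than tensors, so it illustrates the math behind `mu_law_encoding` and `mu_law_expanding` and should not be read as the library implementation.

```python
import math

def mu_law_encoding(x, qc):
    """Map a sample in [-1, 1] to an integer code in [0, qc - 1]."""
    mu = qc - 1.0
    compressed = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int((compressed + 1.0) / 2.0 * mu + 0.5)

def mu_law_expanding(x_mu, qc):
    """Map an integer code in [0, qc - 1] back to a sample in [-1, 1]."""
    mu = qc - 1.0
    y = x_mu / mu * 2.0 - 1.0
    return math.copysign((math.exp(abs(y) * math.log1p(mu)) - 1.0) / mu, y)

samples = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes = [mu_law_encoding(s, 256) for s in samples]
decoded = [mu_law_expanding(c, 256) for c in codes]
```

The round trip is lossy because of quantization: `decoded` is close to `samples` but not identical, with the largest errors near full scale.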

Datasets VCTK and YESNO
All datasets are subclasses of [torch.utils.data.Dataset](https://pytorch.org/docs/stable/data.html#torch.utils.data.Dataset), i.e., they have `__getitem__` and `__len__` methods implemented. Hence, they can all be passed to a [torch.utils.data.DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), which can load multiple samples in parallel using torch.multiprocessing workers. For example:
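```python
import torch
import torchaudio

yesno_data = torchaudio.datasets.YESNO('.', download=True)
# args.nThreads: number of worker processes, assumed to come from a parsed command line
data_loader = torch.utils.data.DataLoader(yesno_data,
                                          batch_size=1,
                                          shuffle=True,
                                          num_workers=args.nThreads)
```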
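The two-method contract is small enough to sketch without PyTorch. The class `ToyAudioDataset` and the helper `iterate_batches` below are hypothetical names: the first shows the `__getitem__` and `__len__` methods a dataset needs, and the second is a single-process stand-in for the batching a `DataLoader` performs (without shuffling or workers).

```python
class ToyAudioDataset:
    """Hypothetical in-memory dataset exposing the two required methods."""
    def __init__(self, waveforms, labels):
        assert len(waveforms) == len(labels)
        self.waveforms = waveforms
        self.labels = labels

    def __getitem__(self, index):
        return self.waveforms[index], self.labels[index]

    def __len__(self):
        return len(self.waveforms)

def iterate_batches(dataset, batch_size):
    """Yield lists of samples, one list per batch, in index order."""
    for start in range(0, len(dataset), batch_size):
        stop = min(start + batch_size, len(dataset))
        yield [dataset[i] for i in range(start, stop)]

toy_data = ToyAudioDataset([[0.1], [0.2], [0.3]], [1, 0, 1])
batches = list(iterate_batches(toy_data, batch_size=2))
```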


The two datasets available are [VCTK](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/datasets/vctk.py) and [YESNO](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/datasets/yesno.py). They download the data and preprocess it so that the loaded samples are in a convenient format.

SoxEffects and SoxEffectsChain
SoxEffects and SoxEffectsChain in [torchaudio.sox_effects](https://github.com/pytorch/audio/blob/v0.2.0/torchaudio/sox_effects.py) expose sox operations through a Python interface. Various useful effects like downmixing a multichannel signal or resampling a signal can be done here.
```python
import torchaudio

torchaudio.initialize_sox()
E = torchaudio.sox_effects.SoxEffectsChain()
E.append_effect_to_chain("rate", [16000])  # resample to 16000hz
E.append_effect_to_chain("channels", ["1"])  # mono signal
E.set_input_file(fn)  # fn: path to the input audio file
waveform, sample_rate = E.sox_build_flow_effects()
torchaudio.shutdown_sox()
```
