Pyannote.audio

Latest version: v3.2.0



3.2.0

New features

- feat(task): add option to cache task training metadata to speed up training (with [clement-pages](https://github.com/clement-pages/))
- feat(model): add `receptive_field`, `num_frames` and `dimension` to models (with [Bilal-Rahou](https://github.com/Bilal-Rahou))
- feat(model): add `fbank_only` property to `WeSpeaker` models
- feat(util): add `Powerset.permutation_mapping` to help with permutation in powerset space (with [FrenchKrab](https://github.com/FrenchKrab))
- feat(sample): add sample file at `pyannote.audio.sample.SAMPLE_FILE`
- feat(metric): add `reduce` option to `diarization_error_rate` metric (with [Bilal-Rahou](https://github.com/Bilal-Rahou))
- feat(pipeline): add `Waveform` and `SampleRate` preprocessors
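
To illustrate what a permutation mapping in powerset space means, the sketch below enumerates powerset classes (speaker subsets up to a maximum size) and, for each permutation of the speakers, computes the induced reordering of those classes. It is a plain-Python illustration, not the library's implementation: the function names, the class ordering (smaller sets first), and the convention that `perm[s]` is the new label of speaker `s` are all assumptions made for this example.

```python
from itertools import combinations, permutations


def powerset_classes(num_classes, max_set_size):
    """Enumerate speaker subsets of size 0..max_set_size, smallest sets first."""
    classes = []
    for set_size in range(max_set_size + 1):
        classes.extend(combinations(range(num_classes), set_size))
    return classes


def permutation_mapping(num_classes, max_set_size):
    """Map each speaker permutation to the induced permutation of powerset classes.

    Convention (assumed for this sketch): perm[s] is the new label of speaker s.
    """
    classes = powerset_classes(num_classes, max_set_size)
    index_of = {subset: i for i, subset in enumerate(classes)}
    mapping = {}
    for perm in permutations(range(num_classes)):
        mapping[perm] = tuple(
            # relabel every speaker in the subset, then look up the new class index
            index_of[tuple(sorted(perm[s] for s in subset))]
            for subset in classes
        )
    return mapping


mapping = permutation_mapping(num_classes=3, max_set_size=2)
print(powerset_classes(3, 2))
# [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
print(mapping[(1, 0, 2)])  # swap speakers 0 and 1
# (0, 2, 1, 3, 4, 6, 5)
```

Swapping speakers 0 and 1 leaves the empty set, speaker 2, and the pair (0, 1) in place, while exchanging the classes that contain exactly one of the two swapped speakers.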

Fixes

- fix(task): fix random generators and their reproducibility (with [FrenchKrab](https://github.com/FrenchKrab))
- fix(task): fix estimation of training set size (with [FrenchKrab](https://github.com/FrenchKrab))
- fix(hook): fix `torch.Tensor` support in `ArtifactHook`
- fix(doc): fix typo in `Powerset` docstring (with [lukasstorck](https://github.com/lukasstorck))

Improvements

- improve(metric): add support for number of speakers mismatch in `diarization_error_rate` metric
- improve(pipeline): track both `Model` and `nn.Module` attributes in `Pipeline.to(device)`
- improve(io): switch to `torchaudio >= 2.2.0`
- improve(doc): update tutorials (with [clement-pages](https://github.com/clement-pages/))

Breaking changes

- BREAKING(model): get rid of `Model.example_output` in favor of `num_frames` method, `receptive_field` property, and `dimension` property
- BREAKING(task): custom tasks need to be updated (see "Add your own task" tutorial)
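
As a rough illustration of what a number-of-frames method and a receptive-field property describe, the sketch below applies the standard 1D-convolution output-length formula to a stack of layers. The layer stack, the helper names, and the 16000-sample input are hypothetical and chosen only for the example; they do not reproduce any actual pyannote model or its API.

```python
# Hypothetical stack of (kernel_size, stride) layers, NOT an actual pyannote model.
LAYERS = [(251, 10), (3, 3)]


def conv1d_num_frames(num_samples, kernel_size, stride, padding=0, dilation=1):
    """Standard output-length formula for a 1D convolution (or pooling) layer."""
    return 1 + (num_samples + 2 * padding - dilation * (kernel_size - 1) - 1) // stride


def num_frames(num_samples, layers=LAYERS):
    """Number of output frames after passing num_samples through every layer."""
    n = num_samples
    for kernel_size, stride in layers:
        n = conv1d_num_frames(n, kernel_size, stride)
    return n


def receptive_field_size(layers=LAYERS, frames=1):
    """Number of input samples seen by `frames` consecutive output frames."""
    size = frames
    for kernel_size, stride in reversed(layers):
        size = kernel_size + (size - 1) * stride
    return size


def receptive_field_step(layers=LAYERS):
    """Hop, in input samples, between two consecutive output frames."""
    step = 1
    for _, stride in layers:
        step *= stride
    return step


print(num_frames(16000))       # 525
print(receptive_field_size())  # 271
print(receptive_field_step())  # 30
```

With these closed-form quantities, the output shape can be derived from the input length alone, without running an example input through the model.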

Community contributions

- community: add tutorial for offline use of `pyannote/speaker-diarization-3.1` (by [simonottenhauskenbun](https://github.com/simonottenhauskenbun))

3.1.1

TL;DR

Providing `num_speakers` to [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) now [works as expected](https://github.com/pyannote/pyannote-audio/issues/1567).

Fixes

- fix(pipeline): fix support for setting `num_speakers` in [`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) pipeline

3.1.0

TL;DR

[`pyannote/speaker-diarization-3.1`](https://hf.co/pyannote/speaker-diarization-3.1) no longer requires [unpopular](https://github.com/pyannote/pyannote-audio/issues/1537) ONNX runtime

New features

- feat(model): add WeSpeaker embedding wrapper based on PyTorch
- feat(model): add support for multi-speaker statistics pooling
- feat(pipeline): add `TimingHook` for profiling processing time
- feat(pipeline): add `ArtifactHook` for saving internal steps
- feat(pipeline): add support for list of hooks with `Hooks`
- feat(utils): add `"soft"` option to `Powerset.to_multilabel`
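
The difference between hard and soft conversion from powerset to multi-label space can be sketched in plain Python. This is an illustration under stated assumptions, not the library code: the helper names are hypothetical, inputs are assumed to be log-probabilities over powerset classes for a single frame, and classes are assumed to be ordered by increasing set size.

```python
import math
from itertools import combinations


def build_mapping(num_classes, max_set_size):
    """One row per powerset class, one column per speaker (1.0 if speaker is in the set)."""
    rows = []
    for set_size in range(max_set_size + 1):
        for subset in combinations(range(num_classes), set_size):
            rows.append([1.0 if s in subset else 0.0 for s in range(num_classes)])
    return rows


def to_multilabel(log_probs, mapping, soft=False):
    """Convert one frame of powerset log-probabilities to per-speaker activities."""
    if soft:
        # soft: weight each class by its probability (marginal speaker probability)
        weights = [math.exp(lp) for lp in log_probs]
    else:
        # hard: keep only the most likely powerset class
        best = max(range(len(log_probs)), key=log_probs.__getitem__)
        weights = [1.0 if i == best else 0.0 for i in range(len(log_probs))]
    num_classes = len(mapping[0])
    return [
        sum(w * row[s] for w, row in zip(weights, mapping))
        for s in range(num_classes)
    ]


mapping = build_mapping(num_classes=2, max_set_size=2)  # {}, {0}, {1}, {0, 1}
log_probs = [math.log(p) for p in (0.1, 0.2, 0.3, 0.4)]

print(to_multilabel(log_probs, mapping))  # [1.0, 1.0]
print([round(p, 3) for p in to_multilabel(log_probs, mapping, soft=True)])  # [0.6, 0.7]
```

The hard variant returns binary activities for the single best class, whereas the soft variant returns, for each speaker, the total probability of all classes that contain that speaker.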

Fixes

- fix(pipeline): add missing "embedding" hook call in `SpeakerDiarization`
- fix(pipeline): fix `AgglomerativeClustering` to honor `num_clusters` when provided
- fix(pipeline): fix frame-wise speaker count exceeding `max_speakers` or detected `num_speakers` in `SpeakerDiarization` pipeline

Improvements

- improve(pipeline): compute `fbank` on GPU when requested

Breaking changes

- BREAKING(pipeline): rename `WeSpeakerPretrainedSpeakerEmbedding` to `ONNXWeSpeakerPretrainedSpeakerEmbedding`
- BREAKING(setup): remove `onnxruntime` dependency.
  You can still use ONNX `hbredin/wespeaker-voxceleb-resnet34-LM` but you will have to install `onnxruntime` yourself.
- BREAKING(pipeline): remove `logging_hook` (use `ArtifactHook` instead)
- BREAKING(pipeline): remove `onset` and `offset` parameters in `SpeakerDiarizationMixin.speaker_count`.
  You should now binarize segmentations before passing them to `speaker_count`.
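
The "binarize first, then count" order can be sketched as follows. This is a plain-Python illustration with hypothetical helper names and made-up scores, assuming hysteresis thresholding (a speaker turns on above `onset` and off below `offset`); it does not call the library.

```python
def binarize(scores, onset=0.5, offset=0.5):
    """Hysteresis thresholding of one speaker's frame-wise scores."""
    active = False
    decisions = []
    for score in scores:
        if active:
            active = score >= offset  # stay active until the score drops below offset
        else:
            active = score > onset    # become active once the score exceeds onset
        decisions.append(active)
    return decisions


def speaker_count(binarized):
    """Frame-wise number of active speakers, given one boolean list per speaker."""
    return [sum(frame) for frame in zip(*binarized)]


# made-up scores for two speakers over five frames
scores = [
    [0.1, 0.7, 0.5, 0.3, 0.2],
    [0.9, 0.8, 0.1, 0.1, 0.65],
]
binarized = [binarize(s, onset=0.6, offset=0.4) for s in scores]
print(speaker_count(binarized))  # [1, 2, 1, 0, 1]
```

Keeping thresholding out of the counting step means the counter only ever sees binary decisions, so the choice of `onset` and `offset` stays with the caller.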

3.0.1

- fix(pipeline): fix WeSpeaker GPU support

3.0.0

Features and improvements

- feat(pipeline): send pipeline to device with `pipeline.to(device)`
- feat(pipeline): add `return_embeddings` option to `SpeakerDiarization` pipeline
- feat(pipeline): make `segmentation_batch_size` and `embedding_batch_size` mutable in `SpeakerDiarization` pipeline (they now default to `1`)
- feat(pipeline): add progress hook to pipelines
- feat(task): add [powerset](https://www.isca-speech.org/archive/interspeech_2023/plaquet23_interspeech.html) support to `SpeakerDiarization` task
- feat(task): add support for multi-task models
- feat(task): add support for label scope in speaker diarization task
- feat(task): add support for missing classes in multi-label segmentation task
- feat(model): add segmentation model based on torchaudio self-supervised representation
- feat(pipeline): check version compatibility at load time
- improve(task): load metadata as tensors rather than pyannote.core instances
- improve(task): improve error message on missing specifications

Breaking changes

- BREAKING(task): rename `Segmentation` task to `SpeakerDiarization`
- BREAKING(pipeline): pipeline defaults to CPU (use `pipeline.to(device)`)
- BREAKING(pipeline): remove `SpeakerSegmentation` pipeline (use `SpeakerDiarization` pipeline)
- BREAKING(pipeline): remove `segmentation_duration` parameter from `SpeakerDiarization` pipeline (defaults to `duration` of segmentation model)
- BREAKING(task): remove support for variable chunk duration for segmentation tasks
- BREAKING(pipeline): remove support for `FINCHClustering` and `HiddenMarkovModelClustering`
- BREAKING(setup): drop support for Python 3.7
- BREAKING(io): channels are now 0-indexed (used to be 1-indexed)
- BREAKING(io): multi-channel audio is no longer downmixed to mono by default.
  You should update how `pyannote.audio.core.io.Audio` is instantiated:
  - replace `Audio()` by `Audio(mono="downmix")`;
  - replace `Audio(mono=True)` by `Audio(mono="downmix")`;
  - replace `Audio(mono=False)` by `Audio()`.
- BREAKING(model): get rid of (flaky) `Model.introspection`.
  If, for some weird reason, you wrote some custom code based on that, you should instead rely on `Model.example_output`.
- BREAKING(interactive): remove support for Prodigy recipes
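
To make the two `io` changes above concrete, here is a small plain-Python sketch of what downmixing and 0-indexed channel selection amount to. It assumes downmixing means averaging across channels, represents a waveform as a list of channels, and uses hypothetical helper names; it is not the `Audio` class itself.

```python
def apply_mono(waveform, mono=None):
    """Return the waveform unchanged (mono=None) or averaged across channels ("downmix")."""
    if mono is None:
        return waveform
    if mono == "downmix":
        num_channels = len(waveform)
        # average the channels sample by sample, keeping a single-channel layout
        return [[sum(samples) / num_channels for samples in zip(*waveform)]]
    raise ValueError(f"unsupported mono option in this sketch: {mono!r}")


def select_channel(waveform, channel):
    """Pick one channel using 0-based indexing (the first channel is channel 0)."""
    return waveform[channel]


stereo = [
    [1.0, 0.0, -1.0],  # channel 0
    [0.0, 1.0, 1.0],   # channel 1
]
print(apply_mono(stereo))                  # unchanged: both channels are kept
print(apply_mono(stereo, mono="downmix"))  # [[0.5, 0.5, 0.0]]
print(select_channel(stereo, 0))           # [1.0, 0.0, -1.0]
```

Code that relied on the old default therefore has to request downmixing explicitly, and any hard-coded channel number has to be shifted down by one.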

Fixes and improvements

- fix(pipeline): fix reproducibility issue with Ampere CUDA devices
- fix(pipeline): fix support for IOBase audio
- fix(pipeline): fix corner case with no speaker
- fix(train): prevent metadata preparation from happening twice
- fix(task): fix support for "balance" option
- improve(task): shorten and improve structure of Tensorboard tags

Dependencies update

- setup: switch to torch 2.0+, torchaudio 2.0+, soundfile 0.12+, lightning 2.0+, torchmetrics 0.11+
- setup: switch to pyannote.core 5.0+, pyannote.database 5.0+, and pyannote.pipeline 3.0+
- setup: switch to speechbrain 0.5.14+

2.1.1

- BREAKING(pipeline): rewrite speaker diarization pipeline
- feat(pipeline): add option to optimize for DER variant
- feat(clustering): add support for NeMo speaker embedding
- feat(clustering): add FINCH clustering
- feat(clustering): add `min_cluster_size` hyperparameter to `AgglomerativeClustering`
- feat(hub): add support for private/gated models
- setup(hub): switch to latest `huggingface_hub` API
- fix(pipeline): fix support for missing reference in Resegmentation pipeline
- fix(clustering): fix corner case where `HMM.fit` finds too few states
