What's Changed
This release makes breaking changes to the jiwer API.
First, we introduce 3 new methods:
1. `jiwer.compute_measures()` is renamed to `jiwer.process_words`, and returns everything in a `dataclass` named `WordOutput`.
2. `jiwer.cer(return_dict=True)` is deprecated, and is superseded by `jiwer.process_characters`, which returns everything in a `dataclass` named `CharacterOutput`.
3. `jiwer.visualize_measures` is renamed to `jiwer.visualize_alignment`. Moreover, the keyword argument `visualize_cer: bool = False` has been removed, and the `output` keyword argument is now of expected type `Union[WordOutput, CharacterOutput]`.
I've also decided to rename all mentions of the concept "(ground) truth" to "reference", in light of the Whisper speech-to-text model showing that future ASR models might not be trained on something like a "ground truth". Therefore, in the following methods, the keyword arguments `truth` and `truth_transform` have been renamed to `reference` and `reference_transform`:
1. `jiwer.cer()`
2. `jiwer.mer()`
3. `jiwer.wer()`
4. `jiwer.wil()`
5. `jiwer.wip()`
The alignments are now stored as a list of lists containing `jiwer.AlignmentChunk` dataclass objects instead of hard-to-document tuples.
Lastly, I've added `jiwer.transformations.cer_contiguous` for optionally calculating the `CER` with an uneven number of reference and hypothesis sentences. I've also changed the `wer_standardize` and `wer_standardize_contiguous` transformations so that their last 3 steps are now:
tr.Strip(),
tr.ReduceToSingleSentence(),
tr.ReduceToListOfListOfWords(),
This release also introduces a documentation website. See https://jitsi.github.io/jiwer.
**Full Changelog**: https://github.com/jitsi/jiwer/compare/v2.6.0...v3.0.0