Sacrebleu

Latest version: v2.5.1

2.0.0

- Build: Add Windows and OS X testing to Travis CI.
- Improve documentation and type annotations.
- Drop `Python < 3.6` support and migrate to f-strings.
- Relax `portalocker` version pinning, add `regex`, `tabulate` and `numpy` dependencies.
- Drop input type manipulation through `isinstance` checks. If the user does not
follow the expected annotations, exceptions will be raised. Attempts at robustness
led to confusion and obfuscated score errors in the past (121).
- A variable number of references per segment is now supported for all metrics
by default. It is still only available through the API.
- Use colored strings in tabular outputs (multi-system evaluation mode) with
the help of the `colorama` package.
- Tokenizers: Add caching to tokenizers, which speeds things up a bit.
- `intl` tokenizer: Use `regex` module. Speed goes from ~4 seconds to ~0.6 seconds
for a particular test set evaluation. (46)
- Signature: Formatting changed (mostly to remove '+' separator as it was
interfering with chrF++). The field separator is now '|' and key values
are separated with ':' rather than '.'.
- Signature: Boolean true / false values are shortened to yes / no.
- Signature: Number of references is `var` if variable number of references is used.
- Signature: Add effective order (yes/no) to BLEU and chrF signatures.
- Metrics: Scale all metrics into the [0, 100] range (140)
- Metrics API: Use explicit argument names and defaults for the metrics instead of
passing obscure `argparse.Namespace` objects.
- Metrics API: A base abstract `Metric` class is introduced to guide further
metric development. This class defines the methods that should be implemented
in the derived classes and offers boilerplate methods for the common functionality.
A new metric implemented this way will automatically support significance testing.
- Metrics API: All metrics now receive an optional `references` argument at
initialization time to process and cache the references. Subsequent evaluations
of different systems against the same references become faster this way, for
example when using significance testing (see the example after this list).
- BLEU: In case of no n-gram matches at all, skip smoothing and return 0.0 BLEU (141).
- CHRF: Added multi-reference support, verified the scores against chrF++.py, added test case.
- CHRF: Added chrF+ support through the `word_order` argument. Added test cases against chrF++.py.
Exposed it through the CLI (--chrf-word-order) (124).
- CHRF: Add possibility to disable effective order smoothing (pass --chrf-eps-smoothing).
This way, the scores obtained are exactly the same as chrF++, Moses and NLTK implementations.
We keep the effective ordering as the default for compatibility, since this only
affects sentence-level scoring with very short sentences. (144)
- CLI: `--input/-i` can now ingest multiple systems. For this reason, the positional
`references` should always precede the `-i` flag.
- CLI: Allow modifying TER arguments through CLI. We still keep the TERCOM defaults.
- CLI: Prefix metric-specific arguments with --chrf and --ter. To maintain compatibility,
BLEU argument names are kept the same.
- CLI: Separate metric-specific arguments for clarity when `--help` is printed.
- CLI: Added `--format/-f` flag. The single-system output mode is now `json` by default.
If you want to keep the old text format persistently, you can export `SACREBLEU_FORMAT=text`
in your shell.
- CLI: For multi-system mode, `json` falls back to plain text. `latex` output can only
be generated for multi-system mode.
- CLI: sacreBLEU now supports evaluating multiple systems for a given test set
in an efficient way. Through the `tabulate` package, the results are nicely
rendered as a plain text table, LaTeX, HTML or RST (cf. the --format/-f argument).
The systems can either be given as a list of plain text files to `-i/--input` or
as a tab-separated single stream redirected into `STDIN`. In the former case,
the basenames of the files will automatically be used as system names.
- Statistical tests: sacreBLEU now supports confidence interval estimation
through bootstrap resampling for single-system evaluation (`--confidence` flag)
as well as paired bootstrap resampling (`--paired-bs`) and paired approximate
randomization tests (`--paired-ar`) when evaluating multiple systems (40 and 78).
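
A minimal sketch of the 2.0.0 metrics API changes above (cached references and the chrF `word_order` argument), assuming the class-based API under `sacrebleu.metrics`; the example data is illustrative and argument defaults may vary across 2.x releases.

```python
from sacrebleu.metrics import BLEU, CHRF

# One reference stream per inner list; streams are aligned with the hypotheses.
refs = [
    ["The cat sat on the mat.", "It is raining today."],
    ["There is a cat on the mat.", "It rains today."],
]
sys_a = ["The cat is on the mat.", "It is raining now."]
sys_b = ["A cat sits on a mat.", "Today it rains."]

# Passing `references` at construction time processes and caches them once,
# so scoring several systems against the same references is faster.
bleu = BLEU(references=refs)
print(bleu.corpus_score(sys_a, None))  # references were cached above, so pass None
print(bleu.corpus_score(sys_b, None))

# chrF++ through the new `word_order` argument (2 = chrF++, 1 = chrF+, 0 = chrF).
chrf = CHRF(word_order=2, references=refs)
print(chrf.corpus_score(sys_a, None))  # all scores are reported in the [0, 100] range
```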

1.5.1

- Fix extraction error for WMT18 extra test sets (test-ts) (142)
- Validation and test datasets are added for multilingual TEDx

1.5.0

- Fix an assertion error in chrF (121)
- Add missing `__repr__()` methods for BLEU and TER
- TER: Fix exception when `--short` is used (131)
- Pin `mecab-python3` version to 1.0.3 for Python 3.5 support
- [API Change]: Default value for `floor` smoothing is now 0.1 instead of 0.
- [API Change]: `sacrebleu.sentence_bleu()` now uses the `exp` smoothing method,
exactly like the CLI's --sentence-level behavior. This was mainly done to make
the two behave the same (see the example after this list).
- Add smoothing value to BLEU signature (98)
- dataset: Fix IWSLT links (128)
- Allow variable number of references for BLEU (only via API) (130).
Thanks to Ondrej Dusek (tuetschek)
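
A minimal sketch of the 1.5.0 smoothing changes above, assuming the `smooth_method` and `smooth_value` keyword arguments of `sacrebleu.sentence_bleu()`; the hypothesis and references are illustrative.

```python
import sacrebleu

hyp = "The cat is on the mat."
refs = ["The cat sat on the mat.", "There is a cat on the mat."]

# Default smoothing is now `exp`, matching the CLI's --sentence-level behavior.
print(sacrebleu.sentence_bleu(hyp, refs))

# With `floor` smoothing, the smoothing value now defaults to 0.1 instead of 0.
print(sacrebleu.sentence_bleu(hyp, refs, smooth_method="floor"))
```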

1.4.14

- Added character-based tokenization (`-tok char`); see the example after this list.
Thanks to Christian Federmann.
- Added TER (`-m ter`). Thanks to Ales Tamchyna! (fixes 90)
- Allow calling the script as a standalone utility (fixes 86)
- Fix type annotation issues (fixes 100) and mark sacrebleu as supporting mypy
- Added WMT20 robustness test sets:
- wmt20/robust/set1 (en-ja, en-de)
- wmt20/robust/set2 (en-ja, ja-en)
- wmt20/robust/set3 (de-en)
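
A minimal sketch of the 1.4.14 additions above (character tokenization and TER), using the module-level helpers; `corpus_ter()` is assumed here to be exposed alongside `corpus_bleu()`, and the data is illustrative.

```python
import sacrebleu

refs = [["The cat sat on the mat.", "It is raining today."]]  # a single reference stream
hyps = ["The cat is on the mat.", "It rains today."]

# Character-based tokenization, the API counterpart of the CLI's `-tok char`.
print(sacrebleu.corpus_bleu(hyps, refs, tokenize="char"))

# TER, the API counterpart of the CLI's `-m ter`.
print(sacrebleu.corpus_ter(hyps, refs))
```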

1.4.13

- Added WMT20 newstest test sets (103)
- Make `mecab-python3` an extra dependency and adapt the code to the new `mecab-python3`.
This also fixes the recent Windows installation issues (104).
Japanese support should now be installed explicitly through the `sacrebleu[ja]` package.
- Fix return type annotation of corpus_bleu()
- Improve `sentence_score`'s documentation; a single reference string is no longer
allowed (98). See the example after this list.
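
A minimal sketch of the `sentence_score` change above: sentence-level helpers expect a list of reference strings rather than a bare string (illustrative data).

```python
import sacrebleu

hyp = "The cat is on the mat."

# References must be passed as a list, even when there is only one.
print(sacrebleu.sentence_bleu(hyp, ["The cat sat on the mat."]))

# A bare string, e.g. sentence_bleu(hyp, 'The cat sat on the mat.'),
# is no longer allowed.
```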

1.4.12

- Fix a deployment bug (96)
