Sacrebleu

Latest version: v2.5.1

2.3.2

Fixed:
- Special treatment of empty references in TER (232)
- Bump in mecab version for JA (234)

Added:
- Warning if `-tok spm` is used (use explicit `flores101` instead) (238)

2.3.1

Bugfix:
- Set lru_cache to 2^16 for SPM tokenizer (was set to infinite)
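The effect of bounding the cache can be illustrated with Python's standard `functools.lru_cache`. This is a minimal sketch, not sacrebleu's actual tokenizer code: `tokenize_line` is a hypothetical stand-in for an expensive tokenizer call, and whitespace normalization replaces the real SPM model.

```python
from functools import lru_cache

# 2^16 entries, the cache size named in the release note above.
CACHE_SIZE = 2 ** 16


@lru_cache(maxsize=CACHE_SIZE)
def tokenize_line(line: str) -> str:
    """Hypothetical stand-in for an expensive tokenizer call."""
    return " ".join(line.split())


# With maxsize=None the cache would grow without bound; a finite maxsize
# evicts the least recently used entries once the limit is reached.
tokenize_line("Hello   world")
tokenize_line("Hello   world")  # the second call is served from the cache
info = tokenize_line.cache_info()
print(info.hits, info.misses, info.maxsize)
```

A bounded cache trades a few repeated tokenizations for a predictable memory ceiling, which matters when scoring very large corpora.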

2.3.0

Features:
- (203) Added `-tok flores101` and `-tok flores200`, a.k.a. `spbleu`.
These are multilingual tokenizations that make use of the
multilingual SPM models released by Facebook and described in the
following papers:
* Flores-101: https://arxiv.org/abs/2106.03193
* Flores-200: https://arxiv.org/abs/2207.04672
- (213) Added JSON formatting for multi-system output. Thanks to Manikanta Inugurthi (me-manikanta).
- (211) You can now list all test sets for a language pair with `--list SRC-TRG`.
Thanks to Jaume Zaragoza (ZJaume) for adding this feature.
- Added WMT22 test sets (test set `wmt22`)
- System outputs: included with `wmt22`. Also added `wmt21/systems`, which provides the WMT21 submitted systems.
To see the available systems, pass a dummy system name to `--echo`, e.g., `sacrebleu -t wmt22 -l en-de --echo ?`
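The command-line options mentioned in these notes can be assembled programmatically, for example when scripting evaluations. The sketch below only builds argument lists and runs nothing. `sacrebleu_argv` is a hypothetical helper, not part of sacrebleu, and it uses only the flags quoted above (`-t`, `-l`, `--echo`, `-tok`, `--list`).

```python
import shlex
from typing import List, Optional


def sacrebleu_argv(test_set: Optional[str] = None,
                   langpair: Optional[str] = None,
                   echo: Optional[str] = None,
                   tokenizer: Optional[str] = None,
                   list_pair: Optional[str] = None) -> List[str]:
    """Hypothetical helper: assemble a sacrebleu command line.

    Only the flags quoted in the release notes are used; nothing is executed.
    """
    argv = ["sacrebleu"]
    if test_set:
        argv += ["-t", test_set]
    if langpair:
        argv += ["-l", langpair]
    if tokenizer:
        argv += ["-tok", tokenizer]
    if echo:
        argv += ["--echo", echo]
    if list_pair:
        argv += ["--list", list_pair]
    return argv


# List all test sets for a language pair.
list_cmd = sacrebleu_argv(list_pair="en-de")

# Ask for a dummy system ("?") to see which systems are available.
systems_cmd = sacrebleu_argv(test_set="wmt22", langpair="en-de", echo="?")

# shlex.join quotes the "?" so a shell does not treat it as a glob pattern.
print(shlex.join(list_cmd))
print(shlex.join(systems_cmd))
```

Passing the list to `subprocess.run` instead of a shell string would avoid quoting concerns entirely.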

2.2.1

Bugfix: Standard usage was returning (and using) each reference twice.

2.2.0

Features:
- Added WMT21 datasets (thanks to BrightXiaoHan)
- `--echo` now exposes document metadata where available (e.g., docid, genre, origlang)
- Bugfix: allow empty references (161)
- Adds a Korean tokenizer (thanks to NoUnique)

Under the hood:
- Moderate code refactoring
- Processed files have adopted a more sensible internal naming scheme under ~/.sacrebleu
(e.g., wmt17_ms.zh-en.src instead of zh-en.zh)
- Processed file extensions correspond to the values passed to `--echo` (e.g., "src")
- Now explicitly representing NoneTokenizer
- Got rid of the ".lock" lockfile for downloading (using the tarball itself)
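The naming scheme for processed files can be sketched as a small function. This is an illustration based only on the example name `wmt17_ms.zh-en.src` given above: `processed_file_name` is a hypothetical helper, and the mapping of a test set written as `wmt17/ms` to `wmt17_ms` is an assumption, not confirmed library behavior.

```python
def processed_file_name(test_set: str, langpair: str, field: str) -> str:
    """Hypothetical helper mirroring the example name `wmt17_ms.zh-en.src`.

    `field` is one of the values passed to `--echo`, such as "src".
    """
    # Assumption: a slash in the test set name becomes an underscore.
    safe_name = test_set.replace("/", "_")
    return f"{safe_name}.{langpair}.{field}"


name = processed_file_name("wmt17/ms", "zh-en", "src")
print(name)  # wmt17_ms.zh-en.src
```

Encoding the test set, language pair, and field in one name means a file's role can be read off without knowing which directory it sits in.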

Many thanks to BrightXiaoHan (https://github.com/BrightXiaoHan) for the bulk of
the code contributions in this release.

2.1.0

Features:
- Added `-tok spm` for multilingual SPM tokenization (168)
(thanks to Naman Goyal and James Cross at Facebook)

Fixes:
- Handle potential memory usage issues due to LRU caching in tokenizers (167)
- Bugfix: BLEU.corpus_score() now using max_ngram_order (173)
- Upgraded ja-mecab to 1.0.5 (196)
