Major changes
N/A
New features
- [ALL] Added the SentencePieceNormalizer class in C++/Python. It provides nearly the same functionality as spm_normalize. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L794) [C++ Sample](https://github.com/google/sentencepiece/blob/master/src/sentencepiece_trainer_test.cc#L394)
- [ALL] Added the SentencePieceProcessor::Normalize method in C++/Python. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L771) [C++ Sample](https://github.com/google/sentencepiece/blob/master/src/sentencepiece_trainer_test.cc#L382)
- [ALL] Added the ability to override the normalization spec before processing. [Python Sample](https://github.com/google/sentencepiece/blob/master/python/test/sentencepiece_test.py#L860)
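To give a rough feel for what the normalization spec controls, here is a pure-Python sketch. It is an approximation under stated assumptions, not the library's implementation: SentencePiece applies precompiled normalization rules (for example nmt_nfkc), whereas this sketch assumes plain Unicode NFKC from the standard unicodedata module. The function name toy_normalize is hypothetical; its keyword arguments mirror the add_dummy_prefix, remove_extra_whitespaces, and escape_whitespaces fields of the normalization spec.

```python
import unicodedata

# U+2581 LOWER ONE EIGHTH BLOCK, the meta symbol SentencePiece uses for spaces.
META_SPACE = "\u2581"


def toy_normalize(
    text: str,
    add_dummy_prefix: bool = True,
    remove_extra_whitespaces: bool = True,
    escape_whitespaces: bool = True,
) -> str:
    """Hypothetical approximation of normalization-spec behavior.

    Assumption: plain Unicode NFKC stands in for SentencePiece's
    precompiled normalization rules (e.g. nmt_nfkc).
    """
    # Fold compatibility characters, e.g. full-width "ＡＢＣ" -> "ABC".
    text = unicodedata.normalize("NFKC", text)
    if remove_extra_whitespaces:
        # Drop leading/trailing whitespace and collapse internal runs to one space.
        text = " ".join(text.split())
    if add_dummy_prefix and text:
        # Prepend a space so the first word looks like any other word.
        text = " " + text
    if escape_whitespaces:
        # Make spaces visible as the meta symbol.
        text = text.replace(" ", META_SPACE)
    return text


print(toy_normalize("  ＡＢＣ   def "))                          # ▁ABC▁def
print(toy_normalize("  ＡＢＣ   def ", add_dummy_prefix=False))  # ABC▁def
```

Overriding the normalization spec before processing roughly corresponds to flipping flags like these on a loaded model, without retraining it.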
Bug fixes & minor changes
- Improve support for using external abseil and protobuf libraries https://github.com/google/sentencepiece/issues/869
- Build a universal binary for the OSX release package https://github.com/google/sentencepiece/issues/892
- Add the set_min_log_level function to the Python wrapper so the log level can be changed from Python https://github.com/google/sentencepiece/issues/893
- Use the log-sum-exp technique when computing the marginal probabilities of n-best tokenizations to avoid underflow.
- Support Python 3.12 https://github.com/google/sentencepiece/issues/932
- Improve thread utilization in batch encoding/decoding.
- Fix a serious bug in BPE position encoding.
- Fix bugs in the handling of duplicated bigrams.
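As a general illustration of the log-sum-exp change for n-best marginal probabilities, the sketch below turns a list of log-scores into normalized probabilities. The function names and the example scores are hypothetical, and this is not SentencePiece's code. Exponentiating very negative log-scores directly underflows to 0.0, so the naive normalizer divides zero by zero; subtracting the maximum score first keeps the largest exponent at exp(0) = 1.

```python
import math
from typing import List


def logsumexp(scores: List[float]) -> float:
    """Compute log(sum(exp(s))) without underflow or overflow."""
    m = max(scores)
    # Shifting by the max keeps every exponent <= 0 and the largest term at 1.
    return m + math.log(sum(math.exp(s - m) for s in scores))


def nbest_marginals(scores: List[float]) -> List[float]:
    """Normalize hypothetical n-best log-scores into probabilities."""
    z = logsumexp(scores)
    return [math.exp(s - z) for s in scores]


# Hypothetical log-scores for three segmentations of a long input.
scores = [-1000.0, -1001.0, -1003.0]

# Naive normalization fails: every exp(s) underflows to 0.0.
print(sum(math.exp(s) for s in scores))  # 0.0

probs = nbest_marginals(scores)
print([round(p, 4) for p in probs])
```

Because only differences between scores reach math.exp, the result depends on how much more likely each candidate is relative to the others, not on how small the absolute scores are.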
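The duplicated-bigram fix is not described in detail here, so the following is only a general illustration of why repeated bigrams need care in BPE, not a reproduction of the fixed bug. In a run such as a a a, the pair (a, a) occurs at two overlapping positions, yet only one of them can be merged. The hypothetical helpers count_pairs and merge_pair sketch a textbook BPE step, not SentencePiece's trainer: the merge scans left to right and skips past each merged pair so an overlapping occurrence is not reused.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def count_pairs(symbols: List[str]) -> Counter:
    """Count adjacent symbol pairs; overlapping occurrences are all counted."""
    return Counter(zip(symbols, symbols[1:]))


def merge_pair(symbols: List[str], pair: Pair) -> List[str]:
    """Replace non-overlapping occurrences of `pair`, scanning left to right."""
    merged: List[str] = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2  # skip both halves so an overlapping occurrence is not reused
        else:
            merged.append(symbols[i])
            i += 1
    return merged


word = list("aaab")
print(count_pairs(word))             # ('a', 'a') is counted twice
print(merge_pair(word, ("a", "a")))  # ['aa', 'a', 'b']
```

Here the raw count of 2 for (a, a) overstates how many merges that pair can actually produce (one), which is the kind of discrepancy incremental pair-count updates must account for.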