Segments

Latest version: v2.3.0

Page 2 of 2

2.1.0

- Dropped Python 2 support
- Added compatibility with clldutils 3.x

2.0.1

Bug fixes

- Fixed a bug where NULL values in orthography profiles could not be read when
the profile was initialized with Unicode normalization.

2.0.0

Added

`segments` now supports orthography profiles described by CSVW metadata.


Backward Incompatibilities

Orthography profiles and the input of `Tokenizer.__call__` are no longer Unicode normalized
by default. That is, the user is responsible for making sure profiles and tokenization
input are normalized to the same form. Alternatively, profile data can be normalized
by passing a `form` keyword argument when initializing a `Profile` instance. Even
in this case, however, tokenization input must be normalized by the user.

While this results in a more cumbersome API, it gives the user full control, e.g.
to avoid incorrect segmentation when parts of decomposed graphemes are appended to
preceding grapheme clusters.
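
To illustrate why profile and input must share a normalization form, here is a minimal sketch that uses only the standard library's `unicodedata` module. The `normalize` helper is hypothetical and not part of `segments`; it only shows the kind of preprocessing a caller might apply to a string before passing it to a tokenizer.

```python
import unicodedata


def normalize(text, form="NFD"):
    """Hypothetical helper (not part of segments): bring text into one
    Unicode normalization form before it is tokenized."""
    return unicodedata.normalize(form, text)


composed = "\u00e1"     # "á" as a single precomposed code point
decomposed = "a\u0301"  # "a" followed by COMBINING ACUTE ACCENT

# Both strings render the same but compare unequal, so a profile entry
# stored in one form would not match input given in the other.
assert composed != decomposed
assert len(composed) == 1 and len(decomposed) == 2

# Once both sides use the same form, they match.
assert normalize(composed, "NFD") == decomposed
assert normalize(decomposed, "NFC") == composed
```

Under this assumption, a profile created with `form="NFD"` would be paired with input passed through `normalize(text, "NFD")` before tokenization.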
