SoMaJo

Latest version: v2.4.3

2.4.3

- Move non-abbreviation tokens that should not be split from
`single_token_abbreviations_<LANG>.txt` to
`single_tokens_<LANG>.txt` and add cellular network generations
(issue 32).

2.4.2

- Fix issues 28 and 29 (markdown links with trailing symbols after
URL part).

2.4.1

- Fix issue 27 (URLs in angle brackets).

2.4.0

- New feature: SoMaJo can output character offsets for tokens,
allowing for stand-off tokenization. Pass `character_offsets=True`
to the constructor or use the option `--character-offsets` on the
command line to enable the feature. The character offsets are
determined by aligning the tokenized output with the input,
therefore activating the feature incurs a noticeable increase in
processing time.
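
To illustrate what stand-off tokenization with character offsets means, here is a minimal sketch in plain Python. It is not SoMaJo's implementation: the helper `align_tokens` is hypothetical, and it assumes that every token appears verbatim in the input, which will not hold whenever a tokenizer normalizes or rewrites text. It only shows the general idea of aligning tokenized output with the input to recover `(start, end)` spans.

```python
def align_tokens(text, tokens):
    """Return a (start, end) character span for each token.

    Hypothetical helper for illustration only. Tokens are located
    left to right, and each search starts where the previous token
    ended, so repeated tokens map to successive occurrences.
    """
    offsets = []
    cursor = 0
    for token in tokens:
        start = text.find(token, cursor)
        if start == -1:
            raise ValueError(f"token {token!r} not found after position {cursor}")
        end = start + len(token)
        offsets.append((start, end))
        cursor = end
    return offsets


text = "Das ist ein Test, oder?"
tokens = ["Das", "ist", "ein", "Test", ",", "oder", "?"]
offsets = align_tokens(text, tokens)

# Each span points back into the untouched input string.
for token, (start, end) in zip(tokens, offsets):
    print(f"{token}\t{start}\t{end}")
```

Because the spans refer to the original string, annotations can be stored separately from the text. The additional pass over the input also gives a sense of why enabling the feature costs extra processing time.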

2.3.1

- Fix issue 26 (markdown links that contain a URL in the link text).

2.3.0

- **Potentially breaking change:** The somajo-tokenizer script is
automatically created upon installation and bin/somajo-tokenizer is
removed. For most users, this does not make a difference. If you
used to run your own modified version of SoMaJo directly via
bin/somajo-tokenizer, consider installing the project in editable
mode (see Development section in README.md).
- Switch from setup.py to pyproject.toml and restructure the project
(source in src, tests in tests).
- When creating a Token object, only known token classes can be
passed.
- Fix issue 25 (dates at the end of sentences).
