Tokenizer

Latest version: v3.4.5

Page 3 of 8

3.1.0

* Added `-o` switch to `tokenize` command to return original token text, enabling the tokenizer to run as a sentence splitter only.

3.0.0

* Added tracking of character offsets for tokens within the original source text.
* Added full type annotations.
* Dropped Python 2.7 support. Tokenizer now supports Python >= 3.6.
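
To make the idea of character offsets concrete, here is a minimal sketch of a tokenizer that records where each token sits in the original text. It does not use Tokenizer itself and is not how the library implements the feature; `SpanToken` and `tokenize_with_offsets` are hypothetical names, and the regular expression is a deliberately naive stand-in for real tokenization rules.

```python
import re
from typing import List, NamedTuple


class SpanToken(NamedTuple):
    """Hypothetical token record: the text plus its position in the source."""
    text: str
    start: int  # offset of the first character in the source string
    end: int    # offset one past the last character


def tokenize_with_offsets(source: str) -> List[SpanToken]:
    """Split into word runs and single punctuation marks, keeping offsets.

    Assumption: a naive pattern, for illustration only.
    """
    pattern = re.compile(r"\w+|[^\w\s]")
    return [SpanToken(m.group(), m.start(), m.end())
            for m in pattern.finditer(source)]


source = "Hún fær 3 kr."
tokens = tokenize_with_offsets(source)
for tok in tokens:
    # Each span maps back to exactly the token's text in the source.
    assert source[tok.start:tok.end] == tok.text
    print(tok)
```

Keeping offsets alongside each token means a caller can highlight or replace a token in the original text without re-searching for it.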

2.5.0

* Added command-line arguments to the tokenizer executable, corresponding to the available tokenization options.
* Updated and enhanced type annotations
* Minor documentation edits

2.4.0

* Fixed a bug where certain well-known word forms (*fá*, *fær*, *mín*, *sá*...) were wrongly interpreted as abbreviations.
* Fixed a bug where certain abbreviations were recognized even when written in uppercase and at the end of a sentence, for instance *Örn*.

2.3.1

* Various bug fixes.
* Fixed type annotations for Python 2.7.
* The token kind ``NUMBER WITH LETTER`` is now ``NUMWLETTER``.

2.3.0

* Added the ``replace_html_escapes`` option to the ``tokenize()`` function.
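
As a rough illustration of what replacing HTML escapes means, the sketch below decodes character references with Python's standard `html.unescape`. It does not call Tokenizer, and the library's own handling of escapes may differ in detail; `replace_html_escapes_sketch` is a hypothetical helper name used only here.

```python
import html


def replace_html_escapes_sketch(text: str) -> str:
    """Hypothetical helper: decode HTML character references.

    Uses the standard library only; not the Tokenizer implementation.
    """
    return html.unescape(text)


sample = "Ver&eth; &amp; g&aelig;&eth;i"
print(replace_html_escapes_sketch(sample))
```

Decoding escapes before tokenization lets text copied from web pages be split into the same tokens as its plain-text equivalent.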
