Tokenizer

Latest version: v3.4.5



1.3.0

* Added ``TOK.DOMAIN`` and ``TOK.HASHTAG`` token types
* Improved handling of capitalized month name *Ágúst*, which is now recognized as such when it follows an ordinal number
* Improved recognition of telephone numbers
* Added abbreviations
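
To illustrate what the domain and hashtag token types distinguish, here is a minimal standalone sketch. It does not use the Tokenizer package and is not its implementation; the regular expressions, the `classify` function and the string labels are hypothetical and deliberately simplistic.

```python
import re

# Hypothetical patterns, for illustration only (not the package's own rules).
HASHTAG_RE = re.compile(r"#\w+")
DOMAIN_RE = re.compile(r"(?:[a-z0-9-]+\.)+[a-z]{2,}", re.IGNORECASE)


def classify(word):
    """Return a rough label for a whitespace-delimited word."""
    if HASHTAG_RE.fullmatch(word):
        return "HASHTAG"
    if DOMAIN_RE.fullmatch(word):
        return "DOMAIN"
    return "WORD"


for word in "Sjá #tungumál á greynir.is".split():
    print(word, classify(word))
```

Numbers such as `1.5` fall through to `WORD` here because the last label of a domain must consist of at least two letters.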

1.2.3

Added abbreviations; updated GitHub URLs to point to mideind instead of vthorsteinsson

1.2.2

Added support for composites with more than two parts, e.g. *„dómsmála-, ferðamála-, iðnaðar- og nýsköpunarráðherra“*; added support for the ``±`` sign; added several abbreviations

1.2.1

Fixed bug where the name 'Ágúst' was recognized as a month name. Unicode nonbreaking and invisible space characters are now removed before tokenization.
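
The space cleanup described above can be sketched as a pre-processing pass. This is an illustration under stated assumptions, not the package's code: the function name `normalize_spaces` and the exact character sets are hypothetical, and mapping non-breaking spaces to ordinary spaces (rather than deleting them) is a design choice made here so that adjacent words do not merge.

```python
# Assumed character sets, for illustration only.
NONBREAKING = "\u00a0\u2007\u202f"   # no-break, figure and narrow no-break space
INVISIBLE = "\u200b\u2060\ufeff"     # zero-width space, word joiner, BOM

_TABLE = {ord(c): " " for c in NONBREAKING}
_TABLE.update({ord(c): None for c in INVISIBLE})


def normalize_spaces(text):
    """Replace non-breaking spaces with plain spaces and drop invisible ones."""
    return text.translate(_TABLE)


print(normalize_spaces("10\u00a0kr"))
print(normalize_spaces("or\u200bð"))
```

Using `str.translate` with a single table handles both kinds of characters in one pass over the text.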

1.2.0

Added support for Unicode fraction characters; enhanced handling of degrees (°, °C, °F); fixed a bug in the cubic meter measurement unit; added more abbreviations
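
As a rough illustration of what handling Unicode fraction characters involves, the sketch below uses only the standard library's `unicodedata` module. It is not taken from the package; the helper name `parse_number` is hypothetical, and it only covers an optional integer part followed by a single fraction character.

```python
import unicodedata


def parse_number(text):
    """Parse strings such as '7', '¾' or '2½' into a float."""
    last = text[-1:]
    # Vulgar fraction characters carry their value in the Unicode database.
    if last and unicodedata.name(last, "").startswith("VULGAR FRACTION"):
        whole = text[:-1]
        return (int(whole) if whole else 0) + unicodedata.numeric(last)
    return float(text)


print(parse_number("2½"))
print(parse_number("¾"))
```

Checking the character name rather than its general category keeps superscript digits such as `²` from being treated as fractions.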

1.1.2

Fixed a bug in the liter measurement unit (`l` and `ltr`), whose value was 1000 times too large

