Tokenizer

Latest version: v3.4.5

Safety actively analyzes 710445 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 7 of 8

1.1.1

Added the `mark_paragraphs()` function

1.1.0

All abbreviations in `Abbrev.conf` are now returned with their meaning in a tuple in `token.val`; handling of 'mbl.is' fixed

1.0.9

Added _MAST_ abbreviation; harmonized copyright headers

1.0.7

Added `NUMWLETTER` token type, for numbers with a single-letter suffix (`12a`, `80D`). This will mainly be useful for parsing addresses. Note that if a conflict occurs between `NUMWLETTER` and `MEASUREMENT` (such as `16A`, meaning 16 ampere), the latter takes precedence.

1.0.6

The tokenizer now automatically combines Unicode `COMBINING ACUTE ACCENT` and `COMBINING DIAERESIS` glyphs with vowels to form single code points for the Icelandic letters á, é, í, ó, ú, ý and ö (in both lower and upper case).

1.0.5

Date/time and amount tokens coalesced to a further extent

Page 7 of 8

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.