* Added `-o` switch to `tokenize` command to return original token text, enabling the tokenizer to run as a sentence splitter only.
3.0.0
* Added tracking of character offsets for tokens within the original source text.
* Added full type annotations.
* Dropped Python 2.7 support. The tokenizer now supports Python >= 3.6.
2.5.0
* Added command-line arguments to the tokenizer executable, corresponding to the available tokenization options.
* Updated and enhanced type annotations.
* Minor documentation edits.
2.4.0
* Fixed a bug where certain well-known word forms (*fá*, *fær*, *mín*, *sá*...) were incorrectly interpreted as abbreviations.
* Fixed a bug where certain abbreviations, for instance *Örn*, were being recognized even in uppercase and at the end of a sentence.
2.3.1
* Various bug fixes.
* Fixed type annotations for Python 2.7.
* The token kind ``NUMBER WITH LETTER`` is now ``NUMWLETTER``.
2.3.0
* Added the ``replace_html_escapes`` option to the ``tokenize()`` function.