- Support all textual smileys and textfaces from Signal messenger.
- Raise a TypeError if tokenize_text is called with a string instead of an iterable of strings (issue 13).
2.0.5
- Add heuristics for ambiguous quotation marks (issue 11).
- Avoid false positives for emoticons that contain a space (issue 12).
- Correctly tokenize obfuscated email addresses that contain spaces.
- Do not split tl;dr and its German variant zl;ng.
2.0.4
- Bugfix: Prevent race conditions between tokenizer and sentence splitter in parallel processing (--parallel > 1).
2.0.3
- Skip tests for unimplemented features (some build systems fail the build if any unit test fails).
2.0.2
- Bugfix: Parallel tokenization (--parallel > 1) works again.
- Support for musical notes (sharps).