- Fix `ModuleNotFoundError` and test optional dependencies (142)
- Simplify code and add missing type annotations (144)

1.1.0
-----
- Add a memory-efficient dictionary factory backed by MARISA tries, by Dunedan in 133
- Drop support for Python 3.6 & 3.7, by Dunedan in 134
- Update setup files (138)

1.0.0
-----
Extensive refactoring by juanjoDiaz:

- Series of modular classes
- Different lemmatization strategies available
- Customization of dictionary loading and handling (`DictionaryFactory`)
- `LanguageDetector` class with extended options
- See readme and [detailed documentation](https://adbar.github.io/simplemma/)

Breaking changes:

- The `extensive` argument is now `greedy`
- The `langdetect` submodule is now `language_detector`: `from simplemma.langdetect import ...` → `from simplemma.language_detector import ...`

Fixes and improvements:

- `is_known()` function restored to its v0.9.0 behavior (full dictionary)
- More languages and better rules (with juanjoDiaz)
- Use binary strings in dictionaries to save memory
- Sort dictionaries before compression, by 1over137

Documentation:

- Classes and general doc pages by juanjoDiaz
- Section on classes in the readme by osma

0.9.1
-----
* smaller language data footprint with the smallest possible impact on performance, using a combination of rules, an upper limit on word length, and better data cleaning (31)
* unsupervised approach to affixes activated by default for some languages
* reviewed rules for English and German (less greedy)
* added rules for Dutch, Finnish, Polish and Russian
* improved Russian and Ukrainian language data (3)
* improved tokenizer

0.9.0
-----
* smaller data files (especially for fi, la, pl, pt, sk & tr, 19)
* added support for Asturian (``ast``, 20)
* bug fixes (18, 26)

0.8.2
-----
* languages added: Albanian, Hindi, Icelandic, Malay, Middle English, Northern Sámi, Nynorsk, Serbo-Croatian, Swahili, Tagalog
* fix for the language detection slowdown introduced in 0.7.0