Simplemma

Latest version: v1.1.1

1.1.1

-----

- Fix `ModuleNotFoundError` and test optional dependencies (#142)
- Simplify code and add missing type annotations (#144)

1.1.0

-----

- Add a memory-efficient dictionary factory backed by MARISA-tries (by Dunedan, #133)
- Drop support for Python 3.6 & 3.7 (by Dunedan, #134)
- Update setup files (#138)

1.0.0

-----

Extensive refactoring by juanjoDiaz:
- Series of modular classes
- Different lemmatization strategies available
- Customization of dictionary loading and handling (`DictionaryFactory`)
- `LanguageDetector` class with extended options
- See readme and [detailed documentation](https://adbar.github.io/simplemma/)

Breaking changes:
- The `extensive` argument is now `greedy`
- The `langdetect` submodule is now `language_detector`:
  `from simplemma.langdetect import ...` → `from simplemma.language_detector import ...`
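The two renames above can be applied mechanically to older code. The following is a minimal sketch of such a rewrite using only the standard library. The helper `migrate_source` is hypothetical and not part of simplemma, and the sample call site (function names and arguments) is only an illustration of pre-1.0 usage.

```python
import re

# Hypothetical helper, NOT part of simplemma: rewrites the two names
# listed under "Breaking changes" in a piece of Python source text.
_RENAMES = [
    # `from simplemma.langdetect import ...` -> `from simplemma.language_detector import ...`
    (re.compile(r"\bfrom\s+simplemma\.langdetect\s+import\b"),
     "from simplemma.language_detector import"),
    # keyword argument `extensive=` -> `greedy=`
    (re.compile(r"\bextensive(\s*=)(?!=)"), r"greedy\1"),
]


def migrate_source(source: str) -> str:
    """Return `source` with the pre-1.0 names replaced by their 1.0 equivalents."""
    for pattern, replacement in _RENAMES:
        source = pattern.sub(replacement, source)
    return source


# Illustrative pre-1.0 snippet (assumed call sites, for demonstration only).
old_code = (
    "from simplemma.langdetect import in_target_language\n"
    'lemmatize("books", lang="en", extensive=True)\n'
)
new_code = migrate_source(old_code)
print(new_code)
```

The keyword pattern is deliberately simple: it would also rename an `extensive=` argument passed to unrelated functions, so the result should be reviewed before committing.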

Fixes and improvements:
- `is_known()` function now restored to its state in v0.9.0 (full dictionary)
- More languages and better rules (with juanjoDiaz)
- Use binary strings in dictionaries to save memory
- Dictionary sort before compression by 1over137
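
As a rough illustration of why storing dictionary entries as binary strings can save memory, the sketch below compares the in-memory size of a `str` with its UTF-8 encoded `bytes` form. It assumes CPython (where `sys.getsizeof` reports per-object sizes) and is not taken from simplemma's code; the sample words and the `size_pair` helper are arbitrary and hypothetical.

```python
import sys
from typing import Tuple


def size_pair(word: str) -> Tuple[int, int]:
    """Return (size of the str object, size of its UTF-8 bytes object) in bytes."""
    return sys.getsizeof(word), sys.getsizeof(word.encode("utf-8"))


# Arbitrary sample entries: ASCII, Latin-1 range, and Cyrillic.
samples = ["dictionary", "W\u00f6rterbuch", "\u0441\u043b\u043e\u0432\u0430\u0440\u044c"]

for word in samples:
    as_str, as_bytes = size_pair(word)
    print(f"{word!a}: str={as_str} B, bytes={as_bytes} B")
```

On CPython the `bytes` object is smaller in each of these cases, mainly because of the lower per-object overhead; across a dictionary with many entries the difference adds up.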

Documentation:
- Classes and general doc pages by juanjoDiaz
- Section on classes in the readme by osma

0.9.1

-----

* smaller language data footprint with the smallest possible impact on performance, using a combination of rules, an upper limit on word length, and better data cleaning (#31)
* unsupervised approach to affixes activated by default for some languages
* reviewed rules for English and German (less greedy)
* added rules for Dutch, Finnish, Polish and Russian
* improved Russian and Ukrainian language data (#3)
* improved tokenizer

0.9.0

-----

* smaller data files (especially for fi, la, pl, pt, sk & tr; #19)
* added support for Asturian (``ast``, #20)
* bug fixes (#18, #26)

0.8.2

-----

* languages added: Albanian, Hindi, Icelandic, Malay, Middle English, Northern Sámi, Nynorsk, Serbo-Croatian, Swahili, Tagalog
* fix for the language detection slowdown introduced in 0.7.0
