Wordfreq

Latest version: v3.1.1

2.5

- Incorporate data from the OSCAR corpus.

2.4.2

- When tokenizing Japanese or Korean, MeCab's dictionaries no longer have to
be installed separately as system packages. They can now be found via the
Python packages `ipadic` and `mecab-ko-dic`.

- When the tokenizer had to infer word boundaries in languages without spaces,
inputs that were too long (such as the letter 'l' repeated 800 times) were
causing overflow errors. We changed the sequence of operations so that it
no longer overflows, and such inputs simply get a frequency of 0.
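
A minimal sketch of both 2.4.2 behaviors, assuming wordfreq is installed from pip
with its MeCab support (which pulls in `ipadic` and `mecab-ko-dic` as ordinary
Python packages); exact frequency values depend on the bundled word lists.

```python
# Sketch only: assumes wordfreq with MeCab support installed from pip; the
# `ipadic` and `mecab-ko-dic` dictionaries come along as Python packages,
# so no system-level MeCab dictionaries are needed.
from wordfreq import word_frequency

# Japanese and Korean lookups are tokenized with MeCab under the hood.
print(word_frequency("猫", "ja"))       # a small positive frequency
print(word_frequency("고양이", "ko"))    # likewise for Korean

# Very long inputs in a language without spaces no longer overflow;
# per the entry above, they simply get a frequency of 0.
print(word_frequency("l" * 800, "ja"))  # 0.0
```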

2.4.1

- Changed a log message so that it no longer tries to refer to a language by
name, removing the dependency on a database of language names.

2.4

- The Exquisite Corpus data has been updated to include Google Books Ngrams
2019, Reddit data through 2019, Wikipedia data from 2020, and Twitter-sampled
data from 2020, along with somewhat more reliable language detection.

- Updated dependencies to require recent versions of `regex` and `jieba`,
to get tokenization that's consistent with the word lists. wordfreq now
requires a `regex` version newer than 2020.04.04.
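
As a quick sanity check (not part of wordfreq itself), the snippet below uses
the standard-library `importlib.metadata` to confirm which `regex` and `jieba`
releases are installed in an environment.

```python
# Sketch: report installed versions of the dependencies mentioned above.
# `importlib.metadata` is in the standard library from Python 3.8 onward.
from importlib.metadata import version

print("regex:", version("regex"))  # expected: a release newer than 2020.04.04
print("jieba:", version("jieba"))  # expected: a recent release matching wordfreq's word lists
```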

2.3.2

- Relaxing the dependency on regex had an unintended consequence in 2.3.1:
wordfreq could no longer get the frequency of French phrases such as "l'écran",
because their tokenization behavior changed.

2.3.2 fixes this with a more complex tokenization rule that should handle
apostrophes consistently across the supported versions of regex.
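
A minimal sketch of the restored behavior, assuming wordfreq 2.3.2 or later;
`tokenize` and `word_frequency` are the public functions, and the exact
frequency depends on the French word list.

```python
# Sketch: with the 2.3.2 tokenization rule, French elisions split consistently
# no matter which supported `regex` version is installed.
from wordfreq import tokenize, word_frequency

print(tokenize("l'écran", "fr"))        # expected: ['l', 'écran']
print(word_frequency("l'écran", "fr"))  # nonzero again, per the fix above
```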

2.3.1

- State the dependency on msgpack >= 1.0 in setup.py.
- Relax the dependency on regex to allow versions after 2018.02.08.
