Wordfreq

Latest version: v3.1.1

3.1.1

- Supports Python 3.8 through 3.12.
- Stopped using the deprecated `pkg_resources` and implicitly depending on
setuptools. We now use the `locate` package instead.

3.0.3

- wordfreq has changed from the MIT license to the Apache License 2.0.

The Apache license is equally permissive, but clarifies requirements about
attributing the author (Robyn Speer), making this attribution readable, and
maintaining credit for the included data resources.

- Minor packaging changes to make sure that tox can test the library on Python
3.7 through 3.11.

3.0.2

- Updated the range of allowable versions of `regex`. Versions before 2021.7.6
don't have the `regex.Match` class.

- Added the `extras` dependencies as optional dependencies in `pyproject.toml`.

3.0.1

- mypy was accidentally listed as a full dependency; it has been moved to the
dev dependencies.

3.0

This is the "handle numbers better" release.

Previously, wordfreq would group all digit sequences of the same 'shape',
with length 2 or more, into a single token and return the frequency of that
token for every number of that shape, which vastly overestimated the
frequency of each individual number.
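
To make the old behavior concrete, here is a minimal Python sketch of "smashing" a token into its shape. The regex and the name `smash_digits` are hypothetical illustrations, not wordfreq's internal code, which may treat separators such as `.` and `,` differently. It shows why `1020`, `1999`, and `2010` all collapsed into one token.

```python
import re

# Hypothetical rule: any run of two or more digits is part of a number's "shape".
MULTI_DIGIT_RE = re.compile(r"\d{2,}")


def smash_digits(token: str) -> str:
    """Replace each digit in a run of 2+ digits with "0" (illustrative only)."""
    return MULTI_DIGIT_RE.sub(lambda m: "0" * len(m.group(0)), token)


# Before 3.0, every token with the same shape looked up the same entry, so each
# of these numbers was credited with the frequency of *all* 4-digit numbers.
for number in ["1020", "1999", "2010"]:
    print(number, "->", smash_digits(number))  # each prints "<number> -> 0000"
```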

Now it distributes the frequency over all numbers of that shape, with an
estimated distribution that allows for Benford's law (numbers with lower
leading digits are more frequent) and a special frequency distribution for
4-digit numbers that look like years (2010 is more frequent than 1020).
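
The idea of spreading one shape's frequency over individual numbers can be sketched as follows. This is a minimal illustration under stated assumptions, not wordfreq's implementation: the year range and the 50/50 split between a Benford component and a uniform "year" component (`YEAR_RANGE`, `YEAR_SHARE`) are hypothetical values picked for the example. Benford's law gives a leading digit d the probability log10(1 + 1/d), and the sketch splits that evenly over the 10^(n-1) n-digit numbers sharing that leading digit, so the shares for one shape add up to 1.

```python
import math
import re

# Hypothetical parameters for illustration only; they are not wordfreq's values.
YEAR_SHARE = 0.5                # fraction of 4-digit frequency assigned to years
YEAR_RANGE = range(1800, 2101)  # 4-digit numbers treated as "year-like"

PLAIN_NUMBER_RE = re.compile(r"[1-9][0-9]+")


def benford_share(number: str) -> float:
    """Fraction of its shape's frequency that Benford's law gives to `number`."""
    if not PLAIN_NUMBER_RE.fullmatch(number):
        raise ValueError("expected 2+ ASCII digits with no leading zero")
    first_digit = int(number[0])
    # P(leading digit d) = log10(1 + 1/d), split evenly across the
    # 10 ** (n - 1) numbers of length n that start with d.
    return math.log10(1 + 1 / first_digit) / 10 ** (len(number) - 1)


def number_frequency(number: str, shape_freq: float) -> float:
    """Estimate one number's frequency from the combined frequency of its shape."""
    share = benford_share(number)
    if len(number) == 4:
        # Blend the Benford estimate with a uniform distribution over "years".
        share *= 1 - YEAR_SHARE
        if int(number) in YEAR_RANGE:
            share += YEAR_SHARE / len(YEAR_RANGE)
    return shape_freq * share
```

With these made-up parameters, `number_frequency("2010", 1.0)` comes out larger than `number_frequency("1020", 1.0)`, even though Benford's law alone favors `1020` (leading digit 1). That is why a separate year-like component is needed for 4-digit numbers.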

More changes related to digits:

- Functions such as `iter_wordlist` and `top_n_list` no longer return
multi-digit numbers (they used to return them in their "smashed" form, such
as "0000").

- `lossy_tokenize` no longer replaces digit sequences with 0s. That now
happens internally in the `word_frequency` function, so it can look at the
values of the digits before they're replaced.
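
Both changes can be illustrated with a minimal Python sketch. Everything here is hypothetical: the "two or more digits" rule, the names `without_multi_digit` and `lookup_frequency`, and the pluggable `estimate_share` callable, which stands in for whatever per-number estimate wordfreq actually uses. The first function filters smashed numbers out of a wordlist; the second shows why the smashing has to wait until lookup time, since once `2010` becomes `0000` its digits can no longer inform the estimate.

```python
import re
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical rule: a run of two or more digits marks a multi-digit number.
MULTI_DIGIT_RE = re.compile(r"\d{2,}")


def without_multi_digit(words: Iterable[str]) -> Iterator[str]:
    """Yield words in order, skipping any that contain a multi-digit number."""
    for word in words:
        if not MULTI_DIGIT_RE.search(word):
            yield word


def lookup_frequency(
    token: str,
    table: Dict[str, float],
    estimate_share: Callable[[str], float],
) -> float:
    """Look up a token whose digits are still intact, smashing them only here."""
    if not MULTI_DIGIT_RE.search(token):
        # No multi-digit run: an ordinary word lookup.
        return table.get(token, 0.0)
    shape = MULTI_DIGIT_RE.sub(lambda m: "0" * len(m.group(0)), token)
    # The original digits are still visible at this point, so the shape's
    # combined frequency can be scaled by an estimate that depends on them.
    return table.get(shape, 0.0) * estimate_share(token)


# "0000" is how a smashed 4-digit number used to appear in a wordlist.
print(list(without_multi_digit(["the", "0000", "of", "3", "and"])))  # ['the', 'of', '3', 'and']
```

For example, with a toy table `{"0000": 1e-3}` and an `estimate_share` that returns a larger value for tokens starting with `2`, `lookup_frequency("2010", ...)` exceeds `lookup_frequency("1020", ...)`, whereas under the old behavior both would have received the full `1e-3`.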

Other changes:

- wordfreq is now developed using `poetry` as its package manager, and with
`pyproject.toml` as the source of configuration instead of `setup.py`.

- The minimum version of Python supported is 3.7.

- Type information is exported using `py.typed`.

2.5.1

- Import ftfy and use its `uncurl_quotes` function to turn curly quotes into
straight ones, so that different forms of apostrophes are handled
consistently.

- Set minimum version requirements on `regex`, `jieba`, and `langcodes`
so that tokenization will give consistent results.

- Work around an inconsistency in the `msgpack` API around
`strict_map_key=False`.
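
The effect of the `uncurl_quotes` change can be illustrated with a small standalone sketch. This is not ftfy's implementation, and the character set below is an assumption that may differ from what `uncurl_quotes` actually covers; the hypothetical `uncurl` function only shows why `can’t` and `can't` end up as the same token.

```python
# Hypothetical mapping: common curly single and double quotes to ASCII ones.
_UNCURL_TABLE = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
    "\u201a": "'",  # single low-9 quotation mark
    "\u201b": "'",  # single high-reversed-9 quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u201e": '"',  # double low-9 quotation mark
    "\u201f": '"',  # double high-reversed-9 quotation mark
})


def uncurl(text: str) -> str:
    """Replace curly quotes with straight ones (illustrative sketch)."""
    return text.translate(_UNCURL_TABLE)


print(uncurl("can\u2019t"))  # can't
```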
