SoMeWeTa

Latest version: v1.8.1

1.8.1

- Prefer the 'fork' method for creating the worker processes for
parallel tagging, if it is supported by the operating system. This
is much faster than the 'spawn' method that is the default on some
non-Linux systems (issue 14).

1.8.0

- Add option --use-nfkc to the command line interface and option
use_nfkc to the constructor of ASPTagger (issue 11). If this option
is used, the internal representation of the input data uses Unicode
normalization form NFKC. This can be useful for social media input
that misuses mathematical symbols for their typographic effects
(e.g. “𝕴𝖒𝖕𝖋𝖆𝖚𝖘𝖜𝖊𝖎𝖘” instead of “Impfausweis”).
- Add option --sentence-tag to specify an XML tag in the input data
that marks sentence boundaries (issue 12). This is particularly
useful in combination with the --sentence-tag option of SoMaJo.
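
The effect of NFKC normalization on the example above can be checked with the standard library. The helper normalize_token is a hypothetical illustration of what the use_nfkc option does to the internal representation, not SoMeWeTa's implementation.

```python
import unicodedata

# "Impfausweis" spelled with MATHEMATICAL BOLD FRAKTUR letters (U+1D574 etc.).
STYLED = ("\U0001D574\U0001D592\U0001D595\U0001D58B\U0001D586\U0001D59A"
          "\U0001D598\U0001D59C\U0001D58A\U0001D58E\U0001D598")


def normalize_token(token, use_nfkc=False):
    """Hypothetical helper: optionally map a token to Unicode normalization form NFKC."""
    return unicodedata.normalize("NFKC", token) if use_nfkc else token


print(normalize_token(STYLED))                 # unchanged fraktur letters
print(normalize_token(STYLED, use_nfkc=True))  # Impfausweis
```

NFKC applies compatibility decompositions, so the styled letters collapse to plain ASCII and the tagger can treat the token like the ordinary word.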
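
A rough sketch of how one-token-per-line input with sentence elements can be grouped into sentences. The function split_sentences and its handling of other tags are assumptions for illustration; the actual parsing in SoMeWeTa is not shown here.

```python
import re


def split_sentences(lines, sentence_tag="s"):
    """Hypothetical helper: group tokens into sentences delimited by <tag> ... </tag>."""
    start = re.compile(rf"^<{re.escape(sentence_tag)}(\s[^>]*)?>$")  # allows attributes
    end = f"</{sentence_tag}>"
    sentences, current = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if start.match(line):
            current = []
        elif line == end:
            if current:
                sentences.append(current)
            current = None
        elif line.startswith("<") and line.endswith(">"):
            continue  # some other XML tag; ignored in this simplified sketch
        elif current is not None:
            current.append(line)
        # Tokens outside any sentence element are dropped in this sketch.
    return sentences


lines = ["<s>", "Das", "ist", "ein", "Satz", ".", "</s>",
         "<s>", "Noch", "einer", "!", "</s>"]
print(split_sentences(lines))
```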

1.7.3

- Use less memory when loading a model if the ijson library is present
and the Python version is at least 3.7 (at least 3.6 for CPython)
(issue 9).
- Restructured code for parallel tagging (issue 8).
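
The condition in the first 1.7.3 entry can be expressed as a simple feature check. The function can_stream_model is hypothetical and only mirrors the stated requirements (ijson installed, Python 3.7 or later, 3.6 or later on CPython); the incremental loading with ijson itself is not shown.

```python
import importlib.util
import platform
import sys


def can_stream_model(version_info=None, implementation=None, have_ijson=None):
    """Hypothetical check: is memory-saving model loading possible here?"""
    if version_info is None:
        version_info = sys.version_info
    if implementation is None:
        implementation = platform.python_implementation()
    if have_ijson is None:
        # find_spec returns None if the module cannot be imported.
        have_ijson = importlib.util.find_spec("ijson") is not None
    minimum = (3, 6) if implementation == "CPython" else (3, 7)
    return have_ijson and tuple(version_info[:2]) >= minimum


print(can_stream_model())
```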

1.7.2

- Bugfix: Do not choke on chunks of XML that do not contain actual
word tokens (usually at the end of a file).
- Updated regular expressions for emojis, emoticons, numbers and URLs.
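
A guard of the kind described in the bugfix might look like the following sketch; has_word_tokens is a hypothetical name, and treating every line of the form <...> as markup is a simplifying assumption.

```python
def has_word_tokens(chunk):
    """Hypothetical check: does a one-token-per-line chunk contain any non-XML line?"""
    for line in chunk:
        line = line.strip()
        if line and not (line.startswith("<") and line.endswith(">")):
            return True
    return False


chunks = [["<s>", "Hallo", "Welt", "</s>"], ["</text>", "", "</corpus>"]]
to_tag = [chunk for chunk in chunks if has_word_tokens(chunk)]
print(len(to_tag))  # 1
```

Chunks without word tokens can then be passed through unchanged instead of being sent to the tagger.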

1.7.1

- Fixed an XML-related bug in STTS_IBK_postprocessor
- Fixed a minor bug in emoticon regex

1.7.0

- Added Reddit links and Reddit-specific emoticons
- Moved command-line interface to cli.py
- Helper script for tagging multiple files (somewe-tagger-multifile)
- Postprocessing script for some deterministic tagging decisions in
  STTS_IBK, e.g. URLs and emoticons (STTS_IBK_postprocessor)
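
Deterministic postprocessing of this kind can be sketched as regex-based tag overrides. The patterns below are deliberately simplified assumptions (not the expressions SoMeWeTa ships), the tag names URL and EMOASC are assumed from the STTS_IBK tagset, and postprocess is a hypothetical helper.

```python
import re

# Simplified, assumed patterns; real URL and emoticon expressions are far more thorough.
URL_RE = re.compile(r"^(?:https?://|www\.)\S+$", re.IGNORECASE)
EMOTICON_RE = re.compile(r"^[:;=8][-o^']?[)(\]\[DPp/\\|]+$")


def postprocess(tagged):
    """Hypothetical helper: override tags that can be decided deterministically."""
    result = []
    for token, tag in tagged:
        if URL_RE.match(token):
            tag = "URL"
        elif EMOTICON_RE.match(token):
            tag = "EMOASC"  # ASCII emoticon (assumed STTS_IBK tag)
        result.append((token, tag))
    return result


print(postprocess([("Schau", "VVIMP"), ("https://example.org", "NN"), (":-)", "$.")]))
```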
