Somajo

Latest version: v2.4.3


1.3.0

Matching of tokens that contain “+” or “&” or are written in camel case
has been optimized. As a result, the tokenizer now runs roughly three to
four times faster.

1.2.0

Two new options have been added. With -s/--paragraph_separator, you can
specify how paragraphs are delimited in the input data, i.e. by empty
lines or by single newlines. The --parallelization option lets you use a
pool of worker processes to speed up tokenization.
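
The idea behind these two options can be illustrated with a minimal sketch: first split the input into paragraphs according to the chosen separator, then hand the paragraphs to a pool of worker processes. The function names, the separator values "empty_lines" and "single_newlines", and the whitespace-only stand-in tokenizer toy_tokenize are all hypothetical; they are not SoMaJo's API, CLI values, or tokenization rules.

```python
import re
from multiprocessing import Pool


def split_paragraphs(text: str, separator: str = "empty_lines") -> list[str]:
    """Split raw input into paragraphs (separator values are hypothetical).

    "empty_lines":     paragraphs are delimited by one or more blank lines.
    "single_newlines": every non-empty line is its own paragraph.
    """
    if separator == "empty_lines":
        chunks = re.split(r"\n\s*\n", text)
    elif separator == "single_newlines":
        chunks = text.split("\n")
    else:
        raise ValueError(f"unknown separator: {separator!r}")
    # Drop empty chunks; collapse line breaks and whitespace runs into single spaces.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


def toy_tokenize(paragraph: str) -> list[str]:
    """Hypothetical stand-in tokenizer: splits on whitespace only."""
    return paragraph.split()


def tokenize_parallel(text: str, separator: str = "empty_lines",
                      processes: int = 2) -> list[list[str]]:
    """Tokenize each paragraph in a pool of worker processes."""
    paragraphs = split_paragraphs(text, separator)
    with Pool(processes=processes) as pool:
        # Pool.map returns results in input order, one token list per paragraph.
        return pool.map(toy_tokenize, paragraphs)
```

Calling tokenize_parallel(text, "single_newlines", processes=4) returns one token list per paragraph, in input order. Paragraphs are a natural unit of work here because each one can be tokenized independently. Unless worker processes are started with fork, the call should sit under an `if __name__ == "__main__":` guard.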

1.1.2

The example in the documentation is now self-contained: sample input
has been added and the output is printed.

1.1.1

The link in the Evaluation section of the Readme now points to the
complete gold standard data.

1.1.0

SoMaJo can now output additional information about the original
spelling of the tokens, i.e. whether a token was followed by whitespace
or whether it contained internal whitespace (according to the
tokenization guidelines, sequences like “: )” are normalized to “:)”).
To use this feature, pass the -e option to the tokenizer script.
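
What recording the original spelling involves can be shown with a rough sketch, assuming a hypothetical Token record with text, original_spelling and space_after fields and a toy pattern that only recognizes the emoticons “:)” and “;)”, optionally written with internal whitespace. This is not SoMaJo's implementation or its -e output format, which the changelog does not describe.

```python
import re
from dataclasses import dataclass


@dataclass
class Token:
    text: str               # normalized token text (hypothetical field)
    original_spelling: str  # text exactly as it appeared in the input
    space_after: bool       # True if the token was followed by whitespace


# Toy patterns for this sketch: two emoticons that may contain one internal
# whitespace character, plus a fallback for any run of non-whitespace.
EMOTICON = re.compile(r"[:;]\s?\)")
OTHER = re.compile(r"\S+")


def tokenize_with_spacing(text: str) -> list[Token]:
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = EMOTICON.match(text, pos) or OTHER.match(text, pos)
        original = match.group()
        # Normalize by removing internal whitespace, e.g. ": )" -> ":)".
        normalized = re.sub(r"\s+", "", original)
        end = match.end()
        space_after = end < len(text) and text[end].isspace()
        tokens.append(Token(normalized, original, space_after))
        pos = end
    return tokens
```

For the input "Toll : ) echt", this yields the tokens "Toll", ":)" and "echt"; the second token keeps ": )" as its original spelling, and only the last token has space_after set to False.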

1.0.3

This version works around a bug in the regex module that caused
exponential runtimes on certain inputs.
