Py-stringmatching

Latest version: v0.4.6

0.4.0

* Rewrote five similarity measures in Cython: Affine, Jaro, Jaro-Winkler, Needleman-Wunsch, and Smith-Waterman.
* Added benchmark scripts to measure the performance of similarity measures.
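The benchmark scripts themselves are not shown in this changelog. As a rough sketch of what such a script might measure, the code below times a plain-Python Jaro similarity with `timeit`. `jaro` and `benchmark` are hypothetical helpers written for illustration (a textbook Jaro, not the library's code); a Cython build of the same measure could be timed the same way for comparison.

```python
import timeit
from typing import Callable, List, Tuple


def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in pure Python (illustrative only)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two characters match only if equal and no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in s2.
    half_transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                half_transpositions += 1
            k += 1
    t = half_transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def benchmark(measure: Callable[[str, str], float],
              pairs: List[Tuple[str, str]],
              repeat: int = 5, number: int = 100) -> float:
    """Return the best average time in seconds for one pass over all pairs."""
    def one_pass() -> None:
        for a, b in pairs:
            measure(a, b)
    times = timeit.repeat(one_pass, repeat=repeat, number=number)
    return min(times) / number


if __name__ == "__main__":
    pairs = [("MARTHA", "MARHTA"), ("DIXON", "DICKSONX"), ("JELLYFISH", "SMELLYFISH")]
    print(f"jaro: {benchmark(jaro, pairs) * 1e6:.1f} us per pass")
```

Taking the minimum over several repeats is a common way to reduce noise from other processes; absolute timings will vary by machine.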

0.3.0

* Added nine new string similarity measures: Bag Distance, Editex, Generalized Jaccard, Partial Ratio, Partial Token Sort, Ratio, Soundex, Token Sort, and Tversky Index.
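Two of the new measures have short set-based definitions. A minimal sketch, assuming the textbook definitions: bag distance is the larger of the two multiset differences between the character bags of the strings, and the Tversky index weights the two set differences by `alpha` and `beta` (defaults of 0.5 are our assumption). The function names and the empty-input conventions are hypothetical, not the library's API.

```python
from collections import Counter
from typing import Hashable, Iterable


def bag_distance(s1: str, s2: str) -> int:
    """Larger of the two multiset differences between the character bags."""
    bag1, bag2 = Counter(s1), Counter(s2)
    # Counter subtraction keeps only positive counts, i.e. a multiset difference.
    return max(sum((bag1 - bag2).values()), sum((bag2 - bag1).values()))


def tversky_index(x: Iterable[Hashable], y: Iterable[Hashable],
                  alpha: float = 0.5, beta: float = 0.5) -> float:
    """|X & Y| / (|X & Y| + alpha * |X - Y| + beta * |Y - X|) over token sets."""
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0  # assumed convention: two empty inputs are identical
    common = len(sx & sy)
    denom = common + alpha * len(sx - sy) + beta * len(sy - sx)
    return common / denom if denom else 0.0
```

With `alpha = beta = 0.5` the Tversky index reduces to the Dice coefficient, and with `alpha = beta = 1` it reduces to Jaccard; for example, `bag_distance("cat", "hat")` is 1.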

0.2.1

* Removed the explicit installation of numpy using pip in setup. Added numpy to setup_requires and compiled the extensions by including the numpy install path.

0.2.0

* Modified the Qgram tokenizers to take a flag called "padding". If this flag is True (the default), a prefix and a suffix are added to the input string before tokenizing (see the Tutorial for the rationale).
* Version 0.1.0 did not handle Unicode strings correctly: if an input string contained non-ASCII characters, a string similarity measure could misinterpret the string and compute an incorrect similarity score. This version fixes the string similarity measures by converting the input strings to Unicode before computing the similarity. NOTE: the tokenizers are not yet Unicode-aware.
* In version 0.1.0, the "dampen" flag of the TF/IDF similarity measure defaulted to False. This version changes the default to True, which is the more common setting in practice.
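The `padding` flag of the 0.2.0 Qgram tokenizers can be illustrated with a small standalone q-gram tokenizer. This sketch assumes the padding consists of `qval - 1` copies of `#` before the string and `qval - 1` copies of `$` after it; the function and its parameter names are hypothetical, not the library's API.

```python
from typing import List


def qgram_tokenize(text: str, qval: int = 2, padding: bool = True,
                   prefix_pad: str = "#", suffix_pad: str = "$") -> List[str]:
    """Split text into overlapping substrings of length qval."""
    if qval < 1:
        raise ValueError("qval must be >= 1")
    if padding:
        # Pad with qval - 1 markers on each side so the first and last
        # characters appear in as many q-grams as the interior characters.
        text = prefix_pad * (qval - 1) + text + suffix_pad * (qval - 1)
    # Strings shorter than qval (without padding) yield no q-grams.
    return [text[i:i + qval] for i in range(len(text) - qval + 1)]
```

For example, `"ab"` with `qval=2` yields `['#a', 'ab', 'b$']` when padded and `['ab']` otherwise, so with padding every character contributes to the same number of q-grams.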
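The 0.2.0 Unicode fix can be motivated with a Python 3 analogy: an edit distance computed over UTF-8 bytes sees a non-ASCII character such as `é` as two units, so its score differs from the one computed over decoded text. The helpers `to_unicode` and `levenshtein` below are hypothetical and only illustrate the convert-before-measuring idea; they are not the library's implementation.

```python
from typing import Sequence, Union


def to_unicode(s: Union[str, bytes], encoding: str = "utf-8") -> str:
    """Decode byte strings so measures compare characters, not bytes."""
    return s.decode(encoding) if isinstance(s, bytes) else s


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```

Here `levenshtein("café".encode("utf-8"), b"cafe")` is 2 (one substitution plus one deletion of the extra byte), while the same call on the decoded strings is 1.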
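The effect of the TF/IDF `dampen` flag can be sketched with a small TF/IDF cosine similarity. The changelog does not give the formula, so this sketch assumes that dampening replaces a raw term frequency `tf` with `log(tf + 1)` and a raw inverse document frequency `N / df` with `log(N / df)`; the library's exact weighting may differ. `tfidf_similarity` is a hypothetical name, and adding the two inputs to the corpus (so every token has a nonzero document frequency) is our own choice.

```python
import math
from collections import Counter
from typing import Sequence


def tfidf_similarity(x: Sequence[str], y: Sequence[str],
                     corpus: Sequence[Sequence[str]], dampen: bool = True) -> float:
    """Cosine similarity of TF/IDF vectors; dampen applies an assumed log scaling."""
    docs = list(corpus) + [x, y]  # our choice: inputs count as documents too
    n_docs = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    tf_x, tf_y = Counter(x), Counter(y)

    def weight(tf: int, tok: str) -> float:
        idf = n_docs / df[tok]
        if dampen:
            return math.log(tf + 1) * math.log(idf)  # assumed dampened form
        return tf * idf

    vocab = set(tf_x) | set(tf_y)
    vx = {t: weight(tf_x[t], t) for t in vocab}
    vy = {t: weight(tf_y[t], t) for t in vocab}
    dot = sum(vx[t] * vy[t] for t in vocab)
    norm_x = math.sqrt(sum(v * v for v in vx.values()))
    norm_y = math.sqrt(sum(v * v for v in vy.values()))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)
```

Under this assumed formula, dampening shrinks the influence of repeated tokens and of common ones; a token that occurs in every document gets weight 0.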

0.1.0

* Initial release.
* Contains 5 tokenizers: Alphabetic, Alphanumeric, Delimiter, Qgram, and Whitespace.
* Contains 14 similarity measures: Affine, Cosine, Dice, Hamming distance, Jaccard, Jaro, Jaro-Winkler, Levenshtein, Monge-Elkan, Needleman-Wunsch, Overlap coefficient, Smith-Waterman, Soft TF-IDF, and TF-IDF.
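The tokenizers and the set-based measures from the initial release compose naturally: tokenize both strings, then compare the token sets. The sketch below uses hypothetical standalone functions, assuming that the alphabetic and alphanumeric tokenizers return maximal runs of ASCII letters (or letters and digits); it is not the library's class-based API, and the empty-input conventions are our own.

```python
import re
from typing import Hashable, Iterable, List, Sequence


def alphabetic_tokenize(text: str) -> List[str]:
    # Maximal runs of ASCII letters (assumed definition).
    return re.findall(r"[a-zA-Z]+", text)


def alphanumeric_tokenize(text: str) -> List[str]:
    # Maximal runs of ASCII letters and digits (assumed definition).
    return re.findall(r"[a-zA-Z0-9]+", text)


def delimiter_tokenize(text: str, delims: Sequence[str] = (" ",)) -> List[str]:
    # Split on any of the (non-empty) delimiter strings and drop empty tokens.
    pattern = "|".join(re.escape(d) for d in delims)
    return [tok for tok in re.split(pattern, text) if tok]


def whitespace_tokenize(text: str) -> List[str]:
    # str.split() with no argument splits on runs of whitespace.
    return text.split()


def jaccard(x: Iterable[Hashable], y: Iterable[Hashable]) -> float:
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0  # assumed convention for two empty inputs
    return len(sx & sy) / len(sx | sy)


def dice(x: Iterable[Hashable], y: Iterable[Hashable]) -> float:
    sx, sy = set(x), set(y)
    if not sx and not sy:
        return 1.0
    return 2 * len(sx & sy) / (len(sx) + len(sy))


def overlap_coefficient(x: Iterable[Hashable], y: Iterable[Hashable]) -> float:
    sx, sy = set(x), set(y)
    if not sx or not sy:
        return float(sx == sy)  # 1.0 if both empty, else 0.0
    return len(sx & sy) / min(len(sx), len(sy))
```

For `"data science"` versus `"data integration"` with whitespace tokens, Jaccard is 1/3 while Dice and the overlap coefficient are both 0.5.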
