PolyFuzz

Latest version: v0.4.2

Page 2 of 2

0.3

You can now specify `top_n`, the number of matches to return for each input string. This gives you a selection of the matches that best suit each input. The option is implemented only in `polyfuzz.models.TFIDF` and `polyfuzz.models.Embeddings`, because ranking several candidates per string is computationally heavy and these models are best suited to that calculation.

Usage:

```python
from polyfuzz import PolyFuzz

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

# Return the three best TF-IDF matches for each string in from_list
model = PolyFuzz("TF-IDF")
model.match(from_list, to_list, top_n=3)
```


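Or with custom models, where `top_n` is set on each model:

```python
from polyfuzz import PolyFuzz
from polyfuzz.models import TFIDF, Embeddings
from flair.embeddings import TransformerWordEmbeddings

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

# Each model carries its own top_n and min_similarity settings
embeddings = TransformerWordEmbeddings('bert-base-multilingual-cased')
bert = Embeddings(embeddings, min_similarity=0, model_id="BERT", top_n=3)
tfidf = TFIDF(min_similarity=0, top_n=3)

# Pass the list of models to PolyFuzz and match with all of them
string_models = [bert, tfidf]
model = PolyFuzz(string_models)
model.match(from_list, to_list)
```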
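
To make concrete what a `top_n` selection means, here is a minimal standard-library sketch that ranks candidates by cosine similarity over character trigrams and keeps the best `top_n` per input. It is an illustration only, not PolyFuzz's implementation: it uses raw trigram counts rather than TF-IDF weights, and the helper names `char_ngrams`, `cosine`, and `top_n_matches` are hypothetical.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams of a space-padded, lowercased string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_n_matches(from_list, to_list, top_n=1):
    """Return, for each input string, its top_n candidates sorted by similarity."""
    to_vectors = [(candidate, char_ngrams(candidate)) for candidate in to_list]
    results = {}
    for source in from_list:
        source_vector = char_ngrams(source)
        scored = [(candidate, cosine(source_vector, vector))
                  for candidate, vector in to_vectors]
        # Highest similarity first; ties keep the order of to_list
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results[source] = scored[:top_n]
    return results


matches = top_n_matches(["apple", "house"], ["apple", "apples", "mouse"], top_n=2)
for source, candidates in matches.items():
    print(source, candidates)
```

Every input is scored against every candidate before the ranking is cut to `top_n`, which is why the cost grows with the size of both lists.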

0.2.2

This release is meant as a way to create a DOI through Zenodo.

0.2.1

First public release. Includes:

Features:
* Edit Distance
* TF-IDF
* Embeddings
* Custom models
* Grouping of results with custom models
* Evaluation through precision-recall curves

Fixes:
* Update naming convention matcher --> model
* Add basic models to grouper
* Fix issues with vector order in cosine similarity
* Update naming of cosine similarity function
