You can now set `top_n` to retrieve the `n` best matches for each string, which gives you a selection of candidate matches for every input. It is only implemented in `polyfuzz.models.TFIDF` and `polyfuzz.models.Embeddings`. Computing several matches per string is computationally heavy, and these models are best suited to those calculations.
Usage:
```python
from polyfuzz import PolyFuzz

from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]

# Return the 3 best matches in to_list for every string in from_list
model = PolyFuzz("TF-IDF")
model.match(from_list, to_list, top_n=3)
```
Or, with custom models:
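```python
from polyfuzz import PolyFuzz
from polyfuzz.models import TFIDF, Embeddings
from flair.embeddings import TransformerWordEmbeddings

# Multilingual BERT embeddings loaded through Flair
embeddings = TransformerWordEmbeddings('bert-base-multilingual-cased')

# Each model keeps its 3 best matches per string
bert = Embeddings(embeddings, min_similarity=0, model_id="BERT", top_n=3)
tfidf = TFIDF(min_similarity=0, top_n=3)

# Run both models on from_list and to_list from the previous example
string_models = [bert, tfidf]
model = PolyFuzz(string_models)
model.match(from_list, to_list)
```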
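To show what `top_n` returns, here is a minimal stdlib-only sketch of top-n matching with TF-IDF. It is not PolyFuzz's implementation. It assumes character trigrams and cosine similarity, and the helpers `char_ngrams`, `tfidf_vectors` and `top_n_matches` are hypothetical. The sketch also shows why the option is costly: every string in `from_list` is scored against every string in `to_list` before the best `n` are kept.

```python
import heapq
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    # Hypothetical helper: lowercase and split into overlapping character n-grams
    text = text.lower()
    if len(text) < n:
        return [text]
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    # Hypothetical helper: TF x smoothed IDF weight per n-gram, L2-normalised
    n_docs = len(docs)
    df = Counter(gram for doc in docs for gram in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {g: c * (math.log((1 + n_docs) / (1 + df[g])) + 1) for g, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def top_n_matches(from_list, to_list, top_n=3, min_similarity=0.0):
    # Fit IDF on both lists together, then split the vectors back apart
    vecs = tfidf_vectors([char_ngrams(s) for s in from_list + to_list])
    from_vecs, to_vecs = vecs[:len(from_list)], vecs[len(from_list):]
    matches = {}
    for word, fv in zip(from_list, from_vecs):
        # Vectors are unit length, so cosine similarity is a plain dot product
        scores = [
            (sum(w * tv.get(g, 0.0) for g, w in fv.items()), target)
            for target, tv in zip(to_list, to_vecs)
        ]
        # Keep the n highest-scoring targets, then drop any below the threshold
        best = heapq.nlargest(top_n, scores)
        matches[word] = [(t, round(s, 3)) for s, t in best if s >= min_similarity]
    return matches


from_list = ["apple", "apples", "appl", "recal", "house", "similarity"]
to_list = ["apple", "apples", "mouse"]
for word, candidates in top_n_matches(from_list, to_list, top_n=3).items():
    print(word, candidates)
```

With `min_similarity=0`, a string that shares no trigrams with `to_list` (such as `"similarity"`) still receives `n` candidates, all scored 0.0. Raising the threshold removes them. The same top-n selection applies to any vector representation, including word embeddings.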