Dedupe

Latest version: v3.0.3


Page 6 of 7

1.1.0

Features
- Handle FuzzyCategoricalType in datamodel

1.0.0

Features
- Speed up learning
- Parallelize sampling
- Optional [CRF Edit Distance](https://dedupe.readthedocs.io/en/latest/Variable-definition.html#optional-edit-distance)

0.8.0

Support for Python 3.4 added. Support for Python 2.6 dropped.

Features
- Windows OS supported
- The train method has an argument for not considering index predicates
- TfIDFNGram Index Predicate added (for shorter strings)
- SuffixArray Predicate
- Double Metaphone Predicates
- Predicates for numbers, OrderOfMagnitude, Round
- Set Predicate OrderOfCardinality
- The final list of learned predicates will now often be smaller without loss of coverage
- Variables refactored to support external extensions like https://github.com/datamade/dedupe-variable-address
- Categorical distance, regularized logistic regression, affine gap distance, and canonicalization have been turned into separate libraries
- Simplejson is now a dependency
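
To illustrate what the number predicates listed above might do, here is a minimal sketch of two blocking-key functions. The names `order_of_magnitude` and `round_to_one_significant_digit` are hypothetical and this is not dedupe's actual implementation; it only assumes that a predicate maps a field value to a tuple of string keys, and that records sharing a key land in the same block.

```python
import math


def order_of_magnitude(value):
    """Hypothetical predicate: key on the nearest power of ten.

    Non-positive values have no logarithm, so they get no key.
    """
    if value > 0:
        return (str(int(round(math.log10(value)))),)
    return ()


def round_to_one_significant_digit(value):
    """Hypothetical predicate: key on the value rounded to one significant digit."""
    if value == 0:
        return ("0",)
    exponent = math.floor(math.log10(abs(value)))
    # A negative ndigits rounds to tens, hundreds, thousands, and so on.
    return (str(round(value, -exponent)),)


# 950 and 1200 share an order-of-magnitude key, so they would be compared.
print(order_of_magnitude(950), order_of_magnitude(1200))
print(round_to_one_significant_digit(1234))
```

Coarse keys like these trade precision for recall: more candidate pairs are compared, but near-equal numbers are less likely to be split across blocks.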

0.7.5

Features
- Individual record cluster membership scores
- New predicates
- New Exists Variable Type

Bug Fixes
- Latlong predicate fixed
- Set TFIDF canopy now works properly

0.7.4

Features
- Sampling methods now use blocked sampling

0.7.0

Features
- New index, unindex, and match methods in Gazetteer matching, useful for streaming matching

