Datasketch

Latest version: v1.6.5

1.1.3

MinHash now uses NumPy's random number generator instead of Python's built-in `random`. This makes MinHash generate consistent hash values across different Python versions.

As a side effect, MinHash objects created before version 1.1.3 will not work correctly with those created after it: operations that compare or combine the two (`jaccard`, `merge`, and `union`) are affected.
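A toy illustration of why sketches built from different random generators cannot be mixed. This is not datasketch's implementation: the helper names (`make_permutations`, `signature`, `estimate_jaccard`) are hypothetical, and a different seed stands in for a switch of random number generator. The point is only that a MinHash signature is meaningful relative to the permutation parameters that produced it.

```python
import hashlib
import random

MERSENNE_PRIME = (1 << 61) - 1
MAX_HASH = (1 << 32) - 1


def make_permutations(num_perm, seed):
    """Draw (a, b) parameters for num_perm linear hash permutations."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, MERSENNE_PRIME - 1), rng.randint(0, MERSENNE_PRIME - 1))
        for _ in range(num_perm)
    ]


def base_hash(token):
    """32-bit hash of a token, taken from the first 4 bytes of its SHA-1."""
    digest = hashlib.sha1(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little")


def signature(tokens, perms):
    """Minimum permuted hash value per permutation."""
    return [
        min(((a * base_hash(t) + b) % MERSENNE_PRIME) & MAX_HASH for t in tokens)
        for a, b in perms
    ]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


tokens = {"minhash", "lsh", "sketch", "jaccard"}
old_perms = make_permutations(128, seed=1)
new_perms = make_permutations(128, seed=2)  # assumption: models a different generator

# Same set, same permutations: the signatures agree everywhere.
same = estimate_jaccard(signature(tokens, old_perms), signature(tokens, old_perms))

# Same set, different permutations: the estimate is no longer meaningful.
mixed = estimate_jaccard(signature(tokens, old_perms), signature(tokens, new_perms))
```

Here `same` is exactly 1.0, while `mixed` falls far below it even though both signatures describe the identical set.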

1.1.2

* `LeanMinHash` is a subclass of `MinHash`. It uses less memory and allows faster (de)serialization. See [documentation](https://ekzhu.github.io/datasketch/documentation.html#lean-minhash) for details.
* Removed `serialize`, `deserialize`, and `bytesize` methods from `MinHash`. These are supported in `LeanMinHash` instead.
* `MinHash` objects serialized before this version will not deserialize properly. To migrate, see [this comment](https://github.com/ekzhu/datasketch/issues/18#issuecomment-286645896).
* The documentation now has its own [website](https://ekzhu.github.io/datasketch)!

1.0.0

After nearly two years of on-and-off work on this project, the API is now stable and the MinHash-related sketches are feature-complete.

I will continue to add more data sketches and indexes.

0.2.6

- MinHash LSH Forest implementation and benchmark using synthetic data
- Improve existing MinHash LSH benchmark using synthetic data for more tunable data distributions
- Improve MinHash and LSH performance

0.2.4

- Fixed Issue 4 - int overflow error on Windows platform
- Use Python's built-in random number generator for better MinHash accuracy

0.2.3

- Add remove method for LSH index - `lsh.remove(key)`
- Add membership check for LSH - `key in lsh`
