Freamon

Latest version: v0.3.57

0.3.44

* Added interactive progress tracking for deduplication:
  * Implemented real-time progress indicators for Jupyter notebooks
  * Added ETA calculations based on completed comparisons
  * Added block processing progress tracking for chunked processing
  * Implemented memory usage monitoring
  * Created examples demonstrating the progress tracking functionality
* Added dependencies for Jupyter integration (ipython, ipywidgets)
* Added psutil dependency for memory tracking
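
As a rough illustration of the ETA calculation described above: the estimate is extrapolated from the average time per completed comparison. The sketch below is not Freamon's implementation (the release uses psutil for memory tracking; this stdlib-only version substitutes `tracemalloc`, and `progress_line` is a hypothetical helper):

```python
import time
import tracemalloc

def progress_line(done, total, start_time):
    """Format a progress message with an ETA extrapolated from the
    average time per completed comparison, plus peak traced memory."""
    elapsed = time.monotonic() - start_time
    eta = elapsed * (total - done) / done if done else float("inf")
    _, peak = tracemalloc.get_traced_memory()
    return f"{done}/{total} comparisons, ETA {eta:.1f}s, peak mem {peak / 1e6:.2f} MB"

tracemalloc.start()
start = time.monotonic()
for done in range(1, 5):
    _ = [x * x for x in range(10_000)]  # stand-in for a batch of comparisons
    print(progress_line(done, 4, start))
```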

0.3.43

* Added blocking and LSH functionality to `flag_similar_records`:
  * Implemented blocking for faster deduplication based on exact column matches
  * Added Locality-Sensitive Hashing (LSH) for approximate matching with high performance
  * Implemented phonetic and n-gram based blocking strategies
  * Added combined blocking and LSH approach for very large datasets
  * Created dedicated modules for the blocking and LSH implementations
* Added comprehensive example demonstrating the performance improvements
* Added optional dependencies for LSH (datasketch) and phonetic matching (jellyfish)
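
To illustrate why blocking speeds deduplication up: candidate pairs are generated only within records that share a cheap blocking key, instead of across all O(n²) pairs. A minimal stand-alone sketch, not Freamon's API (`block_pairs` and the sample records are hypothetical):

```python
from collections import defaultdict
from itertools import combinations

def block_pairs(records, key):
    """Yield candidate index pairs only within blocks that share the same
    blocking key, instead of comparing every possible pair of records."""
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[key(rec)].append(idx)
    for members in blocks.values():
        yield from combinations(members, 2)

people = [
    {"name": "John Smith", "zip": "10001"},
    {"name": "Jon Smith", "zip": "10001"},
    {"name": "Alice Jones", "zip": "94107"},
]
# Block on ZIP code: only records 0 and 1 are ever compared.
print(list(block_pairs(people, key=lambda r: r["zip"])))  # [(0, 1)]
```

The same shape generalizes to the phonetic and n-gram strategies mentioned above: only the `key` function changes (e.g. a phonetic encoding of the name instead of the raw ZIP).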

0.3.42

* Fixed bug in deduplication module:
  * Fixed `flag_similar_records` function to handle empty weights dictionary
  * Prevented division by zero error when normalizing weights
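
The fix itself isn't shown on this page; the usual guard against this class of bug looks roughly like the following (an illustrative sketch with a hypothetical `normalize_weights` helper, not Freamon's actual code): an empty or missing weights dictionary yields a zero total, so the code falls back to uniform weights instead of dividing by zero.

```python
def normalize_weights(columns, weights=None):
    """Normalize per-column weights so they sum to 1.0, falling back to
    uniform weights when the dict is empty or missing (avoids ZeroDivisionError)."""
    weights = weights or {}
    total = sum(weights.get(c, 0.0) for c in columns)
    if total == 0:
        # Empty dict (or all-zero weights): weight every column equally.
        return {c: 1.0 / len(columns) for c in columns}
    return {c: weights.get(c, 0.0) / total for c in columns}

print(normalize_weights(["name", "email"], {}))                 # uniform fallback
print(normalize_weights(["name", "email"], {"name": 3, "email": 1}))
```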

0.3.41

* Improved memory efficiency in deduplication module:
  * Added missing parameters to `flag_similar_records` function for better backward compatibility
  * Enhanced memory optimization for large dataset processing
  * Implemented generator-based pair creation instead of storing all pairs in memory
  * Added batch processing for record comparisons
  * Improved garbage collection during intensive operations
  * Added strategic processing order for duplicate detection
  * Enhanced parallel processing with configurable workers
  * Added progress reporting for long-running operations
  * Implemented adaptive chunk size reduction for very large datasets
  * Added proportional sampling across chunks when using limits
* Documentation and examples:
  * Created comprehensive examples demonstrating memory-efficient deduplication
  * Added benchmarking tools to compare different approaches
  * Updated documentation with parameter explanations and best practices
  * Added visualization capabilities for duplicate group analysis
  * Fixed documentation to reflect current parameter names
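
The generator-plus-batching idea above can be sketched with the standard library (illustrative only, not Freamon's internals): pairs are produced lazily and consumed in fixed-size batches, so only one batch is alive in memory at a time.

```python
from itertools import combinations, islice

def candidate_pairs(n):
    """Lazily yield index pairs rather than materializing all n*(n-1)/2 of them."""
    return combinations(range(n), 2)

def batches(iterable, size):
    """Slice any iterator into fixed-size batches for chunked comparison."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

total = 0
for batch in batches(candidate_pairs(100), 1000):
    total += len(batch)  # compare each pair here; the batch is then freed
print(total)  # 4950 == 100 * 99 / 2
```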

0.3.40

* Added comprehensive documentation and examples:
  * Created detailed examples for duplicate flagging functionality
  * Added documentation for advanced EDA features
  * Added documentation for automatic train-test splitting in automodeling
  * Added export capabilities documentation for PowerPoint and Excel
  * Enhanced README with improved dependency information and examples
  * Added cross-references between documentation files for better discoverability
  * Added examples demonstrating complete end-to-end workflows
* Enhanced Jupyter notebook integration:
  * Added notebook-friendly examples with interactive visualizations
  * Improved display capabilities for deduplication reporting
  * Added comprehensive workflow examples in a notebook-friendly format
  * Added `display_eda_report` method for interactive EDA reports in Jupyter
* Features include:
  * Streamlined installation instructions with dependency groups
  * Clearer documentation of optional feature requirements
  * Examples showing integration between different modules
  * Performance optimization guidance for large datasets
  * Detailed examples of advanced features
  * Interactive visualized EDA reports for Jupyter notebooks
* Improved user experience:
  * Added complete workflow examples from data loading to modeling
  * Better explanations of component interactions
  * Clearer API documentation with usage examples
  * More comprehensive examples showing real-world use cases
  * Quick reports for exploratory analysis in interactive environments

0.3.39

* Added duplicate flagging functionality:
  * `flag_exact_duplicates()` to identify exact matches across specified columns
  * `flag_text_duplicates()` for identifying similar text content with multiple methods
  * `flag_similar_records()` for multi-column weighted similarity detection
  * `flag_supervised_duplicates()` for ML-based duplicate identification
  * `add_duplicate_detection_columns()` as a high-level wrapper for all methods
* Added performance optimizations for large datasets:
  * Chunked processing to handle datasets too large for all-pairs comparison
  * Streaming LSH implementation for text collections that don't fit in memory
  * Parallel processing capabilities with configurable number of workers
  * Polars integration for faster string operations and reduced memory usage
  * Network-based algorithms for identifying duplicate clusters efficiently
* Features include:
  * Non-destructive duplicate identification (adds columns instead of removing rows)
  * Support for both pandas and polars DataFrames
  * Multiple similarity measures (hash, n-gram, fuzzy matching, LSH)
  * Graph-based clustering for finding duplicate groups
  * Configurable thresholds and weighting for different columns
  * Integration with existing deduplication framework
  * Memory efficiency improvements for very large datasets (50-70% reduction)
  * Performance improvements of 2-5x for large datasets
* Added comprehensive benchmarking tools for comparing different implementations
* Added test suite and examples for all duplicate flagging methods and optimizations
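
Graph-based clustering of duplicate groups amounts to finding connected components over the flagged pairs: records linked through any chain of similar pairs land in one group. A minimal union-find sketch (`duplicate_groups` is a hypothetical helper, not Freamon's API):

```python
def duplicate_groups(n, similar_pairs):
    """Cluster n records into duplicate groups given flagged (i, j) pairs,
    using union-find; singletons are dropped from the result."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)  # union the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [g for g in groups.values() if len(g) > 1]

# Pairs (0,1) and (1,2) chain records 0-1-2 into one group; 3 and 4 form another.
print(duplicate_groups(5, [(0, 1), (1, 2), (3, 4)]))  # [[0, 1, 2], [3, 4]]
```

The same component-finding step is what a networkx-based implementation would do; union-find just avoids the extra dependency for this sketch.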
