Dataprofiler

Latest version: v0.12.0

0.4.1

**BUGFIX**: Enables running the Data Profiler without TensorFlow installed

0.4.0

**New Features**
* Reduce profiling memory usage by ~50%
* Reduce profiling runtime by >75%
* Improve delimiter and header detection in delimited (CSV) data
* Add progress notifications for profiling
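
To make the idea of delimiter detection concrete, here is a minimal sketch using the Python standard library's `csv.Sniffer`. It is not the detection logic this release uses; the helper name `guess_delimiter` and the candidate list are assumptions made for illustration.

```python
import csv


def guess_delimiter(sample, candidates=",;\t|"):
    """Guess the delimiter of a text sample with the stdlib csv.Sniffer.

    `candidates` is a hypothetical shortlist; the sniffer only considers
    characters from this string.
    """
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter


sample = "name;age\nalice;30\nbob;25\n"
print(guess_delimiter(sample))
```

`csv.Sniffer` also provides a heuristic `has_header()` method, which can be unreliable on small or string-only samples.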

**Fixes**
* Add warnings for sampling
* Select the proper options when merging profiles
* Fix repeated TensorFlow warnings
* Cap the input read from large CSV files by bytes or lines (whichever is smaller)
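
The "bytes or lines, whichever is smaller" rule in the last fix can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function name `read_capped_sample` and both limit parameters are hypothetical.

```python
import io


def read_capped_sample(stream, max_bytes, max_lines):
    """Collect lines from a text stream, stopping at whichever limit hits first.

    Both limits are hypothetical parameters chosen for this sketch.
    """
    lines = []
    bytes_read = 0
    for line in stream:
        if len(lines) >= max_lines:
            break  # line limit reached
        size = len(line.encode("utf-8"))
        if bytes_read + size > max_bytes:
            break  # adding this line would exceed the byte limit
        lines.append(line)
        bytes_read += size
    return lines


data = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
sample = read_capped_sample(data, max_bytes=10, max_lines=3)
print(len(sample))
```

Here each line is 4 bytes, so the 10-byte limit stops the read before the 3-line limit does.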

0.3.5

* Enhancement: 50-90% reduced profiling time
* Improved methods for unique-row and null-in-row predictions
* Enhancement: Users can now select the header row for delimited files
* Bug Fix: Added header detection on delimited files with *only strings*

0.3.4

* Significantly improved header detection on structured datasets
* Updated model
* New entities: `DATE`, `TIME`, `US_STATE`, `DRIVERS_LICENSE`
* Removed entities: `INTEGER_BIG`
* New, easier way to extend the model with additional labels
* ML requirements installed separately via `pip install dataprofiler[ml]` - **required for labeler**
* Profiler & Labeler only load TensorFlow when necessary
* Minor bug fixes & improved testing
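
Loading a heavy dependency only when it is needed is commonly done with a deferred import. The sketch below shows one generic way to do that; it is not the package's own code. The `LazyModule` class is hypothetical, and `json` stands in for `tensorflow` so the example runs without extra installs.

```python
import importlib


class LazyModule:
    """Defer importing a module until one of its attributes is first used."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only invoked for attributes not found on the instance itself,
        # so _name and _module never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# "json" is a stand-in; a heavy library name would go here instead.
heavy = LazyModule("json")
print(heavy._module is None)   # nothing imported yet
print(heavy.dumps([1, 2]))     # first attribute access triggers the import
```

With this pattern, code paths that never touch the wrapped module never pay its import cost, and a missing optional dependency only raises an error when it is actually used.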

0.3.2

* TensorFlow only runs when a labeler executes
* Improved CSV detection
* 2-8x memory reduction in profiling
* Various bug fixes

0.3.1

* Dramatically reduced memory requirements for the data labeler
* Renamed the module: `data_profiler` -> `dataprofiler`
* Improved delimiter (CSV) file detection
