Runtime Changes
Migrating from v0.4.2 to v0.4.3 should result in a **30-90% reduction in profiling time**.
The actual reduction depends largely on system resources and data size.
Notes
* Removed requirement for tensorflow-addons
* Library now works with tensorflow nightly (Python 3.9)
* Added example on generating a new data labeler
Profiler
* Added multiprocessing for data preprocessing
* Improved histogram accuracy
* Reduced histogram generation runtime
* Added option to set the histogram bin count
* Expanded precision and switched to precision estimation (as opposed to exact calculation)
* Limited pool size based on CPU and memory limitations
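
To illustrate the idea behind limiting the pool size, here is a minimal sketch that caps the number of workers by both CPU count and available memory. It is not the library's implementation: `suggest_pool_size`, its parameters, and the "one copy of the data per worker" rule are assumptions made for illustration only.

```python
import os


def suggest_pool_size(data_bytes, available_bytes, cpu_count=None, copies_per_worker=1):
    """Hypothetical helper: cap the worker count by CPU count and by memory."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    # Assumption: each worker needs roughly `copies_per_worker` copies of the data.
    per_worker = max(1, data_bytes * copies_per_worker)
    by_memory = available_bytes // per_worker
    # Never exceed the CPU count, and always keep at least one worker.
    return max(1, min(cpus, by_memory))


# Memory allows only 3 copies of the data, so 3 workers are used despite 8 CPUs.
workers = suggest_pool_size(data_bytes=100, available_bytes=350, cpu_count=8)
```

The resulting value could then be passed as the `processes` argument of `multiprocessing.Pool`.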
Data
* Improved JSON detection method
* Added option (enabled by default) to pull metadata and data separately (`data.meta` and `data.data`)
  * `data.meta` is the part of the JSON that contains no records
  * `data.data` is the part of the JSON that contains records
* Added option to select keys which represent records
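
The split between metadata and records can be pictured with the following sketch. It does not use the library's API: `split_json`, the `record_keys` parameter, and the rule "a key holds records when its value is a non-empty list of objects" are hypothetical and chosen only to illustrate the behavior described above.

```python
import json


def split_json(payload, record_keys=None):
    """Hypothetical helper: separate record-bearing keys from metadata."""
    if record_keys is None:
        # Assumption: a key holds records when its value is a non-empty list of objects.
        record_keys = [
            key for key, value in payload.items()
            if isinstance(value, list) and value
            and all(isinstance(item, dict) for item in value)
        ]
    # Everything outside the record keys is treated as metadata.
    meta = {key: value for key, value in payload.items() if key not in record_keys}
    # Records from all selected keys are concatenated in key order.
    data = [row for key in record_keys for row in payload.get(key, [])]
    return meta, data


raw = '{"source": "sensor-7", "count": 2, "readings": [{"t": 1, "v": 0.5}, {"t": 2, "v": 0.7}]}'
meta, data = split_json(json.loads(raw))
```

Passing `record_keys=["readings"]` explicitly mirrors the option to select which keys represent records.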
Report
* Precision report now contains additional details:

      "precision": {
          "min": int,
          "max": int,
          "mean": float,
          "var": float,
          "std": float,
          "sample_size": int,
          "margin_of_error": float,
          "confidence_level": float
      },
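
As a rough illustration of how such fields could be estimated from a sample, the sketch below computes them from a list of per-value precisions. This is an assumption about the approach, not the library's code: `precision_summary` is a hypothetical name, the margin of error uses the normal approximation `z * std / sqrt(n)`, and the 0.95 default confidence level is chosen only for the example.

```python
import math
import statistics


def precision_summary(precisions, confidence_level=0.95):
    """Hypothetical helper: summarize sampled precisions with a margin of error."""
    n = len(precisions)
    mean = statistics.fmean(precisions)
    # Sample variance; defined as 0.0 here when only one value was sampled.
    var = statistics.variance(precisions) if n > 1 else 0.0
    std = math.sqrt(var)
    # Two-sided z-score for the requested confidence level (normal approximation).
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return {
        "min": min(precisions),
        "max": max(precisions),
        "mean": mean,
        "var": var,
        "std": std,
        "sample_size": n,
        "margin_of_error": z * std / math.sqrt(n),
        "confidence_level": confidence_level,
    }


report = precision_summary([3, 4, 4, 5, 4])
```

A larger sample shrinks `margin_of_error`, which is the trade-off behind estimating precision from a sample instead of computing it exactly.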
Bug fixes
* Fixed error in merging options
* Fixed issue related to merging DateTimeColumns
* Fixed multiprocessing on OSX
* Fixed row calculations when `min_true_samples` is greater than zero