Dataprofiler

Latest version: v0.13.2

0.7.3

Profiler
* Add ability to fully install the dataprofiler with one command 424

Other Changes
* Documentation / Github Pages updated 422
* Updated examples 425

0.7.2

Profiler
* Add median to numeric stats 389
* Chi-square tests added to profiler 392
* Chi-square/homogeneity, median, mode, MAD differences 398, 400
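
The numeric statistics named above can be illustrated with the Python standard library alone. This is a minimal sketch, not the profiler's own implementation: it assumes "MAD" refers to the median absolute deviation, and the helper name `median_abs_deviation` and the sample values are made up for illustration.

```python
from statistics import median, multimode


def median_abs_deviation(values):
    """Median of the absolute deviations from the median (illustrative helper)."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical sample column; the outlier 14 barely moves these robust statistics.
values = [1, 2, 2, 3, 14]

summary = {
    "median": median(values),
    "mode": multimode(values),  # list of all most-frequent values
    "median_abs_deviation": median_abs_deviation(values),
}
print(summary)
```

All three statistics are robust to outliers, which is presumably why they are useful alongside mean and variance when comparing two profiles.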

Readers


Graphs
* Add missing values matrix 403
* Update histogram to use column indexes 404
* Add warning to user when reqs not installed 407

Bug fixes
* Fix bug in mode when disabled 388
* Update exception text for ssl_verify error 395
* Fix ssl_verify misnaming and consecutive spaces in CSV 405
* Fix CNN confidences not slicing data correctly 419

Other Changes
* Documentation / Github Pages updated 390, 391, 393, 394, 396, 397, 399, 401, 402, 412, 416, 418
* Add examples 413, 417, 420

0.7.1

Profiler
* Validate min_true_samples in update_profile 377
* Add mode to numeric stats 382

Readers
* Readers now accept a URL to a file for reading 375
* Allow text to determine encoding automatically 378

Graphs
* Create function that accepts a profiler and creates histogram bar charts 367

Bug fixes
* Fixes bug in _get_quantiles when median case occurs 383
* Catch divide-by-zero bug for unique row ratio 384
* Make clean data function static again due to multiprocessing and model issue 385

Other Changes
* Version updated 386
* Documentation / Github Pages updated 379, 380, 381, 387

0.7.0

Profiler
* Can now take the difference between two profiles 277, 279, 282, 295, 297, 300, 301, 302, 318, 319, 324, 336, 339, 349, 355, 358, 359, 366
* Correlations can now update with new data / merge and NaNs 342
* Add text memory size to unstructured profiler 340
* Add timeit functionality to top level profilers 344, 346
* Users can now specify what is considered a null value 347

Readers
* Can now ingest StringIO and BytesIO 348, 350, 351, 352, 353, 354, 364
* Allow internal data function calls directly from our data class 360

Runtime Changes
* Abstract NumericalStatsMixin profile for columns 337
* Added profiler min true samples error checking 365

Bug fixes
* Allow users to send in non-string value for structured labeling 343
* Profiler samples no longer change visual representation when passed as a list 363

Other Changes
* Add scipy to requirements.txt 369
* Update throughput testing changes 356
* Version updated 370
* Github Pages updated 345, 362, 372, 373

0.6.1

Profiler

- Options added to set 'k' for the top k highest counts in categorical columns 325
- Improved CSV data streaming to accept StringIO/BytesIO 327
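
The "top k highest counts" idea can be sketched with `collections.Counter`. This only illustrates the concept; the function name `top_k_counts`, its default of 5, and the sample data are assumptions, not the profiler's option names or code.

```python
from collections import Counter


def top_k_counts(values, k=5):
    """Return the k most frequent categories with their counts (illustrative helper)."""
    return dict(Counter(values).most_common(k))


# Hypothetical categorical column.
colors = ["red", "blue", "red", "green", "red", "blue"]
top_two = top_k_counts(colors, k=2)
print(top_two)
```

Keeping only the top k entries bounds the size of the report for columns with many distinct categories.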

Runtime

- Text in Unstructured profiler now keeps a count of words 321

Bug Fixes

- Fixed unalikeability bug that caused errors on datasets with only one sample 341

Other Changes

- Standardized through-put for structured testing 298

0.6.0

Profiler
* Structured Profiler can now take in duplicate columns 315
  - This is an API change for accessing the data in the report: data_stats is now a list
* Categorical Profile now includes top 5 counts 299
* Add new categorical statistics: gini impurity and unalikeability 308, 320
* Unstructured Data Labeler profile now includes entity percentages 305
* Add Pearson's correlation to the Structured Profiler 281, 307, 317
* Unstructured Profiler Text vocab now outputs a top k highest vocab counts 304, 314
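
The two categorical statistics named above can both be computed from per-category counts. The sketch below uses common textbook definitions and is not taken from the profiler's source: Gini impurity as one minus the sum of squared category proportions, and unalikeability as the fraction of distinct pairs of samples that differ. The function names, the sample counts, and the choice to return 0.0 for fewer than two samples are assumptions.

```python
def gini_impurity(counts):
    """1 - sum(p_i^2) over category proportions p_i."""
    total = sum(counts.values())
    if total == 0:
        return None  # undefined for an empty column
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def unalikeability(counts):
    """Fraction of ordered sample pairs (without replacement) that differ."""
    total = sum(counts.values())
    if total < 2:
        # Assumption: no pairs exist, so report 0.0 instead of dividing by zero.
        return 0.0
    unalike_pairs = sum(c * (total - c) for c in counts.values())
    return unalike_pairs / (total * total - total)


# Hypothetical category counts for one column.
counts = {"a": 2, "b": 2}
print(gini_impurity(counts), unalikeability(counts))
```

The pairwise denominator is zero when a column has exactly one sample, so a guard like the one above is needed for single-sample datasets.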

Runtime Changes
* Categorical Profiler keeps an internal count of categories 296
* Text in Unstructured profiler now keeps a count of vocab 304
* Data Reader's `is_match` function can now take in StringIO/BytesIO 292, 306, 326

Bug fixes
* Bug fix to make sure samples being stored by UnstructuredProfiler save 313

Other Changes
* Documentation on contributions added 310, 311, 312, 333
* Github Pages updated 309, 316, 322, 323, 329, 330, 331, 334
