Dataprofiler

Latest version: v0.12.0

0.7.0

Profiler
* Can now take the difference between two profiles 277, 279, 282, 295, 297, 300, 301, 302, 318, 319, 324, 336, 339, 349, 355, 358, 359, 366
* Correlations can now update with new data, merge, and handle NaNs 342
* Add text memory size to unstructured profiler 340
* Add timeit functionality to top level profilers 344, 346
* Users can now specify what is considered a null value 347
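
To make the idea of a profile difference concrete, here is a minimal sketch that diffs two nested report dictionaries. It is not DataProfiler's implementation: the function `diff_reports`, the report keys, and the output conventions (a numeric delta, the string "unchanged", or a two-item list) are all assumptions chosen for illustration.

```python
def diff_reports(a, b):
    """Recursively diff two nested report dicts (illustrative only)."""
    out = {}
    for key in sorted(set(a) | set(b)):
        if key not in a or key not in b:
            # Key exists on one side only: show both sides as-is.
            out[key] = [a.get(key), b.get(key)]
        elif isinstance(a[key], dict) and isinstance(b[key], dict):
            # Nested section: diff it the same way.
            out[key] = diff_reports(a[key], b[key])
        elif a[key] == b[key]:
            out[key] = "unchanged"
        elif isinstance(a[key], (int, float)) and isinstance(b[key], (int, float)):
            # Numeric values: report the signed difference a - b.
            out[key] = a[key] - b[key]
        else:
            # Anything else: show both values side by side.
            out[key] = [a[key], b[key]]
    return out


# Hypothetical report fragments; the keys are made up for this example.
report_a = {"row_count": 100, "stats": {"mean": 4.0, "dtype": "int"}}
report_b = {"row_count": 120, "stats": {"mean": 4.0, "dtype": "float"}}
result = diff_reports(report_a, report_b)
```

With these inputs, `result["row_count"]` is the signed difference and `result["stats"]["mean"]` is marked as unchanged.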

Readers
* Can now ingest StringIO and BytesIO 348, 350, 351, 352, 353, 354, 364
* Allow internal data function calls directly from our data class 360
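
As an illustration of what accepting StringIO and BytesIO alongside file paths can look like, the sketch below uses only the standard library. The helper `read_csv_rows` is hypothetical and is not part of the DataProfiler reader API.

```python
import csv
import io


def read_csv_rows(source, encoding="utf-8"):
    """Return CSV rows from a file path, StringIO, or BytesIO (hypothetical helper)."""
    if isinstance(source, io.BytesIO):
        # Bytes must be decoded before the csv module can parse them.
        text = io.StringIO(source.getvalue().decode(encoding))
    elif isinstance(source, io.StringIO):
        # Copy the buffer so the caller's stream position is left untouched.
        text = io.StringIO(source.getvalue())
    else:
        # Anything else is treated as a path on disk.
        with open(source, newline="", encoding=encoding) as handle:
            return list(csv.reader(handle))
    return list(csv.reader(text))


rows_from_text = read_csv_rows(io.StringIO("a,b\n1,2\n"))
rows_from_bytes = read_csv_rows(io.BytesIO(b"a,b\n1,2\n"))
```

Both in-memory inputs produce the same rows, so downstream code does not need to know where the data came from.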

Runtime Changes
* Abstract NumericalStatsMixin profile for columns 337
* Added profiler min true samples error checking 365

Bug fixes
* Allow users to send in non-string value for structured labeling 343
* Profiler samples no longer change visual representation when passed as a list 363

Other Changes
* Added scipy to requirements.txt 369
* Update throughput testing changes 356
* Version updated 370
* GitHub Pages updated 345, 362, 372, 373

0.6.1

Profiler

- Added an option to set 'k' for the top k highest counts in categorical profiles 325
- Improved CSV data streaming to accept StringIO/BytesIO 327

Runtime

- Text in Unstructured profiler now keeps a count of words 321

Bug Fixes

- Fixed unalikeability bug that caused errors on datasets with only one sample 341

Other Changes

- Standardized throughput for structured testing 298

0.6.0

Profiler
* Structured Profiler can now take in duplicate columns 315
  - This is an API change to how data in the report is accessed: data_stats is now a list
* Categorical Profile now includes top 5 counts 299
* Add new categorical statistics: gini impurity and unalikeability 308, 320
* Unstructured Data Labeler profile now includes entity percentages 305
* Add Pearson's correlation to the Structured Profiler 281, 307, 317
* Unstructured Profiler Text vocab now outputs a top k highest vocab counts 304, 314
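
The categorical statistics named above can be sketched with the standard library. This is not DataProfiler's code: the function names are illustrative, and the unalikeability formula (the share of ordered pairs of observations that differ) is an assumption about the definition in use.

```python
from collections import Counter


def gini_impurity(values):
    """1 - sum(p_i^2) over category proportions; None for empty input."""
    counts = Counter(values)
    n = sum(counts.values())
    if n == 0:
        return None
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def unalikeability(values):
    """Share of ordered pairs of observations that fall in different categories."""
    counts = Counter(values)
    n = sum(counts.values())
    if n <= 1:
        # No pairs exist with fewer than two samples; avoid dividing by zero.
        return 0.0
    unalike_pairs = sum(c * (n - c) for c in counts.values())
    return unalike_pairs / (n * n - n)


def top_k_counts(values, k):
    """The k most frequent categories with their counts."""
    return Counter(values).most_common(k)


sample = ["a", "a", "b", "c"]
gini = gini_impurity(sample)
unalike = unalikeability(sample)
top_one = top_k_counts(sample, 1)
```

The guard for `n <= 1` matters because the denominator `n * n - n` is zero for a single sample.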

Runtime Changes
* Categorical Profiler keeps an internal count of categories 296
* Text in Unstructured profiler now keeps a count of vocab 304
* Data Reader's `is_match` function can now take in StringIO/BytesIO 292, 306, 326

Bug fixes
* Fixed a bug so that samples stored by UnstructuredProfiler are saved 313

Other Changes
* Documentation on contributions added 310, 311, 312, 333
* GitHub Pages updated 309, 316, 322, 323, 329, 330, 331, 334

0.5.3

Bug fixes
* Removed unused import causing profiler error 290

0.5.2

Profiler
* A library-level seed value can now be set by the user with `dp.set_seed` to make sampling during profiling deterministic 271
* NumericalStats now include *skewness*, *kurtosis*, *Count Zeros*, and *Count Negatives* 266, 267, 272, 273
* User can turn off bias correction for variance, skewness, and kurtosis 269
* Sum is returned in NumericalStats Profiles 264
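
To show what switching bias correction on or off changes, here is a minimal sketch for variance and skewness using the standard sample-size adjustments. The function names and the `bias_correction` parameter are illustrative assumptions, not the library's option names, and returning NaN for undefined cases is a choice made for this example.

```python
import math


def variance(values, bias_correction=True):
    """Sample variance; divides by n - 1 when bias correction is on, else by n."""
    n = len(values)
    if n == 0:
        return math.nan
    mean = sum(values) / n
    squared_dev = sum((x - mean) ** 2 for x in values)
    if bias_correction:
        return squared_dev / (n - 1) if n > 1 else math.nan
    return squared_dev / n


def skewness(values, bias_correction=True):
    """Moment-based skewness g1, optionally adjusted to G1 for sample size."""
    n = len(values)
    if n == 0:
        return math.nan
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return math.nan  # All values identical: skewness is undefined.
    g1 = m3 / m2 ** 1.5
    if not bias_correction:
        return g1
    if n < 3:
        return math.nan  # The adjustment needs at least three samples.
    return math.sqrt(n * (n - 1)) / (n - 2) * g1


data = [1, 2, 3, 4, 10]
var_corrected = variance(data)
var_raw = variance(data, bias_correction=False)
skew_corrected = skewness(data)
skew_raw = skewness(data, bias_correction=False)
```

On small samples the corrected and uncorrected values differ noticeably, which is why a toggle is useful when comparing against other tools.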

Runtime Changes
* Warnings will be issued when invalid input is received by the NumericalStats profilers 280

Bug fixes
* Default values for variance, skewness, and kurtosis are `np.nan` 275
* Options no longer propagate to all levels when setting a single level property unless a wildcard is specified e.g. `*.is_enabled` 270

Other Changes
* Documentation on contributions added 268
* GitHub Pages updated 284, 285, 287, 288

0.5.1

Bug fixes
* Fix merging UnstructuredProfiler 255
* Fix bug in saving profiles without a labeler 257

Other Changes
* Documentation: Add UnstructuredProfiler examples 252
