Data-describe

Latest version: v0.1.0b3


0.1.0b3

This release contains an overhaul of the `data_summary` feature and minor bug fixes.

Changes

- Updated the contributing guide haishiro (377) (368)

Features

- Reworked data summary (see below) haishiro (383)
- Added progress bar when fitting topic model truongc2 (393)
- Added support for Python 3.6 haishiro (369)

Bug Fixes

- Fixed backend recursion bug haishiro (396)
- Removed extra cell in User Guide zack-soenen (394)
- Added kwargs to text preprocessing functions: `filter_dictionary`, `create_doc_term_matrix`, and `create_tfidf_matrix` truongc2 (386)

Maintenance

- Disabled checks on draft PRs truongc2 (399)
- Updated actions/setup-python requirement to v2.1.4 dependabot (421)
- Added workflow dispatch events for manual workflow triggers haishiro (423)
- Bumped peaceiris/actions-gh-pages from v3.7.0-8 to v3.7.3 dependabot (422)
- Added dependabot haishiro (416)
- Added script to rerun notebooks in CI prior to unit tests truongc2 (224)

Data Summary
![image](https://user-images.githubusercontent.com/11744859/101098584-c9a83300-3588-11eb-856e-1c646901d923.png)

1. An additional display (DataFrame) showing row count, column count, and size in memory was added.
2. The summary table has been transposed so that the data columns appear as rows. The motivation for this change is that it should scale better on datasets with a large number of columns.
3. Improved the performance of `data_summary` when using the pandas backend. The prior implementation, which used pandas `.agg()`, resulted in very long computation times even for small datasets.
4. Added the `Unique` metric: the number of unique values.
5. Changed the ordering of metrics to present them in a more logical order of inspection.
6. Added additional display options:
   - `as_percentage`: Format count metrics (`zeroes`, `nulls`, `top frequency`) as a percentage of the total row count instead of a raw count.
   - `auto_float`: Attempts to apply sensible defaults when displaying floats, avoiding scientific notation and excessive precision. Set this option to `False` to disable the new formatting.
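The reworked summary is data-describe's own code, and these notes do not show it. Purely as an illustration, here is a minimal pandas-only sketch of the ideas listed above:

- a shape/memory overview;
- a transposed per-column table that includes a unique-value count;
- DataFrame-wide vectorized calls such as `isna().sum()` and `nunique()` in place of a per-column `.agg()`;
- options in the spirit of `as_percentage` and `auto_float`.

The function names `shape_summary`, `column_summary`, and `render`, plus the metric labels, are hypothetical and are not the data-describe API. Whether this mirrors the actual performance fix is an assumption.

```python
import pandas as pd


def shape_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical one-row overview: row count, column count, memory size."""
    return pd.DataFrame(
        {
            "Rows": [len(df)],
            "Columns": [df.shape[1]],
            # deep=True also measures the contents of object (e.g. string) columns
            "Size in memory (bytes)": [int(df.memory_usage(deep=True).sum())],
        }
    )


def column_summary(df: pd.DataFrame, as_percentage: bool = False) -> pd.DataFrame:
    """Hypothetical per-column metrics, transposed so each data column is a row."""
    n_rows = len(df)
    numeric = df.select_dtypes(include="number")  # excludes bool and object columns

    summary = pd.DataFrame(index=df.columns)
    summary["Data Type"] = df.dtypes.astype(str)
    # Whole-frame vectorized reductions rather than one .agg() call per column
    summary["Nulls"] = df.isna().sum()
    summary["Zeroes"] = (numeric == 0).sum().reindex(df.columns, fill_value=0)
    summary["Unique"] = df.nunique(dropna=True)
    # Count of the most common non-null value (0 for an all-null column)
    summary["Top Frequency"] = [
        int(df[c].value_counts().iloc[0]) if df[c].notna().any() else 0
        for c in df.columns
    ]
    summary["Mean"] = numeric.mean().reindex(df.columns)  # NaN for non-numeric

    if as_percentage and n_rows:
        # Express count metrics as a percentage of the total row count
        for metric in ("Zeroes", "Nulls", "Top Frequency"):
            summary[metric] = summary[metric] / n_rows * 100
    return summary


def render(summary: pd.DataFrame, auto_float: bool = True) -> str:
    """Return a printable table; auto_float uses fixed-point, two-decimal floats."""
    if not auto_float:
        return summary.to_string()
    # A callable float_format is applied to each non-null float value
    return summary.to_string(float_format=lambda x: f"{x:,.2f}")
```

For example, `print(shape_summary(df))` followed by `print(render(column_summary(df, as_percentage=True)))` prints both displays. Putting the data columns in the index keeps the table narrow no matter how many columns the dataset has. That is the scaling concern behind item 2.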

0.1.0b2

This patch focuses on addressing errors related to installation of data-describe.

Bug Fixes

- Fixed backend logic when unsupported data types are given haishiro (347)
- Updated setup() metadata for PyPI haishiro (348)
- Resolved errors when the semi-optional IPython and importlib.metadata dependencies are missing haishiro (346)
- Data Heatmap: Added legend label and moved to object-oriented mpl API haishiro (343)

Maintenance

- Updated CI GitHub Action haishiro (355)
- Added codecov.io for coverage checks haishiro (350)

0.1.0b1

Changes

- Standardized or updated documentation and naming conventions haishiro (328)
- Moved backend implementations back into core haishiro (306)
- Improved dependency management haishiro (302)

Features

- Cleaned up Docker (Resolves 176) haishiro (205)

Bug Fixes

- Fixed statsmodels being required when it should be optional haishiro (340)
- Fixed pyscagnostics being required when it should be optional haishiro (339)
- Prevented modin import on data-describe import haishiro (336)
- Fixed presidio import on data-describe import haishiro (334)
- Added random_state default to topic model haishiro (313)
- Updated seaborn usage for upcoming 0.12 API haishiro (305)

Maintenance

- Added exclude label for Release Drafter haishiro (337)
- Disabled creation of alpha docs haishiro (326)
- Added local API docs build directory to .gitignore haishiro (335)
- Enabled PyPI release haishiro (327)
- Added black to pre-commit checks haishiro (318)
- Limited publish of latest docs on relevant paths haishiro (316)
- Updated GitHub cache action to v2 haishiro (315)

0.1.0a2

This release includes multiple changes and bug fixes for the alpha testing period.

Changes

- sklearn requirement bumped to 0.23 haishiro (279)
- seaborn requirement bumped to 0.11 to use new `displot` function haishiro (287)
- Documentation and build workflows now trigger on release `published` event instead of `created` haishiro (304)

Features

- Added more details to example notebooks haishiro (282)

Bug Fixes

- Fixed data_summary when a column is entirely null haishiro (301)
- Fixed data heatmap ordering haishiro (283)
- Fixed correlation matrix style to be more consistent (Resolves 236) haishiro (277)
- Fixed link to contributing guide (Fixes 163) haishiro (280)
- General improvements to stability haishiro (274) (275) (273)
- Renamed references to data describe in documentation to be more consistent with branding haishiro (259)

Maintenance

- Fixed and improved auto-generated documentation haishiro (252)
- Fixed PyPI release pipeline haishiro (253)
- Added Release Drafter for automated release notes haishiro (286)
- Simplified and updated issue templates haishiro (261)

0.1.0a1

First release for private beta testing

New Features
- Clustering
- Correlation
- Data Heatmap
- Data Summary
- Distributions
- Scatter plots
- Feature importance
- Time series analysis
- Text preprocessing
- Topic Modeling
- Sensitive data (privacy)
- Dimensionality Reduction
