This release contains an overhaul of the `data_summary` feature and minor bug fixes.
Changes
- Updated the contributing guide haishiro (377) (368)
Features
- Reworked data summary (see below) haishiro (383)
- Added progress bar when fitting topic model truongc2 (393)
- Added support for Python 3.6 haishiro (369)
Bug Fixes
- Fixed backend recursion bug haishiro (396)
- Removed extra cell in User Guide zack-soenen (394)
- Added kwargs to text preprocessing functions: `filter_dictionary`, `create_doc_term_matrix`, and `create_tfidf_matrix` truongc2 (386)
Maintenance
- Disabled checks on draft PRs truongc2 (399)
- Updated actions/setup-python requirement to v2.1.4 dependabot (421)
- Added workflow dispatch events for manual workflow triggers haishiro (423)
- Bumped peaceiris/actions-gh-pages from v3.7.0-8 to v3.7.3 dependabot (422)
- Added dependabot haishiro (416)
- Added script to rerun notebooks in CI prior to unit tests truongc2 (224)
Data Summary
![image](https://user-images.githubusercontent.com/11744859/101098584-c9a83300-3588-11eb-856e-1c646901d923.png)
1. Added an additional display (a DataFrame) showing the row count, column count, and size in memory.
2. Transposed the summary table so that the data columns are now in rows. This layout is intended to scale better on datasets with a large number of columns.
3. Improved the performance of `data_summary` when using the pandas backend. The prior implementation, which used pandas `.agg()`, resulted in very long computation times even for small datasets.
4. Added a `Unique` metric: the number of unique values.
5. Changed the ordering of metrics. The motivation is to present the metrics in a more logical order of inspection.
6. Added additional display options:
- `as_percentage`: Formats the count metrics (`zeroes`, `nulls`, `top frequency`) as a percentage of the total row count instead of as raw counts.
- `auto_float`: Attempts to apply sensible defaults when displaying floats by avoiding scientific notation and excessive precision. Set this option to `False` to disable the new formatting.
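
The transposed layout and the percentage formatting described above can be illustrated with a minimal pandas sketch. This is not the library's implementation: `sketch_summary` is a hypothetical helper written for this note, it covers only three of the metrics, and it assumes numeric columns.

```python
import pandas as pd


def sketch_summary(df: pd.DataFrame, as_percentage: bool = False) -> pd.DataFrame:
    """Hypothetical helper: one row per data column, one column per metric."""
    n_rows = len(df)
    # Each metric is a vectorized, column-wise reduction; building the frame
    # from these Series puts the data columns in the rows of the result.
    summary = pd.DataFrame(
        {
            "Nulls": df.isna().sum(),
            "Zeroes": (df == 0).sum(),  # assumes numeric columns
            "Unique": df.nunique(),     # NaN is not counted as a value
        }
    )
    if as_percentage and n_rows > 0:
        # Express the count metrics as a share of the total row count.
        for metric in ("Nulls", "Zeroes"):
            summary[metric] = (summary[metric] / n_rows).map("{:.1%}".format)
    return summary


df = pd.DataFrame({"a": [0, 1, 1, None], "b": [1.5, 1.5, 2.5, 0.0]})
print(sketch_summary(df))
print(sketch_summary(df, as_percentage=True))
```

Because each data column becomes a row, a dataset with hundreds of columns produces a long table rather than a wide one, which is usually easier to scroll and read.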