What's New
- Add the `set_custom_palette`, `box_plot`, `kde_plot`, `probability_plot`, `bar_plot`, `regression_plot` and `plot_correlation` functions (see [Plotting Examples][plotting]).
- Rename `target_variable` to `groupby_data`:
- Select a more intuitive name; `target_variable` is ambiguous.
- Add the `-g`/`--groupby` CLI argument.
- Update document layout:
- Center-align images and tables.
- Reduce unnecessary page-breaks.
- Replace correlation heatmap with a bar chart:
Show colored and labeled bars for the top 20 correlated numeric variable pairs (by magnitude). This makes highly correlated variables much easier to spot.
- Limit bivariate summaries & regression plots to 50:
This is necessary because the number of pairs grows quadratically with the number of columns. 50 numeric columns give `combination(50_numeric_cols, 2) == 1225` pairs which, at 2 pairs per page, would fill over 600 pages and take ages to prepare. Now only the top 50 pairs are published (approx. 25 pages).
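
As a rough check of the arithmetic above, the sketch below counts the pairs and the pages they would fill. The 2-pairs-per-page figure is taken from the note; `pages_needed` is a hypothetical helper for illustration, not part of eda-report.

```python
import math

PAIRS_PER_PAGE = 2  # layout figure quoted in the note above


def pages_needed(num_pairs: int) -> int:
    """Pages required to show `num_pairs` bivariate summaries (hypothetical helper)."""
    return math.ceil(num_pairs / PAIRS_PER_PAGE)


all_pairs = math.comb(50, 2)  # every unordered pair of 50 numeric columns
print(all_pairs, pages_needed(all_pairs))  # 1225 613
print(pages_needed(50))  # 25
```
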
- Configure color in each subprocess:
Update helper functions to accept a color choice and set a custom palette. Spawned subprocesses (currently on Windows and macOS) weren't picking up the globally modified colors.
- Reduce graph image dpi from 250 to 150:
Produces smaller images that still look good, and significantly reduces the size of report documents with many variables.
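
To give a sense of scale, the sketch below compares pixel dimensions at the two dpi settings. It assumes a 6.4 x 4.8 inch figure (matplotlib's default size), which may not match the sizes eda-report actually uses; `pixel_size` is a hypothetical helper.

```python
def pixel_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a figure saved at the given dpi (hypothetical helper)."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed figure size: 6.4 x 4.8 inches (matplotlib's default).
old_w, old_h = pixel_size(6.4, 4.8, 250)
new_w, new_h = pixel_size(6.4, 4.8, 150)

# Pixel count scales with the square of the dpi ratio: (150 / 250) ** 2 == 0.36
pixel_ratio = (new_w * new_h) / (old_w * old_h)
print((old_w, old_h), (new_w, new_h), pixel_ratio)
```

Under that assumption each image carries about 36% of the original pixels. File size also depends on compression, so the saving on disk will not match the pixel ratio exactly.
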
- Revise correlation interpretation:
Use R.H. Evans (1966) guide:
.00-.19 -> very weak
.20-.39 -> weak
.40-.59 -> moderate
.60-.79 -> strong
.80-1.0 -> very strong
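
One way to express the bands above in code is sketched below. `interpret_correlation` is a hypothetical helper, not necessarily how eda-report implements it, and it assumes each band is closed at its lower bound (so 0.195 counts as very weak).

```python
def interpret_correlation(r: float) -> str:
    """Label the strength of a correlation coefficient (hypothetical helper).

    Assumes the bands listed above, applied to the magnitude of `r`.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError(f"correlation must lie in [-1, 1], got {r!r}")
    magnitude = abs(r)  # the sign gives direction, not strength
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"


print(interpret_correlation(-0.45))  # moderate
print(interpret_correlation(0.8))  # very strong
```
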
- Fix handling of int values for `groupby` specifier:
- Integer input from the CLI and GUI arrives as a string, so it failed the `isinstance(x, int)` test.
- The `str.isdecimal` test is more suitable here.
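
The difference between the two checks can be seen in a minimal sketch. `parse_groupby` is a hypothetical helper written for illustration, assuming that a numeric specifier denotes a column index and anything else a column label; eda-report's actual handling may differ.

```python
def parse_groupby(value):
    """Convert a digit-only string to an int, leave other values unchanged.

    Hypothetical helper: assumes ints denote column indices and other
    strings denote column labels.
    """
    if isinstance(value, str) and value.isdecimal():
        return int(value)
    return value


print(isinstance("3", int))  # False: CLI/GUI input is always a string
print("3".isdecimal())  # True
print(parse_groupby("3"))  # 3
print(parse_groupby("species"))  # species
```
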
- Optimize tests:
- Add `conftest.py`.
- Add a session-level `temp_data_dir` fixture.
[plotting]: https://eda-report.readthedocs.io/en/latest/eda_report.plotting.html#plotting-examples
**Full Changelog**: https://github.com/Tim-Abwao/eda-report/compare/v2.6.0...v2.7.0