Eda-report

Latest version: v2.8.2

2.8.2

* Specify "spawn" start method

Use `multiprocessing.get_context` to set "spawn" start method.
The "fork" start method (the default on most POSIX systems, though not on macOS) is deprecated as the default.

* Show informative message if `tkinter` module is missing
* Set the minimum Python version to 3.10
* Test on Python 3.10 to 3.12

Use recent versions of actions.

* Store Variable unique values as a sorted `numpy` array
* Update dependencies

In response to vulnerability reports in several dependencies.
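The "spawn" change above can be illustrated with a minimal sketch. It uses only the standard-library `multiprocessing.get_context` call named in the notes; the `square` and `run_squares` helpers are hypothetical names made up for illustration and are not part of eda-report.

```python
import multiprocessing as mp


def square(x):
    # Defined at module level so "spawn" workers can import and unpickle it.
    return x * x


def run_squares(values, processes=2):
    # get_context returns a context bound to one start method; the
    # process-wide default start method is left unchanged.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(square, values)
```

Because "spawn" workers re-import the main module, a script using this sketch would typically call `run_squares` from under an `if __name__ == "__main__":` guard.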

2.8.1

What's New

* Drop `python-3.8`. It is no longer supported in *NumPy* and related packages.
* Update package config files to exclude tests in sdist.
* Bug Fix: Handle mixed-dtype datasets:

Mixed-dtype datasets were stored with the `object` dtype. This caused problems: for example, sorting raised a `TypeError` when both numbers and strings were present. Data with the `object` dtype is now converted to the `string` dtype.
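The underlying problem can be reproduced without any data libraries. The sketch below is a plain-Python analogue of the mixed-dtype issue, not the package's own code: it shows that ordering numbers against strings fails, and that converting everything to text first makes sorting well defined (at the cost of lexicographic rather than numeric order).

```python
mixed = [3, "apple", 1.5, "10"]

try:
    sorted(mixed)
    sort_failed = False
except TypeError:
    # Python refuses to order numbers against strings.
    sort_failed = True

# Converting every value to text first gives a well-defined (lexicographic) order.
as_text = sorted(str(value) for value in mixed)
print(sort_failed, as_text)
```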

**Full Changelog**: https://github.com/Tim-Abwao/eda-report/compare/v2.8.0...v2.8.1

2.8.0

Improvements

* Use `matplotlib.rc_context` to customize plots while avoiding modifying global matplotlib state.
* Avoid storing copies of data in `Variable` instances to save on memory.
* Dynamically allocate image widths in documents to ensure plots maintain aspect ratio and are of appropriate size.
* Strip trailing zeros from float values in tables.
* Make `Dataset` repr more concise. Remove the "overview" title. Indent content. Rename summary statistics: "mean" to "avg", "std" to "stddev". Convert "count" values to integers.
* Consider data consisting of {Yes, No} or {Y, N} as boolean.
* Consider numeric variables with <11 unique values as categorical.
* Update the `summarize` function. Return a `Variable` for 1D data; a `Dataset` otherwise.
* Return `Axes` instances instead of `Figures` in all plotting functions.
* Dynamically handle repr logic in `Variable.__repr__`. Drop the redundant `_NumericStats`, `_DatetimeStats` and `_CategoricalStats` classes.
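The two type-inference rules listed above (Yes/No data treated as boolean, and numeric data with fewer than 11 unique values treated as categorical) could look roughly like the following sketch. The function name `infer_kind`, the constants, and the returned labels are assumptions made for illustration; the package's actual logic may differ, for instance in how it handles missing values or letter case.

```python
BOOLEAN_LABEL_SETS = ({"Yes", "No"}, {"Y", "N"})
MAX_CATEGORICAL_LEVELS = 10  # i.e. fewer than 11 unique values


def infer_kind(values):
    """Classify values as "boolean", "categorical" or "numeric" (hypothetical labels)."""
    unique = set(values)
    if unique in BOOLEAN_LABEL_SETS:
        return "boolean"
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in unique
    )
    if is_numeric and len(unique) > MAX_CATEGORICAL_LEVELS:
        return "numeric"
    # Text data, and numeric data with few distinct levels, are categorical.
    return "categorical"
```

For example, `infer_kind(["Yes", "No", "Yes"])` gives `"boolean"`, while a column holding only the ratings 1 to 5 would be classified as `"categorical"`.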

Additions

* Add the `ax` argument. Accept axes input in plotting functions. Very handy in cases where axes instances already exist e.g. subplots.
* Add `marker_color` and `line_color` args to `regression_plot` function.
* Add "(Normal)" to `prob_plot` x-axis label to indicate distribution
* Add xlabel and legend title to kde-plots.
* Add ylabel in grouped box-plots.
* Add "Count" ylabel to bar-plots.
* Add `Variable._get_most_common_categories` method to provide additional details for categorical variable repr.
* Add separate analysis and content-creation modules. The analysis module will focus on the actual analysis. The content module will focus on processing results for display.

Renamings

* Mark the analysis, cli, content, read_file and validate modules as private.
* Rename `groupby_data` argument in `get_word_report`, `ReportDocument`, `_ReportContent` and `_AnalysisResult` to `groupby_variable`.
* Rename `ReportDocument._format_heading_spacing` method to `ReportDocument._format_paragraph_spacing`. Modify the method to work with any paragraph.
* Rename "numeric (<10 levels)" to "numeric (<=10 levels)", which is actually correct.
* Rename the multivariate module to bivariate. Correlation & scatterplots are bivariate analysis methods. Likewise, rename `MultiVariable` to `Dataset`.

**Full Changelog**: https://github.com/Tim-Abwao/eda-report/compare/v2.7.3...v2.8.0

2.7.3

What's new?

- Plot & display only the top 20 correlated pairs:

  - Reduce scatterplot threshold from 50 to 20.
  - Show correlation info in descending order of magnitude.

- Update contingency table logic:

  Only create contingency table if data has < 20 unique values, to avoid cluttering the report.

- Update `plot_correlation` function:

  - Darken and narrow bar edge-color.
  - Set x-axis range to [-1.1, 1.1] so that bars with width slightly over 1.0 are not cut off.
  - Return `None` when correlation info is missing.

- Refactor the multivariate module:

  - Add the `_compute_correlation`, `_describe_correlation` and `_select_dtypes` functions, for reusability and more comprehensive testing.
  - Add the `_correlation_values`, `_correlation_descriptions`, `_numeric_stats` and `_categorical_stats` attributes; and cut non-essential ones.
  - Add the `_get_summary_statistics` method, and cut non-essential methods.

- Avoid omitting numeric columns with less than 0.05% unique values from bivariate analysis. This was meant to reduce the resultant scatter-plots, but in retrospect is not a good idea.

- Add python3.11 to test workflow.

2.7.2

What's New

- Allow running in the CLI without `tkinter`:

  - If the input file and other args are provided, everything will run just fine - with neither a `ModuleNotFoundError` nor `ImportError` in case `tkinter` is missing.
  - If no args are specified and `tkinter` is missing, then show a friendly message and exit gracefully.

- Add contingency tables to the report:

  If a valid group-by variable is provided, a contingency table will now be added to the univariate analysis results of categorical variables.

- Update table creation function:

  - Make column headers and index bold.
  - Improve logic for handling header and other rows.
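The optional-`tkinter` behaviour described above amounts to importing the GUI toolkit lazily, only on the route that needs it. A minimal sketch under that assumption follows; `choose_mode`, its parameters, and the returned labels are hypothetical and are not the package's actual CLI code.

```python
import importlib


def choose_mode(cli_args, import_module=importlib.import_module):
    """Decide how to run: "cli", "gui", or "exit" (all hypothetical labels)."""
    if cli_args:
        # An input file was supplied, so tkinter is never imported.
        return "cli"
    try:
        # Imported lazily: only the no-argument route needs a GUI toolkit.
        import_module("tkinter")
    except ImportError:  # also catches ModuleNotFoundError, its subclass
        print("tkinter is not installed; pass an input file to use the CLI.")
        return "exit"
    return "gui"
```

The `import_module` parameter exists only so that the missing-module case can be exercised in tests without uninstalling `tkinter`.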

2.7.1

What's New

- Allow color selection in all plotting functions:

  - Add the `color` arg to `bar_plot`, `box_plot`, `kde_plot` and `regression_plot`.
  - Add `marker_color` and `line_color` args to `prob_plot`.
  - Add `color_neg` and `color_pos` args to `plot_correlation`.

- Replace `set_custom_palette` with `_get_color_shades_of`:

  - `_get_color_shades_of` generates shades of a desired color with no side-effects.
  - `set_custom_palette` modifies the default matplotlib color cycle, which has a residual effect on other plots where the modified color cycle is undesired.

- Add `max_pairs` arg to `plot_correlation`:

  Sets the maximum number of numeric pairs to include.

- Remove redundant `hue` arg from `prob_plot`.
- Mark `savefig` as private. It's really just used internally.
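The point of replacing `set_custom_palette` is that a helper which returns colors leaves global plotting state alone. The sketch below illustrates that idea in plain Python; `color_shades` is a hypothetical stand-in and does not reproduce how `_get_color_shades_of` actually computes its shades.

```python
def color_shades(rgb, n_shades):
    """Return `n_shades` RGB tuples fading from `rgb` toward white.

    `rgb` is assumed to hold three floats in the range 0 to 1.
    """
    if n_shades < 1:
        raise ValueError("n_shades must be at least 1")
    shades = []
    for i in range(n_shades):
        # Fraction of white mixed in: 0 for the first shade, below 1 for the last.
        t = i / n_shades
        shades.append(tuple(c + (1.0 - c) * t for c in rgb))
    return shades
```

The returned list can be passed explicitly to whichever plot needs it, so other plots keep the default color cycle.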
