Bluecast


1.6.2

We are happy to announce the release of version 1.6.2!

Besides improvements and bug fixes, version 1.6.2 introduces a new feature: configuration via a UI in the notebook.
Customizing BlueCast can be tough for beginners, so this feature lets users configure the most important
settings in a non-programmatic way.

![welcome_ui](https://github.com/user-attachments/assets/bee53b8d-1058-411a-b33c-297033b8ae9d)
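As a rough idea of how this could look in a notebook cell, here is a hedged sketch. The import path, class name and method names below are assumptions for illustration, not the confirmed API; consult the BlueCast documentation for the actual entry point.

```python
# Hedged sketch only: module path, class and method names are assumptions.
from bluecast.blueprints.welcome import WelcomeToBlueCast  # assumed import path

welcome = WelcomeToBlueCast()
welcome.automl_configurator()  # assumed method: renders widget controls in the notebook

# After submitting the form, the configured BlueCast instance could be retrieved
# and used like any manually configured instance, e.g.:
# automl = welcome.automl_instance  # assumed attribute
# automl.fit(df, target_col="target")
```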


**New features**
* Add non-programmatic way of configuring BlueCast instance
* Strengthen support for S3: Error analysis is possible via SageMaker/S3 now
* Add `plot_distribution_pairs` EDA function

**Improvements**
* Update dependencies for developers
* Improve tie handling within conformal prediction to get better prediction sets
* Change the output of `predict_interval` within the conformal prediction wrapper for classification to a nested list. This makes post-processing easier and more robust (see the sketch after this list).
* Fix mutual information score plot annotations going out of bounds
* Add a parameter to the multivariate data drift check that subsamples both DataFrames to the same size
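To make the new return shape concrete, here is a purely illustrative sketch of what a nested-list output for classification prediction sets could look like and how it might be post-processed. The wrapper calls are only sketched as comments because their exact names and signatures are not reproduced here; only the nested-list idea comes from the release note.

```python
# Illustrative only: the calls below are sketched as comments; take the concrete
# class and method names from the BlueCast docs.
# wrapper = ConformalPredictionWrapper(model)          # assumed class name
# wrapper.calibrate(x_calibration, y_calibration)      # assumed method
# prediction_sets = wrapper.predict_interval(x_test)   # returns a nested list per the release note

# Assumed shape: one inner list of candidate class labels per row.
prediction_sets = [
    ["cat"],         # row 0: confident prediction, single-label set
    ["cat", "dog"],  # row 1: improved tie handling keeps both plausible labels
    ["dog"],         # row 2
]

# Post-processing becomes straightforward, e.g. measuring prediction set sizes:
set_sizes = [len(labels) for labels in prediction_sets]
print(set_sizes)  # [1, 2, 1]
```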

**Bug fixes**
* Fix a small bug where the feature type detector fails when `ignore_cols` is an empty list
* Strengthen Polars support: fix a bug where `FeatureTypeDetector` maps Decimal type columns to object

1.6.1

A small release with only minor changes:

* Reduce the number of default SHAP charts shown during training
* Set `use_full_data_for_final_model` to True by default

1.6.0

New features

* Add training config param to use repeated stratified kfold strategy during tuning
* Add option for `predict` method in `BlueCast` class to return original target labels
* Add new `plot_distribution_by_time` EDA function
* Add new `plot_classification_target_distribution_within_categories` EDA function
* Add new `plot_pca_biplot` EDA function
* Add new `plot_andrews_curve` EDA function
* Add new plot `plot_against_target_for_regression`
* Add additional feature engineering capabilities via `GroupLevelAggFeatures`
* Show histograms on a second axis in eCDF plots

Changes

* Suppress seaborn warnings
* Use the Freedman-Diaconis rule to determine the number of bins for histograms
* Suppress additional warnings that are outside the library's control
* Only show global SHAP values by default

Bug fixes

* Add handling of infrequent categories
* Fix detection of numerical features
* Fix a bug when the target class is a string but categorical encoding is done via the ML algorithm

1.5.0

ModelMatchMaker

So far BlueCast provided tools to measure data drift, but nothing to act on it.
This release fills that gap with `ModelMatchMaker`, a simple utility that stores
multiple training datasets and their BlueCast instances. Users can then provide a new dataset, and
`ModelMatchMaker` returns the stored dataset with the least data drift relative to it, along with the
associated `BlueCast` instance. From there, users can append the matching dataset to the new dataset or
use the best-matching model instead of training a new one (i.e. reuse the match for the unseen data).

See the docs for more information.
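A minimal sketch of the intended workflow is shown below. The import path and method names (`append_model_and_dataset`, `find_best_match`) are assumptions for illustration only; the actual API is documented in the BlueCast docs.

```python
# Hedged sketch: import path and method names are assumptions, not the confirmed API.
from bluecast.model_match_maker import ModelMatchMaker  # assumed import path

matchmaker = ModelMatchMaker()

# Register previously trained BlueCast instances together with their training data.
# automl_q1/df_q1 and automl_q2/df_q2 are placeholders for models and DataFrames
# trained or loaded elsewhere.
matchmaker.append_model_and_dataset(automl_q1, df_q1)  # assumed method name
matchmaker.append_model_and_dataset(automl_q2, df_q2)  # assumed method name

# Given a new dataset, retrieve the stored dataset with the least data drift
# and its associated BlueCast instance.
best_df, best_model = matchmaker.find_best_match(new_df)  # assumed method name

# From here, either reuse best_model directly on new_df, or concatenate
# best_df with new_df and train a fresh BlueCast instance.
```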

ErrorAnalyser

So far, BlueCast already provided plenty of information about a model:

* see all hyperparameter sets and their evaluation scores (optional)
* see the most important hyperparameters
* see feature importance
* see the evaluation on unseen data (when using `fit_eval`)
* plot the decision trees

However, `BlueCast` lacked any tooling around out-of-fold datasets. With version 1.5.0, users can
change the training config and set a path to store out-of-fold data. `ErrorAnalyser` helps with the evaluation
of this out-of-fold data and offers two core insights:

* plot prediction error distributions for all categories or bins of numerical features, for each target class
or target bin
* return a preprocessed DataFrame that shows the mean absolute prediction error for each sub-segment of the data

See the docs for more information.
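As a rough, hedged sketch of the intended workflow (the config attribute and the analyser class and method names are assumptions for illustration; see the docs for the real API):

```python
# Hedged sketch: attribute, class and method names below are assumptions,
# and signatures may differ from the actual BlueCast API.
from bluecast.blueprints.cast import BlueCast
from bluecast.config.training_config import TrainingConfig
from bluecast.evaluation.error_analysis import ErrorAnalyserClassification  # assumed import path

# Point the training config to a location where out-of-fold data will be stored.
train_config = TrainingConfig()
train_config.out_of_fold_dataset_store_path = "./oof_data/"  # assumed attribute name

automl = BlueCast(class_problem="binary", conf_training=train_config)
automl.fit(df, target_col="target")  # df is a placeholder training DataFrame

# Analyse the stored out-of-fold data: plot error distributions per category/bin
# and get the mean absolute prediction error per data sub-segment.
analyser = ErrorAnalyserClassification(automl)       # assumed constructor
errors_by_segment = analyser.analyse_segment_errors()  # assumed method name
```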

Additional changes

* Update the Poetry environment for developers
* Add `max_bin` to tuning
* Set 5 folds as the new default for more robust single-model instances

1.4.3

A very small release.

Users can now modify the XGBoost config to tell Optuna whether to minimize or maximize the objective, so that when the XGBoost eval metric is changed to AUC it is optimized in the right direction.
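As a hedged illustration of the idea (the config class and attribute names below are assumptions, not the verified parameter names):

```python
# Hedged sketch: class and attribute names are assumptions for illustration.
from bluecast.blueprints.cast import BlueCast
from bluecast.config.training_config import XgboostTuneParamsConfig  # assumed import path

xgboost_params = XgboostTuneParamsConfig()
xgboost_params.eval_metric = "auc"                  # assumed attribute: switch the eval metric to AUC
xgboost_params.optimization_direction = "maximize"  # assumed attribute: AUC must be maximized, not minimized

automl = BlueCast(class_problem="binary", conf_xgboost=xgboost_params)
```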

1.4.2

* Add effectiveness encoding to the onehot encoder
* Fix GPU detection
* Fix bugs when removing columns with zero variance or all nulls
* Add dynamic hyperparameter tuning updates
* Fix logging of best params
