Bluecast


1.6.2

We are happy to announce the release of version 1.6.2!

Besides improvements and bug fixes, version 1.6.2 introduces a new feature: configuration via a UI in the notebook.
Customizing BlueCast can be tough for beginners, so this feature lets users configure the most important
settings in a non-programmatic way.

![welcome_ui](https://github.com/user-attachments/assets/bee53b8d-1058-411a-b33c-297033b8ae9d)
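As a rough idea of how this could look in a notebook cell, here is a hedged sketch. The import path, class name and method names below are assumptions for illustration, not the confirmed API; consult the BlueCast documentation for the actual entry point.

```python
# Hedged sketch only: module path, class and method names are assumptions.
from bluecast.blueprints.welcome import WelcomeToBlueCast  # assumed import path

welcome = WelcomeToBlueCast()
welcome.automl_configurator()  # assumed method: renders widget controls in the notebook

# After submitting the form, the configured BlueCast instance could be retrieved
# and used like any manually configured instance, e.g.:
# automl = welcome.automl_instance  # assumed attribute
# automl.fit(df, target_col="target")
```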


**New features**
* Add non-programmatic way of configuring BlueCast instance
* Strengthen support for S3: Error analysis is possible via SageMaker/S3 now
* Add `plot_distribution_pairs` EDA function

**Improvements**
* Update dependencies for developers
* Improve tie handling within conformal prediction to get better prediction sets
* Change the output of `predict_interval` within the conformal prediction wrapper for classification to a nested list. This makes post-processing easier and more robust (see the sketch after this list).
* Fix mutual information score plot annotations going out of bounds
* Add a parameter to the multivariate data drift check that subsamples both DataFrames to the same size
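To make the new return shape concrete, here is a purely illustrative sketch of what a nested-list output for classification prediction sets could look like and how it might be post-processed. The wrapper calls are only sketched as comments because their exact names and signatures are not reproduced here; only the nested-list idea comes from the release note.

```python
# Illustrative only: the calls below are sketched as comments; take the concrete
# class and method names from the BlueCast docs.
# wrapper = ConformalPredictionWrapper(model)          # assumed class name
# wrapper.calibrate(x_calibration, y_calibration)      # assumed method
# prediction_sets = wrapper.predict_interval(x_test)   # returns a nested list per the release note

# Assumed shape: one inner list of candidate class labels per row.
prediction_sets = [
    ["cat"],         # row 0: confident prediction, single-label set
    ["cat", "dog"],  # row 1: improved tie handling keeps both plausible labels
    ["dog"],         # row 2
]

# Post-processing becomes straightforward, e.g. measuring prediction set sizes:
set_sizes = [len(labels) for labels in prediction_sets]
print(set_sizes)  # [1, 2, 1]
```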

**Bug fixes**
* Fix a small bug where the feature type detector fails when `ignore_cols` is an empty list
* Strengthen Polars support: fix a bug where `FeatureTypeDetector` maps Decimal type columns to object

1.6.1

A small release with only minor changes:

* Reduce the number of default SHAP charts shown during training
* Set `use_full_data_for_final_model` to True by default

1.6.0

New features

* Add training config param to use repeated stratified kfold strategy during tuning
* Add option for `predict` method in `BlueCast` class to return original target labels
* Add new `plot_distribution_by_time` EDA function
* Add new `plot_classification_target_distribution_within_categories` EDA function
* Add new `plot_pca_biplot` EDA function
* Add new `plot_andrews_curve` EDA function
* Add new plot `plot_against_target_for_regression`
* Add additional feature engineering capabilities via `GroupLevelAggFeatures`
* Show histograms on a second axis in eCDF plots

Changes

* Suppress seaborn warnings
* Use the Freedman-Diaconis rule to determine the number of bins for histograms
* Suppress additional warnings that are outside the library's control
* Only show global SHAP values by default

Bug fixes

* Add handling of infrequent categories
* Fix detection of numerical features
* Fix a bug when the target class is a string but categorical encoding is done via the ML algorithm

1.5.0

ModelMatchMaker

So far BlueCast provided tools to measure data drift, but nothing to act on it.
This release fills that gap with `ModelMatchMaker`, a simple utility that stores
multiple training datasets and their BlueCast instances. Users can then provide a new dataset, and
`ModelMatchMaker` returns the stored dataset with the least data drift relative to it, along with the
associated `BlueCast` instance. From there, users can append the matching dataset to the new dataset or
use the best-matching model instead of training a new one (i.e. reuse the match for the unseen data).

See the docs for more information.
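A minimal sketch of the intended workflow is shown below. The import path and method names (`append_model_and_dataset`, `find_best_match`) are assumptions for illustration only; the actual API is documented in the BlueCast docs.

```python
# Hedged sketch: import path and method names are assumptions, not the confirmed API.
from bluecast.model_match_maker import ModelMatchMaker  # assumed import path

matchmaker = ModelMatchMaker()

# Register previously trained BlueCast instances together with their training data.
# automl_q1/df_q1 and automl_q2/df_q2 are placeholders for models and DataFrames
# trained or loaded elsewhere.
matchmaker.append_model_and_dataset(automl_q1, df_q1)  # assumed method name
matchmaker.append_model_and_dataset(automl_q2, df_q2)  # assumed method name

# Given a new dataset, retrieve the stored dataset with the least data drift
# and its associated BlueCast instance.
best_df, best_model = matchmaker.find_best_match(new_df)  # assumed method name

# From here, either reuse best_model directly on new_df, or concatenate
# best_df with new_df and train a fresh BlueCast instance.
```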

ErrorAnalyser

So far, BlueCast already provided plenty of information about a model:

* see all hyperparameter sets and their evaluation scores (optional)
* see the most important hyperparameters
* see feature importance
* see the evaluation on unseen data (when using `fit_eval`)
* plot the decision trees

However, `BlueCast` lacked any tooling around out-of-fold datasets. With version 1.5.0, users can
change the training config and set a path to store out-of-fold data. `ErrorAnalyser` helps with the evaluation
of this out-of-fold data and offers two core insights:

* plot prediction error distributions for all categories or bins of numerical features, for each target class
or target bin
* return a preprocessed DataFrame that shows the mean absolute prediction error for each sub-segment of the data

See the docs for more information.
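As a rough, hedged sketch of the intended workflow (the config attribute and the analyser class and method names are assumptions for illustration; see the docs for the real API):

```python
# Hedged sketch: attribute, class and method names below are assumptions,
# and signatures may differ from the actual BlueCast API.
from bluecast.blueprints.cast import BlueCast
from bluecast.config.training_config import TrainingConfig
from bluecast.evaluation.error_analysis import ErrorAnalyserClassification  # assumed import path

# Point the training config to a location where out-of-fold data will be stored.
train_config = TrainingConfig()
train_config.out_of_fold_dataset_store_path = "./oof_data/"  # assumed attribute name

automl = BlueCast(class_problem="binary", conf_training=train_config)
automl.fit(df, target_col="target")  # df is a placeholder training DataFrame

# Analyse the stored out-of-fold data: plot error distributions per category/bin
# and get the mean absolute prediction error per data sub-segment.
analyser = ErrorAnalyserClassification(automl)       # assumed constructor
errors_by_segment = analyser.analyse_segment_errors()  # assumed method name
```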

Additional changes

* Update the Poetry environment for developers
* Add `max_bin` to tuning
* Set 5 folds as the new default for more robust single-model instances

1.4.3

A very small release.

Users can now modify the XGBoost config to tell Optuna whether to minimize or maximize the objective, so that when the XGBoost eval metric is changed to AUC it is optimized in the right direction.
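As a hedged illustration of the idea (the config class and attribute names below are assumptions, not the verified parameter names):

```python
# Hedged sketch: class and attribute names are assumptions for illustration.
from bluecast.blueprints.cast import BlueCast
from bluecast.config.training_config import XgboostTuneParamsConfig  # assumed import path

xgboost_params = XgboostTuneParamsConfig()
xgboost_params.eval_metric = "auc"                  # assumed attribute: switch the eval metric to AUC
xgboost_params.optimization_direction = "maximize"  # assumed attribute: AUC must be maximized, not minimized

automl = BlueCast(class_problem="binary", conf_xgboost=xgboost_params)
```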

1.4.2

* Add effectiveness encoding to the onehot encoder
* Fix GPU detection
* Fix bugs when removing columns with zero variance or all nulls
* Add dynamic hyperparameter tuning updates
* Fix logging of best params
