Pycaret

Latest version: v3.3.2

2.2.2

- Fixed an issue with the `optimize_threshold` function in the `pycaret.classification` module. It now returns a float instead of an array.
- Fixed an issue with the `predict_model` function. It now appends the predictions to the original data frame, so any extra columns supplied at inference time are kept in the returned output. They are ignored internally when the predictions are computed.
- Fixed edge case exceptions for the `create_model` function in `pycaret.clustering`.
- Fixed exceptions when column names are not a string.
- Fixed exceptions in `pycaret.regression` when `transform_target` is True in the `setup` function.
- Fixed an exception in the `models` function if the `type` parameter is specified.
- All official tutorials are now updated.

2.2.1

Post-release `2.2`, the following issues have been fixed:
- Fixed `plot_model = 'tree'` exceptions.
- Fixed issue with `predict_model` causing errors with non-contiguous indices.
- Fixed an issue with the `remove_outliers` parameter in the `setup` function, which was introducing extra columns in the training data.
- Fixed issue with `plot_model` in `pycaret.clustering` causing errors with non-contiguous indices.
- Fixed an exception when the model was saved or logged when `imputation_type` is set to 'iterative' in the `setup` function.
- `compare_models` now prints intermediate output when `html=False`.
- Metrics in `pycaret.classification` for binary classification are now calculated with `average='binary'`. Previously they were a weighted average over the positive and negative classes; now they are calculated for the positive class only. Multiclass classification uses `average='weighted'`.
- `optimize_threshold` now returns the optimized probability threshold value as a NumPy object.
- Fixed issue with certain exceptions in `compare_models`.
- Added `profile_kwargs` argument in the `setup` function to pass keyword arguments to Pandas Profiler.
- `plot_model`, `interpret_model`, and `evaluate_model` now accept a new parameter, `use_train_data`, which, when set to True, generates the plot on train data instead of test data.
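
The change to binary metric averaging noted above can be illustrated with a small plain-Python sketch. It is not PyCaret's or scikit-learn's code; the helper names and the toy labels are made up for illustration, and only precision is shown. The weighted variant weights each class by its number of true samples.

```python
from collections import Counter

def precision_for_class(y_true, y_pred, cls):
    """Precision when `cls` is treated as the positive class."""
    predicted = sum(1 for p in y_pred if p == cls)
    if predicted == 0:
        return 0.0
    correct = sum(1 for t, p in zip(y_true, y_pred) if p == cls and t == cls)
    return correct / predicted

def binary_precision(y_true, y_pred, positive=1):
    """Score the positive class only (the 'binary' style of averaging)."""
    return precision_for_class(y_true, y_pred, positive)

def weighted_precision(y_true, y_pred):
    """Average per-class precision, weighted by each class's true count."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(
        precision_for_class(y_true, y_pred, cls) * count / total
        for cls, count in support.items()
    )

# Toy, imbalanced data: two positives, six negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0]

print(binary_precision(y_true, y_pred))    # 0.5
print(weighted_precision(y_true, y_pred))  # approximately 0.75
```

On imbalanced data the weighted figure is pulled up by the majority class, which is why scores reported for binary problems can drop after this change even though the model is the same.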

2.2

- **get_metrics:** Returns table of available metrics used for CV.
`pycaret.classification` `pycaret.regression` `pycaret.clustering`

- **add_metric:** Adds a custom metric for model evaluation.
`pycaret.classification` `pycaret.regression` `pycaret.clustering`

- **remove_metric:** Removes custom metrics.
`pycaret.classification` `pycaret.regression` `pycaret.clustering`

- **save_config:** Saves all global variables to a pickle file, so that work can be resumed later without rerunning the `setup` function.
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`

- **load_config:** Loads global variables from a pickle file into the Python environment.
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`
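
As a sketch of what a custom metric for `add_metric` might look like, the function below follows the common scikit-learn convention of taking true and predicted labels and returning a number. The metric itself (`false_negative_rate`) and its `positive` argument are hypothetical examples; the exact arguments that `add_metric` expects are not listed in these notes and should be checked in the PyCaret documentation.

```python
def false_negative_rate(y_true, y_pred, positive=1):
    """Share of actual positives that were predicted as something else.

    Hypothetical example metric; lower is better.
    """
    actual_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not actual_positives:
        return 0.0
    missed = sum(1 for p in actual_positives if p != positive)
    return missed / len(actual_positives)

# Three actual positives, two of them missed.
example_true = [1, 1, 1, 0, 0]
example_pred = [1, 0, 0, 0, 1]
fnr = false_negative_rate(example_true, example_pred)
print(round(fnr, 3))  # 0.667
```

Because a lower value is better for this metric, it would need to be registered as a "lower is better" score if the metric API offers that choice.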

setup
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`

Following new parameters have been added:

- **test_data: pandas.DataFrame, default = None**
If not None, test_data is used as a hold-out set, and the ``train_size`` parameter is ignored. test_data must be labeled and the shape of data and test_data must match.

- **preprocess: bool, default = True**
When set to False, no transformations are applied except for `train_test_split` and custom transformations passed in the `custom_pipeline` parameter. Data must be ready for modeling (no missing values, no dates, categorical data encoded) when `preprocess` is set to False.

- **imputation_type: str, default = 'simple'**
The type of imputation to use. Can be either 'simple' or 'iterative'.

- **iterative_imputation_iters: int, default = 5**
The number of iterations. Ignored when ``imputation_type`` is not 'iterative'.

- **categorical_iterative_imputer: str, default = 'lightgbm'**
Estimator for iterative imputation of missing values in categorical features. Ignored when ``imputation_type`` is not 'iterative'.

- **numeric_iterative_imputer: str, default = 'lightgbm'**
Estimator for iterative imputation of missing values in numeric features. Ignored when ``imputation_type`` is set to 'simple'.

- **data_split_stratify: bool or list, default = False**
Controls stratification during 'train_test_split'. When set to True, will stratify by target column. To stratify on any other columns, pass a list of column names. Ignored when ``data_split_shuffle`` is False.

- **fold_strategy: str or sklearn CV generator object, default = 'stratifiedkfold' / 'kfold'**
Choice of cross validation strategy. Possible values are:
* 'kfold'
* 'stratifiedkfold'
* 'groupkfold'
* 'timeseries'
* a custom CV generator object compatible with scikit-learn.

- **fold: int, default = 10**
The number of folds to be used in cross-validation. Must be at least 2. This is a global setting that can be over-written at the function level by using the ``fold`` parameter. Ignored when ``fold_strategy`` is a custom object.

- **fold_shuffle: bool, default = False**
Controls the shuffle parameter of CV. Only applicable when ``fold_strategy`` is 'kfold' or 'stratifiedkfold'. Ignored when ``fold_strategy`` is a custom object.

- **fold_groups: str or array-like, with shape (n_samples,), default = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

- **use_gpu: str or bool, default = False**
When set to 'force', will try to use GPU with all algorithms that support it, and raise exceptions if they are unavailable. When set to True, will use GPU with algorithms that support it, and fall back to CPU if they are unavailable. When False, all algorithms are trained using CPU only.

- **custom_pipeline: transformer or list of transformers or tuple, default = None**
When passed, the custom transformers are appended to the preprocessing pipeline and applied on each CV fold separately and on the final fit. All custom transformations are applied after 'train_test_split' and before pycaret's internal transformations.
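
The `fold_strategy` and `fold_groups` parameters above refer to group-aware cross-validation and to custom CV generator objects. The sketch below shows the idea in plain Python: a toy splitter exposing `split` and `get_n_splits`, the two methods scikit-learn splitters provide, that never places the same group in both train and test. The class name `SimpleGroupKFold` and the round-robin assignment are assumptions made for illustration; scikit-learn's `GroupKFold` balances fold sizes differently.

```python
class SimpleGroupKFold:
    """Toy group-aware splitter (illustrative, not scikit-learn's GroupKFold)."""

    def __init__(self, n_splits=3):
        if n_splits < 2:
            raise ValueError("n_splits must be at least 2")
        self.n_splits = n_splits

    def get_n_splits(self, X=None, y=None, groups=None):
        return self.n_splits

    def split(self, X, y=None, groups=None):
        if groups is None:
            raise ValueError("groups is required for group-aware splitting")
        unique_groups = sorted(set(groups))
        if len(unique_groups) < self.n_splits:
            raise ValueError("need at least as many groups as folds")
        # Assign whole groups to folds round-robin, so a group is never split.
        fold_of = {g: i % self.n_splits for i, g in enumerate(unique_groups)}
        for fold in range(self.n_splits):
            train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
            test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
            yield train, test

# One group label per row, e.g. a customer or patient identifier.
groups = ["a", "a", "b", "b", "c", "c", "d"]
rows = list(range(len(groups)))

folds = list(SimpleGroupKFold(n_splits=3).split(rows, groups=groups))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

Keeping every row of a group on the same side of the split avoids leakage when rows within a group are correlated.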

compare_models
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **cross_validation: bool = True**
When set to False, metrics are evaluated on holdout set. ``fold`` param is ignored when cross_validation is set to False.

- **errors: str = "ignore"**
When set to 'ignore', will skip the model with exceptions and continue. If 'raise', will stop the function when exceptions are raised.

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
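
The behaviour described for the `errors` parameter can be sketched as follows. This is a plain-Python illustration under the assumption that each candidate is a zero-argument callable returning a score; `compare_candidates` and the toy candidates are hypothetical names, not part of PyCaret.

```python
def compare_candidates(candidates, errors="ignore"):
    """Score each candidate; skip or re-raise failures depending on `errors`."""
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be 'ignore' or 'raise'")
    results = []
    for name, fit_and_score in candidates.items():
        try:
            score = fit_and_score()
        except Exception:
            if errors == "raise":
                raise  # stop at the first failing candidate
            continue   # 'ignore': drop this candidate and carry on
        results.append((name, score))
    # Best score first, mirroring a leaderboard-style result.
    return sorted(results, key=lambda item: item[1], reverse=True)

def broken_model():
    raise RuntimeError("training failed")

candidates = {
    "model_a": lambda: 0.81,
    "broken": broken_model,
    "model_b": lambda: 0.87,
}

leaderboard = compare_candidates(candidates, errors="ignore")
print(leaderboard)  # [('model_b', 0.87), ('model_a', 0.81)]
```

With `errors="raise"` the same call would propagate the `RuntimeError`, which is more convenient when debugging a single failing estimator.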

create_model
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **cross_validation: bool = True**
When set to False, metrics are evaluated on holdout set. ``fold`` param is ignored when cross_validation is set to False.

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

Following parameters have been removed:

- **ensemble** - Deprecated - use `ensemble_model` function directly.
- **method** - Deprecated - use `ensemble_model` function directly.
- **system** - Moved to private API.

tune_model
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **search_library: str, default = 'scikit-learn'**
The search library used for tuning hyperparameters. Possible values:

'scikit-learn' - default, requires no further installation
https://github.com/scikit-learn/scikit-learn

'scikit-optimize' - `pip install scikit-optimize`
https://scikit-optimize.github.io/stable/

'tune-sklearn' - `pip install tune-sklearn ray[tune]`
https://github.com/ray-project/tune-sklearn

'optuna' - `pip install optuna`
https://optuna.org/

- **search_algorithm: str, default = None**
The search algorithm depends on the `search_library` parameter. Some search algorithms require additional libraries to be installed. When None, will use the search library-specific default algorithm.

`scikit-learn` possible values:
- random (default)
- grid

`scikit-optimize` possible values:
- bayesian (default)

`tune-sklearn` possible values:
- random (default)
- grid
- bayesian `pip install scikit-optimize`
- hyperopt `pip install hyperopt`
- bohb `pip install hpbandster ConfigSpace`

`optuna` possible values:
- tpe (default)
- random

- **early_stopping: bool or str or object, default = False**
Use early stopping to stop fitting to a hyperparameter configuration if it performs poorly. Ignored when ``search_library`` is scikit-learn, or if the estimator does not have a 'partial_fit' attribute. Can be either an object accepted by the search library or one of the following:

- 'asha' for Asynchronous Successive Halving Algorithm
- 'hyperband' for Hyperband
- 'median' for Median Stopping Rule
- If False or None, early stopping will not be used.

- **early_stopping_max_iters: int, default = 10**
The maximum number of epochs to run for each sampled configuration. Ignored if `early_stopping` is False or None.

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

- **return_tuner: bool, default = False**
When set to True, will return a tuple of (model, tuner_object).

- **tuner_verbose: bool or int, default = True**
If True or above 0, will print messages from the tuner. Higher values print more messages. Ignored when ``verbose`` param is False.
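
To make the difference between the `grid` and `random` values of `search_algorithm` concrete, here is a minimal plain-Python sketch. It does not call any of the search libraries listed above; the search space, the `toy_score` function, and the helper names are assumptions for illustration, and the random variant samples with replacement for simplicity.

```python
import itertools
import random

def grid_candidates(space):
    """Every combination of the listed values (exhaustive)."""
    keys = list(space)
    return [
        dict(zip(keys, values))
        for values in itertools.product(*(space[k] for k in keys))
    ]

def random_candidates(space, n_iter, seed=0):
    """n_iter combinations drawn independently at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_iter)]

def search(space, score, algorithm="random", n_iter=10, seed=0):
    """Return the best-scoring candidate under the chosen algorithm."""
    if algorithm == "grid":
        candidates = grid_candidates(space)
    elif algorithm == "random":
        candidates = random_candidates(space, n_iter, seed)
    else:
        raise ValueError("algorithm must be 'grid' or 'random'")
    return max(candidates, key=score)

# Hypothetical search space and a stand-in for a cross-validated score.
space = {"max_depth": [2, 4, 6, 8], "learning_rate": [0.01, 0.1, 0.3]}

def toy_score(params):
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)

best_grid = search(space, toy_score, algorithm="grid")
best_random = search(space, toy_score, algorithm="random", n_iter=5)
print(best_grid)  # {'max_depth': 6, 'learning_rate': 0.1}
print(best_random)
```

Grid search evaluates all 12 combinations here, while random search evaluates only `n_iter` of them, trading completeness for a fixed cost.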

ensemble_model
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

blend_models
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

- **weights: list, default = None**
Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights when None.

- The default value for the `method` parameter has been changed from `hard` to `auto`.
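
The effect of the `weights` parameter on hard and soft voting can be sketched in plain Python. This is an illustration of the idea described above, not the code PyCaret or scikit-learn runs; the function names and toy predictions are made up.

```python
from collections import defaultdict

def hard_vote(predictions, weights=None):
    """predictions: one predicted label per estimator."""
    if weights is None:
        weights = [1] * len(predictions)
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight  # each vote counts `weight` times
    return max(tally, key=tally.get)

def soft_vote(probabilities, weights=None):
    """probabilities: one {label: probability} dict per estimator."""
    if weights is None:
        weights = [1] * len(probabilities)
    total = sum(weights)
    averaged = defaultdict(float)
    for probs, weight in zip(probabilities, weights):
        for label, p in probs.items():
            averaged[label] += weight * p / total  # weighted average
    return max(averaged, key=averaged.get)

labels = ["yes", "no", "no"]
probas = [
    {"yes": 0.9, "no": 0.1},
    {"yes": 0.4, "no": 0.6},
    {"yes": 0.4, "no": 0.6},
]

print(hard_vote(labels))                     # no
print(hard_vote(labels, weights=[3, 1, 1]))  # yes
print(soft_vote(probas))                     # yes
print(soft_vote(probas, weights=[1, 3, 3]))  # no
```

Note that hard and soft voting can disagree on the same estimators even with uniform weights, because soft voting takes each estimator's confidence into account.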

stack_models
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

calibrate_model
`pycaret.classification`

Following new parameters have been added:

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

plot_model
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **fold: int or scikit-learn compatible CV generator, default = None**
Controls cross-validation. If None, the CV generator in the ``fold_strategy`` parameter of the ``setup`` function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the ``setup`` function.

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

evaluate_model
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **fold: int or scikit-learn compatible CV generator, default = None**
Controls cross-validation. If None, the CV generator in the ``fold_strategy`` parameter of the ``setup`` function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the ``setup`` function.

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

finalize_model
`pycaret.classification` `pycaret.regression`

Following new parameters have been added:

- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.

- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.

- **model_only: bool, default = True**
When set to False, only the model object is re-trained and all the transformations in Pipeline are ignored.

models
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`

Following new parameters have been added:

- **internal: bool, default = False**
When True, will return extra columns and rows used internally.

- **raise_errors: bool, default = True**
When False, will suppress all exceptions, ignoring models that couldn't be created.

2.1.2

- Post-release `2.1`, a bug was reported that prevented the `predict_model` function from working in the `regression` module in a new notebook session when `transform_target` was set to `False` during model training. This issue has been fixed in PyCaret release `2.1.2`. To learn more about the issue: https://github.com/pycaret/pycaret/issues/525

2.1.1

- Post-release `2.1`, a bug was identified in the MLFlow back-end. The error occurs only when `log_experiment` in the `setup` function is set to True, and it applies to all modules. The cause has been identified and an issue has been opened with `MLFlow`. The error is raised by the `infer_signature` function in `mlflow.sklearn.log_model`, and only when there are missing values in the dataset. This issue has been fixed in PyCaret release `2.1.1` by skipping the signature in cases where `MLFlow` raises an exception.

2.1

Summary of Changes

- **Model Deployment** Model deployment support for `gcp` and `azure` has been added in `deploy_model` function for all modules. See `documentation` for details.
- **Compare Models Budget Time** New parameter `budget_time` added to the `compare_models` function. It sets an upper limit on `compare_models` training time.
- **Feature Selection** New feature selection method `boruta` has been added. By default, the `feature_selection_method` parameter in the `setup` function is set to `classic`, but it can be set to `boruta` to select features using the Boruta algorithm. This change is applicable for `pycaret.classification` and `pycaret.regression`.
- **Numeric Imputation** New method `zero` has been added for the `numeric_imputation` parameter in the `setup` function. When the method is set to `zero`, missing values are replaced with the constant 0. Default behavior of `numeric_imputation` is unchanged.
- **Plot Model** New parameter `scale` has been added in `plot_model` for all modules to enable high quality images for research publications.
- **User Defined Loss Function** You can now pass `custom_scorer` to optimize a user-defined loss function in `tune_model` for `pycaret.classification` and `pycaret.regression`. You must use `make_scorer` from `sklearn` to create the custom loss function that is passed into `custom_scorer` for the `tune_model` function.
- **Change in Pipeline Behavior** When using `save_model`, the `model` object is appended to the `Pipeline`, so the behavior of `Pipeline` and `predict_model` has changed. Instead of saving a `list`, `save_model` now saves a `Pipeline` object with the trained model in the last position. The user-facing functionality of `predict_model` remains the same.
- **Compare Models** Parameters `blacklist` and `whitelist` are now renamed to `exclude` and `include` with no change in functionality.
- **Predict Model Labels** The `Label` column returned by the `predict_model` function in `pycaret.classification` now returns the original label instead of the encoded value. This change makes the output of `predict_model` more human-readable. A new parameter `encoded_labels` is added, which is `False` by default. When set to `True`, it will return encoded labels.
- **Model Logging** Model persistence in the backend when `log_experiment` is set to `True` has changed. Instead of using the internal `save_model` functionality, it now uses `mlflow.sklearn.save_model` to allow the use of Model Registry and `MLFlow` native deployment functionalities.
- **CatBoost Compatibility** `CatBoostClassifier` is now compatible with `blend_models` in `pycaret.classification`. As such, `blend_models` without any `estimator_list` will now blend a total of `15` estimators, including `CatBoostClassifier`.
- **Stack Models** `stack_models` in `pycaret.classification` and `pycaret.regression` now uses `StackingClassifier` and `StackingRegressor` from `sklearn`. As such, the `stack_models` function now returns an `sklearn` object instead of the custom `list` returned in previous versions.
- **Create Stacknet** `create_stacknet` in `pycaret.classification` and `pycaret.regression` is now removed.
- **Tune Model** `tune_model` in `pycaret.classification` and `pycaret.regression` now inherits params from the input `estimator`. As such, if you have trained `xgboost`, `lightgbm`, or `catboost` on GPU, it will not inherit the training method from `estimator`.
- **Interpret Model** `**kwargs` argument now added in `interpret_model`.
- **Pandas Categorical Type** All modules are now compatible with the `pandas.Categorical` object. Internally such columns are converted to `object` and treated the same way as `object` or `bool` columns.
- **use_gpu** A new parameter added in the `setup` function for `pycaret.classification` and `pycaret.regression`. In `2.1` it was added to prepare for the backend work required to make this change in future releases. As such using `use_gpu` param in `2.1` has no impact.
- **Unit Tests** Unit testing enhanced. Continuous improvement in progress: https://github.com/pycaret/pycaret/tree/master/pycaret/tests
- **Automated Documentation Added** Automated documentation now added. Documentation on Website will only update for `major` releases 0.X. For all minor monthly releases, documentation will be available on: https://pycaret.readthedocs.io/en/latest/
- **Introduction of GitHub Actions** CI/CD build testing is now moved from `travis-ci` to `github-actions`. `pycaret-nightly` is now being published every 24 hours automatically.
- **Tutorials** All tutorials are now updated using `pycaret==2.0`. https://github.com/pycaret/pycaret/tree/master/tutorials
- **Resources** New resources added under `/pycaret/resources/` https://github.com/pycaret/pycaret/tree/master/resources
- **Example Notebook** Many example notebooks added under `/pycaret/examples/` https://github.com/pycaret/pycaret/tree/master/examples
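
The `budget_time` idea from the summary above can be sketched as a loop that stops starting new candidates once a time limit has passed and returns the results gathered so far. This is a plain-Python illustration, not PyCaret's implementation: the function name, the injectable `clock` argument, and the assumption that the budget is expressed in minutes are all made for this example.

```python
import time

def compare_with_budget(candidates, budget_minutes=None, clock=time.monotonic):
    """Run (name, callable) candidates in order until the budget is used up."""
    start = clock()
    results = []
    for name, fit_and_score in candidates:
        # Check the budget before starting each candidate; a candidate that
        # is already running is allowed to finish.
        if budget_minutes is not None and clock() - start > budget_minutes * 60:
            break
        results.append((name, fit_and_score()))
    return results

candidates = [
    ("model_a", lambda: 0.80),
    ("model_b", lambda: 0.85),
    ("model_c", lambda: 0.83),
]

# Without a budget, every candidate is evaluated.
all_results = compare_with_budget(candidates)
print(all_results)
```

Passing the clock as an argument is only a convenience for testing the cutoff logic without waiting in real time.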
