- **get_metrics:** Returns a table of the metrics available for cross-validation.
`pycaret.classification` `pycaret.regression` `pycaret.clustering`
- **add_metric:** Adds a custom metric for model evaluation.
`pycaret.classification` `pycaret.regression` `pycaret.clustering`
- **remove_metric:** Removes a custom metric.
`pycaret.classification` `pycaret.regression` `pycaret.clustering`
- **save_config:** Saves all global variables to a pickle file, so that a session can be resumed later without rerunning the `setup` function.
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`
- **load_config:** Load global variables from pickle file into Python environment.
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`
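
As a rough illustration of how these helpers fit together, the sketch below defines a custom metric and registers it. The metric itself (`miss_rate`), the helper name `register_and_save`, and the file name are hypothetical and not part of PyCaret; the PyCaret import is deferred so the metric can be tried on its own, and `setup` is assumed to have been run already.

```python
def miss_rate(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted as something else."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return 0.0
    missed = sum(1 for _, p in positives if p != 1)
    return missed / len(positives)


def register_and_save(config_path="my_config.pkl"):
    """Register the metric, list all metrics, then persist the session globals.

    Hypothetical helper; assumes `setup(...)` has already been called.
    """
    # Imported lazily so `miss_rate` can be used without PyCaret installed.
    from pycaret.classification import add_metric, get_metrics, save_config

    # A miss rate is better when lower, hence greater_is_better=False.
    add_metric("miss_rate", "Miss Rate", miss_rate, greater_is_better=False)
    print(get_metrics())      # the custom metric should appear in this table
    save_config(config_path)  # a later session can call load_config(config_path)
```

Calling `remove_metric("miss_rate")` afterwards should drop the metric again.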
setup
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`
Following new parameters have been added:
- **test_data: pandas.DataFrame, default = None**
If not None, test_data is used as a hold-out set, and the ``train_size`` parameter is ignored. test_data must be labeled and the shape of data and test_data must match.
- **preprocess: bool, default = True**
When set to False, no transformations are applied except for `train_test_split` and custom transformations passed in `custom_pipeline` param. Data must be ready for modeling (no missing values, no dates, categorical data encoding) when `preprocess` is set to False.
- **imputation_type: str, default = 'simple'**
The type of imputation to use. Can be either 'simple' or 'iterative'.
- **iterative_imputation_iters: int, default = 5**
The number of iterations. Ignored when ``imputation_type`` is not 'iterative'.
- **categorical_iterative_imputer: str, default = 'lightgbm'**
Estimator for iterative imputation of missing values in categorical features. Ignored when ``imputation_type`` is not 'iterative'.
- **numeric_iterative_imputer: str, default = 'lightgbm'**
Estimator for iterative imputation of missing values in numeric features. Ignored when ``imputation_type`` is set to 'simple'.
- **data_split_stratify: bool or list, default = False**
Controls stratification during 'train_test_split'. When set to True, will stratify by target column. To stratify on any other columns, pass a list of column names. Ignored when ``data_split_shuffle`` is False.
- **fold_strategy: str or sklearn CV generator object, default = 'stratifiedkfold' / 'kfold'**
Choice of cross-validation strategy. The default is 'stratifiedkfold' in `pycaret.classification` and 'kfold' in `pycaret.regression`. Possible values are:
* 'kfold'
* 'stratifiedkfold'
* 'groupkfold'
* 'timeseries'
* a custom CV generator object compatible with scikit-learn.
- **fold: int, default = 10**
The number of folds to be used in cross-validation. Must be at least 2. This is a global setting that can be overridden at the function level by using the ``fold`` parameter. Ignored when ``fold_strategy`` is a custom object.
- **fold_shuffle: bool, default = False**
Controls the shuffle parameter of CV. Only applicable when ``fold_strategy`` is 'kfold' or 'stratifiedkfold'. Ignored when ``fold_strategy`` is a custom object.
- **fold_groups: str or array-like, with shape (n_samples,), default = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
- **use_gpu: str or bool, default = False**
When set to 'force', will try to use GPU with all algorithms that support it, and raise exceptions if they are unavailable. When set to True, will use GPU with algorithms that support it, and fall back to CPU if they are unavailable. When False, all algorithms are trained using CPU only.
- **custom_pipeline: transformer or list of transformers or tuple, default = None**
When passed, the custom transformers are appended to the preprocessing pipeline and applied on each CV fold separately and on the final fit. All custom transformations are applied after 'train_test_split' and before pycaret's internal transformations.
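
A minimal sketch of how several of these parameters might be combined in one `setup` call. The helper names, the `"target"` column, and the `"customer_id"` group column are assumptions made for illustration; only the keyword names come from the list above.

```python
def build_setup_kwargs(train_df, holdout_df, target="target", group_col="customer_id"):
    """Collect some of the new `setup` arguments in one dict (hypothetical helper)."""
    return {
        "data": train_df,
        "target": target,
        "test_data": holdout_df,          # train_size is ignored when this is set
        "imputation_type": "iterative",
        "iterative_imputation_iters": 5,  # only used with iterative imputation
        "fold_strategy": "groupkfold",
        "fold": 5,                        # must be at least 2
        "fold_groups": group_col,         # column name holding the group labels
        "use_gpu": False,                 # CPU only
    }


def run_setup(train_df, holdout_df):
    """Call PyCaret's setup with the arguments above."""
    # Imported lazily so the dict builder works without PyCaret installed.
    from pycaret.classification import setup

    return setup(**build_setup_kwargs(train_df, holdout_df))
```

Keeping the arguments in a plain dict makes it easy to reuse the same configuration across experiments.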
compare_models
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **cross_validation: bool = True**
When set to False, metrics are evaluated on holdout set. ``fold`` param is ignored when cross_validation is set to False.
- **errors: str = "ignore"**
When set to 'ignore', any model that raises an exception is skipped and the comparison continues. When set to 'raise', the function stops as soon as an exception is raised.
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
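
To make the role of group labels concrete, here is a toy, pure-Python fold assignment in which every row of a group lands in the same fold. It is only a sketch of the idea: the round-robin assignment and the function name are assumptions, and scikit-learn's `GroupKFold` balances fold sizes differently.

```python
from collections import defaultdict


def group_folds(groups, n_folds):
    """Assign each distinct group to one fold, round-robin (toy version)."""
    fold_of_group = {}
    for g in groups:
        if g not in fold_of_group:
            fold_of_group[g] = len(fold_of_group) % n_folds

    # Collect row indices per fold; rows sharing a group share a fold.
    folds = defaultdict(list)
    for row, g in enumerate(groups):
        folds[fold_of_group[g]].append(row)
    return [folds[k] for k in range(n_folds)]


# One label per training row, i.e. shape (n_samples,).
groups = ["a", "a", "b", "c", "b", "c", "d"]
folds = group_folds(groups, n_folds=2)
print(folds)  # [[0, 1, 3, 5], [2, 4, 6]]
```

In PyCaret the same labels would be passed as an array, or as a column name such as `groups="customer_id"` (a hypothetical column).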
create_model
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **cross_validation: bool = True**
When set to False, metrics are evaluated on holdout set. ``fold`` param is ignored when cross_validation is set to False.
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
Following parameters have been removed:
- **ensemble** - Deprecated - use `ensemble_model` function directly.
- **method** - Deprecated - use `ensemble_model` function directly.
- **system** - Moved to private API.
tune_model
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **search_library: str, default = 'scikit-learn'**
The search library used for tuning hyperparameters. Possible values:
* 'scikit-learn' - default, requires no further installation (https://github.com/scikit-learn/scikit-learn)
* 'scikit-optimize' - `pip install scikit-optimize` (https://scikit-optimize.github.io/stable/)
* 'tune-sklearn' - `pip install tune-sklearn ray[tune]` (https://github.com/ray-project/tune-sklearn)
* 'optuna' - `pip install optuna` (https://optuna.org/)
- **search_algorithm: str, default = None**
The search algorithm depends on the `search_library` parameter. Some search algorithms require additional libraries to be installed. When None, will use the search library-specific default algorithm.
`scikit-learn` possible values:
- random (default)
- grid
`scikit-optimize` possible values:
- bayesian (default)
`tune-sklearn` possible values:
- random (default)
- grid
- bayesian `pip install scikit-optimize`
- hyperopt `pip install hyperopt`
- bohb `pip install hpbandster ConfigSpace`
`optuna` possible values:
- tpe (default)
- random
- **early_stopping: bool or str or object, default = False**
Stops fitting a hyperparameter configuration early if it performs poorly. Ignored when ``search_library`` is 'scikit-learn', or if the estimator does not have a 'partial_fit' attribute. Can be either an object accepted by the search library or one of the following:
- 'asha' for Asynchronous Successive Halving Algorithm
- 'hyperband' for Hyperband
- 'median' for Median Stopping Rule
- If False or None, early stopping will not be used.
- **early_stopping_max_iters: int, default = 10**
The maximum number of epochs to run for each sampled configuration. Ignored if `early_stopping` is False or None.
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
- **return_tuner: bool, default = False**
When set to True, will return a tuple of (model, tuner_object).
- **tuner_verbose: bool or int, default = True**
If True or above 0, will print messages from the tuner. Higher values print more messages. Ignored when ``verbose`` param is False.
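
The relationship between `search_library` and `search_algorithm` described above can be summarised as a lookup table. The sketch below is not PyCaret's implementation; the table and `resolve_search_algorithm` are hypothetical names that simply restate the documented values, with the first entry of each list being the documented default.

```python
SEARCH_ALGORITHMS = {
    "scikit-learn": ["random", "grid"],
    "scikit-optimize": ["bayesian"],
    "tune-sklearn": ["random", "grid", "bayesian", "hyperopt", "bohb"],
    "optuna": ["tpe", "random"],
}


def resolve_search_algorithm(search_library="scikit-learn", search_algorithm=None):
    """Return the algorithm to use, falling back to the library's default."""
    if search_library not in SEARCH_ALGORITHMS:
        raise ValueError(f"unknown search_library: {search_library!r}")
    allowed = SEARCH_ALGORITHMS[search_library]
    if search_algorithm is None:
        return allowed[0]  # library-specific default
    if search_algorithm not in allowed:
        raise ValueError(
            f"{search_algorithm!r} is not available for {search_library!r}; "
            f"choose one of {allowed}"
        )
    return search_algorithm
```

A call such as `tune_model(model, search_library="optuna", search_algorithm="tpe", return_tuner=True)` would then be expected to return a `(model, tuner_object)` tuple, as described for `return_tuner`.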
ensemble_model
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
blend_models
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
- **weights: list, default = None**
Sequence of weights (float or int) to weight the occurrences of predicted class labels (hard voting) or class probabilities before averaging (soft voting). Uses uniform weights when None.
- The default value for the `method` parameter has been changed from `hard` to `auto`.
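
The effect of `weights` on the two voting modes can be sketched in plain Python. This is an illustration of weighted voting in general, not PyCaret's or scikit-learn's code; the function names and the example values are made up.

```python
from collections import Counter


def hard_vote(labels, weights=None):
    """Pick the label with the largest total weight (uniform weights if None)."""
    if weights is None:
        weights = [1] * len(labels)
    tally = Counter()
    for label, w in zip(labels, weights):
        tally[label] += w
    return max(tally, key=tally.get)


def soft_vote(probas, weights=None):
    """Weighted average of per-model class probabilities."""
    if weights is None:
        weights = [1] * len(probas)
    total = sum(weights)
    n_classes = len(probas[0])
    return [
        sum(w * p[c] for p, w in zip(probas, weights)) / total
        for c in range(n_classes)
    ]


# Three models vote on one sample.
print(hard_vote(["cat", "dog", "dog"]))                     # dog
print(hard_vote(["cat", "dog", "dog"], weights=[3, 1, 1]))  # cat
avg = soft_vote([[0.9, 0.1], [0.4, 0.6], [0.4, 0.6]], weights=[2, 1, 1])
```

With uniform weights the majority label wins; giving the first model a weight of 3 lets it outvote the other two.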
stack_models
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
calibrate_model
`pycaret.classification`
Following new parameters have been added:
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
plot_model
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **fold: int or scikit-learn compatible CV generator, default = None**
Controls cross-validation. If None, the CV generator in the ``fold_strategy`` parameter of the ``setup`` function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the ``setup`` function.
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
evaluate_model
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **fold: int or scikit-learn compatible CV generator, default = None**
Controls cross-validation. If None, the CV generator in the ``fold_strategy`` parameter of the ``setup`` function is used. When an integer is passed, it is interpreted as the 'n_splits' parameter of the CV generator in the ``setup`` function.
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
finalize_model
`pycaret.classification` `pycaret.regression`
Following new parameters have been added:
- **fit_kwargs: Optional[dict] = None**
Dictionary of arguments passed to the fit method of the model.
- **groups: Optional[Union[str, Any]] = None**
Optional group labels when 'GroupKFold' is used for the cross-validation. It takes an array with shape (n_samples, ) where n_samples is the number of rows in the training dataset. When a string is passed, it is interpreted as the column name in the dataset containing group labels.
- **model_only: bool, default = True**
When set to True, only the model object is re-trained and all the transformations in the Pipeline are ignored.
models
`pycaret.classification` `pycaret.regression` `pycaret.clustering` `pycaret.anomaly`
Following new parameters have been added:
- **internal: bool, default = False**
When True, will return extra columns and rows used internally.
- **raise_errors: bool, default = True**
When False, will suppress all exceptions, ignoring models that couldn't be created.
<br/><br/><br/>