Catboost

Latest version: v1.2.7

0.15.2

Breaking changes:
- Function `get_feature_statistics` is replaced by `calc_feature_statistics`
- Scoring function `Correlation` is renamed to `Cosine`
- Parameter `efb_max_conflict_fraction` is renamed to `sparse_features_conflict_fraction`
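For configuration kept in code or files written against older releases, the parameter and score-function renames above come down to a key/value substitution. The sketch below assumes training parameters are stored in a plain Python dict. `migrate_params` and the two mapping tables are hypothetical helpers, not part of the CatBoost package. Calls to `get_feature_statistics` still have to be renamed by hand.

```python
# Hypothetical migration helper for the 0.15.2 renames (not a CatBoost API).
RENAMED_PARAMS = {
    "efb_max_conflict_fraction": "sparse_features_conflict_fraction",
}
RENAMED_SCORE_FUNCTIONS = {"Correlation": "Cosine"}


def migrate_params(params):
    """Return a copy of params with the renamed keys and values applied."""
    migrated = {}
    for key, value in params.items():
        new_key = RENAMED_PARAMS.get(key, key)
        # The scoring function is passed as a value, so rename it separately.
        if new_key == "score_function":
            value = RENAMED_SCORE_FUNCTIONS.get(value, value)
        migrated[new_key] = value
    return migrated


old = {"efb_max_conflict_fraction": 0.1, "score_function": "Correlation", "depth": 6}
print(migrate_params(old))
# {'sparse_features_conflict_fraction': 0.1, 'score_function': 'Cosine', 'depth': 6}
```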

New features:
- Models can be saved in PMML format now.
> **Note:** PMML does not fully support categorical features. To export a model trained on a dataset with categorical features to PMML, set the `one_hot_max_size` parameter to a large enough value that all categorical features are one-hot encoded.
- Feature names can be used to specify ignored features
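To pick a `one_hot_max_size` that is large enough for PMML export, you can count the distinct values of each categorical column. A minimal sketch, assuming a feature is one-hot encoded when its number of distinct values is at most `one_hot_max_size` (our reading of the parameter; check the documentation for your version). Both helper names are hypothetical.

```python
def min_one_hot_max_size(cat_columns):
    """Smallest one_hot_max_size that one-hot encodes every column,
    assuming a column is one-hot encoded when its number of distinct
    values is <= one_hot_max_size."""
    return max((len(set(column)) for column in cat_columns), default=0)


def all_one_hot(cat_columns, one_hot_max_size):
    """True if every categorical column would be one-hot encoded."""
    return all(len(set(column)) <= one_hot_max_size for column in cat_columns)


# Two categorical columns with 2 and 4 distinct values.
columns = [["red", "green", "red"], ["a", "b", "c", "d"]]
size = min_one_hot_max_size(columns)
print(size)                        # 4
print(all_one_hot(columns, 2))     # False
print(all_one_hot(columns, size))  # True
```

The returned value would then be used as `one_hot_max_size` when training the model you intend to export.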

Bug fixes, including:
- Fixed restarting of CV on GPU for datasets without categorical features
- Fixed learning continuation errors with a changed dataset (PR 879) and with a model loaded from a file (884)
- Fixed NativeLib for JDK 9+ (PR 857)

0.15.1

Bug fixes
- Restored the `fstr_type` parameter in the Python and R interfaces

0.15

Breaking changes
- `cv` is now stratified by default for `Logloss`, `MultiClass` and `MultiClassOneVsAll`.
- We have removed the `border` parameter of the `Logloss` metric. Use `target_border` as a separate training parameter instead.
- `CatBoostClassifier` now runs `MultiClass` if the training dataset labels contain more than two distinct values.
- `model.best_score_["validation_0"]` is replaced with `model.best_score_["validation"]` if a single validation dataset is present.
- `get_object_importance` function parameter `ostr_type` is renamed to `type` in Python and R.
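Stratified cross-validation keeps each fold's class proportions close to those of the full dataset. The sketch below shows the idea with a deterministic round-robin assignment within each class. It is an illustration only, not CatBoost's `cv` implementation (a real implementation would usually shuffle objects first), and `stratified_folds` is a hypothetical helper.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds):
    """Return n_folds lists of object indices. Objects of each class are
    dealt round-robin across folds, so every fold gets nearly the same
    class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for position, index in enumerate(indices):
            folds[position % n_folds].append(index)
    return [sorted(fold) for fold in folds]


labels = [0, 0, 0, 0, 1, 1]
print(stratified_folds(labels, 2))  # [[0, 2, 4], [1, 3, 5]]
```

With four objects of class 0 and two of class 1, each fold receives two of the first and one of the second, matching the 2:1 ratio of the whole dataset.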

Model analysis
- Tree visualisation by [karina-usmanova](https://github.com/karina-usmanova).
- New feature analysis: plotting information about how a feature was used in the model by [alexrogozin12](https://github.com/alexrogozin12).
- Added `plot` parameter to `get_roc_curve`, `get_fpr_curve` and `get_fnr_curve` functions from `catboost.utils`.
- Added support for the prettified format for all types of feature importances.

New ways of doing predictions
- Rust applier by [shuternay](https://github.com/shuternay).
- DotNet applier by [17minutes](https://github.com/17minutes).
- One-hot encoding for categorical features in CatBoost CoreML model by Kseniya Valchuk and Ekaterina Pogodina.

New objectives
- Expectile Regression by [david-waterworth](https://github.com/david-waterworth).
- Huber loss by [atsky](https://github.com/atsky).
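For reference, here are per-object versions of the two new losses in their common textbook form. `alpha` (the expectile level) and `delta` (the Huber threshold) are the usual parameter names. Treat the exact parameterization, and any averaging or weighting CatBoost applies across objects, as assumptions to verify against the objectives documentation.

```python
def expectile_loss(y, f, alpha):
    """Asymmetric squared error: under-predictions (y > f) are weighted
    by alpha, over-predictions by 1 - alpha. alpha = 0.5 gives 0.5 * r**2."""
    r = y - f
    weight = alpha if r >= 0 else 1.0 - alpha
    return weight * r * r


def huber_loss(y, f, delta):
    """Quadratic for small residuals, linear beyond delta (robust to outliers)."""
    r = abs(y - f)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


print(huber_loss(0.0, 0.5, 1.0))  # 0.125 (quadratic region)
print(huber_loss(0.0, 3.0, 1.0))  # 2.5   (linear region)
```

Both branches of `huber_loss` meet at `r == delta` (each gives `0.5 * delta**2`), which is what makes the loss continuous.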

Speedups
- Faster SHAP values calculation for a single object or a small number of objects by [Lokutrus](https://github.com/Lokutrus).
- Cheaper preprocessing and no overfitting countermeasures when the number of iterations is small (the model will not overfit in that case anyway).

New functionality
- Prediction of leaf indices.
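In a symmetric (oblivious) tree, every level uses a single split condition, so a leaf index can be assembled from one bit per level. A minimal sketch, assuming splits of the form `x[feature] > border` and that level `d` sets bit `d`; CatBoost's internal bit order is not stated here. `oblivious_leaf_index` is a hypothetical helper.

```python
def oblivious_leaf_index(x, splits):
    """Leaf index of object x in a symmetric tree.

    splits: one (feature_index, border) pair per depth level.
    Level d contributes bit d when x[feature] > border, so a tree of
    depth D has leaf indices in range(2 ** D).
    """
    index = 0
    for depth, (feature, border) in enumerate(splits):
        if x[feature] > border:
            index |= 1 << depth
    return index


splits = [(0, 0.5), (1, 10.0)]  # depth-2 tree: 4 leaves
print(oblivious_leaf_index([0.7, 3.0], splits))   # 1
print(oblivious_leaf_index([0.2, 12.0], splits))  # 2
print(oblivious_leaf_index([0.9, 20.0], splits))  # 3
```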

New educational materials
- Rust tutorial by [shuternay](https://github.com/shuternay).
- C tutorial.
- Leaf indices.
- Tree visualisation tutorial by [karina-usmanova](https://github.com/karina-usmanova).
- Google Colab tutorial for regression in catboost by [col14m](https://github.com/col14m).

And a set of fixes for your issues.

0.14.2

New features
- Add `has_header` parameter to [`CatboostEvaluation`](https://github.com/catboost/catboost/blob/2f35e0366c0bb6c1b44be89fda0a02fe12f84513/catboost/python-package/catboost/eval/catboost_evaluation.py#L30) class.

Breaking changes
- Change output feature indices separator (`:` to `;`) in the `CatboostEvaluation` class.
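Scripts that parse the feature indices written by `CatboostEvaluation` have to switch separators. A hypothetical helper that accepts both the new `;` form and the legacy `:` form might look like this. It assumes the field contains only integer indices.

```python
def parse_feature_indices(field):
    """Parse '0;3;7' (0.14.2+) or legacy '0:3:7' into [0, 3, 7]."""
    separator = ";" if ";" in field else ":"
    return [int(part) for part in field.split(separator) if part.strip()]


print(parse_feature_indices("0;3;7"))  # [0, 3, 7]
print(parse_feature_indices("0:3:7"))  # [0, 3, 7]
print(parse_feature_indices("5"))      # [5]
```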

0.14.1

Breaking changes
- Changed default value for `--counter-calc-method` option to `SkipTest`
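The `Counter` CTR encodes a category by how often it occurs. The sketch below contrasts the two calculation methods using plain relative frequencies. `SkipTest` counts only the learning data, while `Full` also counts the test data. This is a simplification based on our reading of the option: CatBoost's actual `Counter` formula adds priors and may normalize differently, and `counter_ctr` is a hypothetical helper.

```python
from collections import Counter


def counter_ctr(learn_values, test_values, calc_method="SkipTest"):
    """Relative frequency of each category.

    SkipTest: counts come from the learning data only.
    Full: counts come from the learning and test data together.
    """
    if calc_method == "SkipTest":
        counted = list(learn_values)
    elif calc_method == "Full":
        counted = list(learn_values) + list(test_values)
    else:
        raise ValueError(f"unknown calc_method: {calc_method!r}")
    counts = Counter(counted)
    total = len(counted)
    categories = set(learn_values) | set(test_values)
    if total == 0:
        return {category: 0.0 for category in categories}
    return {category: counts[category] / total for category in categories}


learn = ["a", "a", "b"]
test = ["b", "b", "c"]
skip = counter_ctr(learn, test, "SkipTest")  # a=2/3, b=1/3, c=0
full = counter_ctr(learn, test, "Full")      # a=2/6, b=3/6, c=1/6
```

Under `SkipTest`, the category `c` that appears only in the test data gets a frequency of zero, so test-set statistics cannot leak into the encoding.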

New features:
- Add a GUID to trained models. You can access it in Python using the [`get_metadata`](https://catboost.ai/docs/concepts/python-reference_catboost_metadata.html) function, for example `print(catboost_model.get_metadata()['model_guid'])`

Bug fixes and other changes:
- Compatibility with glibc 2.12
- Improved embedded documentation
- Improved warning and error messages

0.14.0

New features:

- GPU training now supports several tree learning strategies, selectable with the `grow_policy` parameter. Possible values:
  - `SymmetricTree` -- The tree is built level by level until `max_depth` is reached. On each iteration, all leaves from the last tree level are split with the same condition. The resulting tree structure is always symmetric.
  - `Depthwise` -- The tree is built level by level until `max_depth` is reached. On each iteration, all non-terminal leaves from the last tree level are split. Each leaf is split by the condition with the best loss improvement.
  - `Lossguide` -- The tree is built leaf by leaf until the `max_leaves` limit is reached. On each iteration, the non-terminal leaf with the best loss improvement is split.
> **Note:** grow policies `Depthwise` and `Lossguide` currently support only training and prediction modes. They do not support model analysis (such as feature importances and SHAP values) or saving to other model formats such as CoreML, ONNX, and JSON.
- The new grow policies support several new parameters:
  - `max_leaves` -- Maximum leaf count in the resulting tree, default 31. Used only for the `Lossguide` grow policy. __Warning:__ It is not recommended to set this parameter greater than 64, as this can significantly slow down training.
  - `min_data_in_leaf` -- Minimum number of training samples per leaf, default 1. CatBoost will not search for new splits in leaves with a sample count less than `min_data_in_leaf`. This option is available for the `Lossguide` and `Depthwise` grow policies only.
> **Note:** the new types of trees will be at least 10x slower in prediction than default symmetric trees.
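The `Lossguide` policy is a best-first search: keep a priority queue of splittable leaves ordered by gain, always split the best one, and stop at `max_leaves`. Below is a minimal sketch of that loop, with `min_data_in_leaf` applied as described above. `best_split` is a hypothetical callback standing in for CatBoost's split search, and `toy_best_split` exists only for the demonstration.

```python
import heapq
from itertools import count


def grow_lossguide(root, best_split, max_leaves=31, min_data_in_leaf=1):
    """Best-first (leaf-wise) growth in the spirit of the Lossguide policy.

    root: list of sample indices in the root leaf.
    best_split: hypothetical callback, leaf -> (gain, left, right), or None
        when the leaf cannot be improved.
    Returns the final leaves, each a list of sample indices.
    """
    order = count()   # tie-breaker so heapq never compares leaf lists
    terminal = []     # leaves that will not be split further
    candidates = []   # max-heap (via negated gain) of splittable leaves

    def add_leaf(leaf):
        # As with min_data_in_leaf: no split search in leaves that are too small.
        split = best_split(leaf) if len(leaf) >= min_data_in_leaf else None
        if split is None:
            terminal.append(leaf)
        else:
            gain, left, right = split
            heapq.heappush(candidates, (-gain, next(order), leaf, left, right))

    add_leaf(root)
    # Each split turns one leaf into two, so the leaf count grows by one.
    while candidates and len(terminal) + len(candidates) < max_leaves:
        _, _, _, left, right = heapq.heappop(candidates)
        add_leaf(left)
        add_leaf(right)
    # Leaves still queued stay unsplit once the leaf budget is used up.
    terminal.extend(entry[2] for entry in candidates)
    return terminal


def toy_best_split(leaf):
    """Toy splitter for the demo: halve the leaf, gain = leaf size."""
    if len(leaf) < 2:
        return None
    mid = len(leaf) // 2
    return len(leaf), leaf[:mid], leaf[mid:]


leaves = grow_lossguide(list(range(8)), toy_best_split, max_leaves=3)
print(sorted(leaves))  # [[0, 1], [2, 3], [4, 5, 6, 7]]
```

Unlike `Depthwise`, the loop may deepen one branch while leaving a sibling unsplit, which is why the resulting trees are asymmetric.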

- GPU training also supports several score functions, which might improve your model's quality. Use the `score_function` parameter to experiment with them.

- Now you can use quantization with more than 255 borders and `one_hot_max_size` > 255 in CPU training.
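Quantization maps each feature value to the index of the bucket between neighbouring borders. One consequence of more than 255 borders is that a bucket index no longer fits in a single byte. The sketch uses simple quantile borders and a binary search for the bucket. The border selection is a toy assumption, not one of CatBoost's border selection algorithms, and both helpers are hypothetical.

```python
from bisect import bisect_left


def quantile_borders(values, border_count):
    """Toy border selection: at most border_count increasing cut points."""
    unique = sorted(set(values))
    if len(unique) <= border_count + 1:
        # Room for a border between every pair of neighbouring values.
        return [(a + b) / 2 for a, b in zip(unique, unique[1:])]
    step = len(unique) / (border_count + 1)
    return [unique[int(step * k)] for k in range(1, border_count + 1)]


def bucketize(value, borders):
    """Bucket index = number of borders the value is strictly greater than."""
    return bisect_left(borders, value)


values = list(range(1000))
wide = quantile_borders(values, 1024)   # 999 borders
narrow = quantile_borders(values, 254)  # 254 borders
print(len(wide), bucketize(999, wide))      # 999 999 (needs more than one byte)
print(len(narrow), bucketize(999, narrow))  # 254 254 (fits in one byte)
```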

New features in Python package:
- It is now possible to use `save_borders()` function to write borders to a file after training.
- Functions `predict`, `predict_proba`, `staged_predict`, and `staged_predict_proba` now support applying a model to a single object, in addition to usual data matrices.

Speedups:
- Impressive speedups for sparse datasets. The gain depends on the dataset but is at least 2--3 times for sparse data.

Breaking changes:
- Python-package class attributes don't raise exceptions now. Attributes return `None` if not initialized.
- Starting from 0.13, there are new feature importances for ranking modes. The new algorithm shows how much each feature contributes to the optimized loss function. Unlike feature importances for non-ranking modes, which are non-negative, these importances are signed. They are expensive to calculate, so starting from 0.14 they are no longer calculated by default during training. You need to calculate them after training.
