Catboost

Latest version: v1.2.7

0.13.1

Not secure
Changes:
- Fixed a bug in SHAP values that was introduced in v0.13

0.13

Not secure
Speedups:
- Impressive speedup of CPU training for datasets with predominantly binary features (up to 5-6x).
- Sped up prediction and SHAP values array casting on large pools (issue [684](https://github.com/catboost/catboost/issues/684)).

New features:
- We've introduced a new type of feature importance, `LossFunctionChange`.
This type works well in all modes, but is especially good for ranking. It is more expensive to calculate, so it is not the default, but you can request it by selecting the feature importance type explicitly.
- Now we support online statistics for categorical features in `QuerySoftMax` mode on GPU.
- We now support feature names in `cat_features`, PR [679](https://github.com/catboost/catboost/pull/679) by [infected-mushroom](https://github.com/infected-mushroom) - thanks a lot [infected-mushroom](https://github.com/infected-mushroom)!
- We've introduced a new `sampling_type`, `MVS`, which speeds up CPU training.
- Added `classes_` attribute in Python.
- Added support for input/output borders files in the Python package. Thank you [necnec](https://github.com/necnec) for your PR [#656](https://github.com/catboost/catboost/pull/656)!
- One more new option for working with categorical features is `ctr_target_border_count`.
This option can be used if your initial target values are not binary and you do regression or ranking. It is equal to 1 by default, but you can try increasing it.
- Added a new option, `sampling_unit`, that allows switching sampling from individual objects to entire groups.
- More strings are interpreted as missing values for numerical features (mostly similar to pandas' [read_csv](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html)).
- Allow `skip_train` property for loss functions in cv method. Contributed by GitHub user [RakitinDen](https://github.com/RakitinDen), PR [#662](https://github.com/catboost/catboost/pull/662), many thanks.
- We've improved classification mode on CPU: there will be fewer cases where training diverges.
You can also experiment with the new `leaf_estimation_backtracking` parameter.
- Added new compare method for visualization, PR [652](https://github.com/catboost/catboost/pull/652). Thanks [Drakon5999](https://github.com/Drakon5999) for your contribution!
- Implemented `__eq__` method for `CatBoost*` Python classes (PR [654](https://github.com/catboost/catboost/pull/654)). Thanks [daskol](https://github.com/daskol) for your contribution!
- It is now possible to output evaluation results directly to `stdout` or `stderr` in command-line CatBoost in [`calc` mode](https://catboost.ai/docs/concepts/cli-reference_calc-model.html) by specifying `stream://stdout` or `stream://stderr` in `--output-path` parameter argument. (PR [#646](https://github.com/catboost/catboost/pull/646)). Thanks [towelenee](https://github.com/towelenee) for your contribution!
- New loss function - [Huber](https://en.wikipedia.org/wiki/Huber_loss). Can be used as both an objective and a metric for regression. (PR [#649](https://github.com/catboost/catboost/pull/649)). Thanks [atsky](https://github.com/atsky) for your contribution!
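
For readers unfamiliar with the Huber loss mentioned in the last item, the following is a minimal pure-Python sketch of the standard definition from the linked Wikipedia article. It is not CatBoost's implementation: the function name `huber_loss`, the default `delta=1.0`, and the choice to average over objects are assumptions made for illustration.

```python
def huber_loss(y_true, y_pred, delta=1.0):
    """Mean Huber loss over paired targets and predictions.

    Quadratic for residuals with |r| <= delta, linear beyond that,
    which makes it less sensitive to outliers than squared error.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")

    total = 0.0
    for target, prediction in zip(y_true, y_pred):
        residual = abs(target - prediction)
        if residual <= delta:
            # Small residual: behaves like half the squared error.
            total += 0.5 * residual * residual
        else:
            # Large residual: grows linearly, continuous at |r| == delta.
            total += delta * (residual - 0.5 * delta)
    return total / len(y_true)


small = huber_loss([0.0], [0.5])             # quadratic branch
large = huber_loss([0.0], [3.0], delta=1.0)  # linear branch
```

A residual of 0.5 falls in the quadratic branch and a residual of 3.0 in the linear branch, so the second value grows far more slowly than the squared error (4.5) would.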

Changes:
- Changed defaults for `one_hot_max_size` training parameter for groupwise loss function training.
- `SampleId` is the new main name for former `DocId` column in input data format (`DocId` is still supported for compatibility). Contributed by GitHub user [daskol](https://github.com/daskol), PR [#655](https://github.com/catboost/catboost/pull/655), many thanks.
- Improved CLI for cross-validation: replaced `-X/-Y` options with `--cv`, PR [644](https://github.com/catboost/catboost/pull/644). Thanks [tswr](https://github.com/tswr) for your PR!
- `eval_metrics`: `eval_period` is now clipped to the total number of trees in the specified interval. PR [653](https://github.com/catboost/catboost/pull/653). Thanks [AntPon](https://github.com/AntPon) for your contribution!

R package:
- Thanks to [ws171913](https://github.com/ws171913) we made necessary changes to prepare catboost for CRAN integration, PR [#715](https://github.com/catboost/catboost/pull/715). This is in progress now.
- R interface for cross-validation contributed by GitHub user [brsoyanvn](https://github.com/brsoyanvn), PR [#561](https://github.com/catboost/catboost/pull/561) -- many thanks [brsoyanvn](https://github.com/brsoyanvn)!

Educational materials:
- We've added a new tutorial for [GPU training on Google Colaboratory](https://github.com/catboost/tutorials/blob/master/tools/google_colaboratory_cpu_vs_gpu_tutorial.ipynb).

We have also made a number of fixes and data check improvements.
Thanks [brazhenko](https://github.com/brazhenko), [Danyago98](https://github.com/Danyago98), [infected-mushroom](https://github.com/infected-mushroom) for your contributions.

0.12.2

Not secure
Changes:
* Fixed loading of `epsilon` dataset into memory
* Fixed multiclass learning on GPU for >255 classes
* Improved error handling
* Some other minor fixes

0.12.1.1

Not secure
Changes:
* Fixed Python compatibility issue in dataset downloading
* Added `sampling_type` parameter for `YetiRankPairwise` loss

0.12.1

Not secure
Changes:
* Support saving models in ONNX format (only for models without categorical features).
* Added a new dataset to our `catboost.datasets()`: [epsilon](catboost/benchmarks/model_evaluation_speed), a large dense dataset for binary classification.
* Speedup of Python `cv` on GPU.
* Fixed creation of `Pool` from `pandas.DataFrame` with `pandas.Categorical` columns.

0.12.0

Not secure
Breaking changes:
* Class weights are now taken into account by `eval_metrics()`,
`get_feature_importance()`, and `get_object_importance()`.
In previous versions the weights were ignored.
* Parameter `random-strength` for pairwise training (`PairLogitPairwise`,
`QueryCrossEntropy`, `YetiRankPairwise`) is not supported anymore.
* Simultaneous use of `MultiClass` and `MultiClassOneVsAll` metrics is now
deprecated.
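
To illustrate why the first breaking change can shift reported metric values, here is a small pure-Python sketch comparing unweighted and class-weighted accuracy. It is only an illustration of the general idea, not CatBoost's implementation: the helper `accuracy`, the toy labels, and the weight values are all assumptions.

```python
def accuracy(y_true, y_pred, class_weights=None):
    """Accuracy where each object counts with the weight of its true class.

    With class_weights=None every object has weight 1.0, which mirrors
    a metric that ignores class weights.
    """
    total = 0.0
    correct = 0.0
    for target, prediction in zip(y_true, y_pred):
        weight = 1.0 if class_weights is None else class_weights[target]
        total += weight
        if target == prediction:
            correct += weight
    return correct / total


# A model that always predicts the majority class 0.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 0, 0]

unweighted = accuracy(y_true, y_pred)
weighted = accuracy(y_true, y_pred, class_weights={0: 1.0, 1: 3.0})
```

With the rare class weighted three times as heavily, the single mistake on it costs as much as the three correct majority-class predictions earn, so the weighted value is lower than the unweighted one.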

New functionality:
* `cv` method is now supported on GPU.
* String labels for classes are supported in Python.
In multiclassification the string class names are inferred from the data.
To use string labels in binary classification, pass the `class_names`
parameter and specify which class is negative (0) and which is positive (1).
You can also use `class_names` in multiclassification mode to pass all
possible class names to the fit function.
* Borders can now be saved and reused.
To save the feature quantization information obtained during training data
preprocessing into a text file, use the CLI option `--output-borders-file`.
To use the borders for training, use the CLI option `--input-borders-file`.
This functionality is now supported on CPU and GPU (it was GPU-only in previous versions).
File format for the borders is described [here](https://tech.yandex.com/catboost/doc/dg/concepts/input-data_custom-borders-docpage).
* CLI option `--eval-file` is now supported on GPU.

Quality improvement:
* Fixed some cases in binary classification where training could diverge

Optimizations:
* A great speedup of the Python applier (10x)
* Reduced memory consumption in Python `cv` function (by a factor of the fold count)

Benchmarks and tutorials:
* Added [speed benchmarks](catboost/benchmarks/gpu_vs_cpu_training_speed) for CPU and GPU on a variety of different datasets.
* Added [benchmarks](catboost/benchmarks/ranking) of different ranking modes. In [this tutorial](catboost/tutorials/ranking/ranking_tutorial.ipynb) we compare
different ranking modes in CatBoost, XGBoost and LightGBM.
* Added [tutorial](catboost/tutorials/apply_model/catboost4j_prediction_tutorial.ipynb) for applying model in Java.
* Added [benchmarks](catboost/benchmarks/shap_speed) of SHAP values calculation for CatBoost, XGBoost and LightGBM.
The benchmarks also contain an explanation of the complexity of this calculation
in all the libraries.

We also made a number of stability improvements
and added stricter checks of input data and parameters.

We are very grateful to our community members canorbal and neer201
for their contributions to this release. Thank you.
