CatBoost

Latest version: v1.2.7


Page 1 of 15

1.2.7

Bugfixes
* \[R-package\]: Restore basic functionality.

Build & testing
* \[GPU\] Return configuration for multi-node GPU training with CMake-based build. See [documentation](https://catboost.ai/en/docs/installation/cli-installation-multi-node-installation).

1.2.6

Major changes
* CatBoost open source build, test and release infrastructure has been switched to GitHub Actions. You can also run it in your own fork of the CatBoost repository. See [the announcement](https://github.com/catboost/catboost/discussions/2708) for details.

Python package
* Adapt `numpy` dependency specification to prohibit `numpy >= 2.0` for now. 2671
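
To check an environment against the `numpy` restriction above, a minimal sketch like the following may help. It assumes the constraint is equivalent to `numpy < 2.0` (the exact specifier in the package metadata is not quoted here), and the helper names are hypothetical, not part of CatBoost.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_version_allowed(version_string: str) -> bool:
    """Return True if the major component of the version is below 2.

    Assumption: the dependency restriction is equivalent to `numpy < 2.0`.
    """
    major = int(version_string.split(".")[0])
    return major < 2


def installed_numpy_allowed():
    """Check the installed numpy; return None if numpy is not installed."""
    try:
        return numpy_version_allowed(version("numpy"))
    except PackageNotFoundError:
        return None
```

Calling `installed_numpy_allowed()` before installing or upgrading gives a quick signal whether the current environment would satisfy the restriction.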

New features
* User-defined metric GPU evaluation for `task_type=GPU`. Thanks to pnsemyon.
* GPU Custom objective support. Thanks to pnsemyon.
* \[C/C++ applier\]. `APT_MULTI_PROBABILITY` prediction type is now supported. 2639. Thanks to aivarasbaranauskas.
* `GroupQuantile` metric
* [Aggregated graph features](https://catboost.ai/en/docs/features/graph-aggregated-features)

Build & testing
* \[Windows\]: Visual Studio 2022 with MSVC toolset 14.29.30133 is now supported. 2302

Speedups
* \[GPU\]: Increase block size in `QueryCrossEntropy` (~3x faster on A100 for 6M samples, 350 features, query size near 1).

Improvements
* \[datasets\] Use mkstemp to replace deprecated mktemp. 2660. Thanks to fatmo666
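
The `mkstemp` change above follows a standard pattern in Python's `tempfile` module. The sketch below is an illustration of that pattern, not the code from the CatBoost `datasets` module; the function name `write_temp_file` is hypothetical.

```python
import os
import tempfile


def write_temp_file(data: bytes, suffix: str = ".tmp") -> str:
    """Create a temporary file, write data to it, and return its path."""
    # mkstemp creates and opens the file in a single step, so no other
    # process can claim the name between choosing it and opening it.
    # That gap is the race condition that makes mktemp unsafe.
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path
```

Unlike `NamedTemporaryFile`, the file created by `mkstemp` is not removed automatically, so the caller is responsible for deleting it (for example with `os.remove(path)`).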

Bugfixes
* \[C/C++ applier\]. Add missed `PredictSpecificClassFlat` to calcer.exports. 2715
* \[Linux\]. Restore readable backtraces
* \[GPU\] Make CUDA_MAX_THREADS_PER_SM cuda arch-specific
* \[JVM applier\]: Fixed bloating temp directory with copies of native libraries on Windows. 2622. Thanks to DKARAGODIN.
* Calculate F1, Precision, and Recall for all labels in multi-label classification
* Synchronize values of NCB::NModelEvaluation::EPredictionType and EApiPredictionType. 2643
* Fix sign of 2nd derivative for Tweedie loss
* Fix 'Can't find borders for feature ...' error when using text features on GPU. 2657
* Fix indexing of tokenized text features in model saver and dataset loader when some features are ignored
* Fix descent direction for Cox regression. Fixes 2701
* Fix GetTreeNodeToLeaf in multidimensional case (fixes plot_tree for multidimensional approx with non-oblivious trees). 2668
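
Sign errors in second derivatives, like the Tweedie fix above, can be caught by comparing an analytic formula against a finite-difference estimate. The sketch below assumes the usual per-object Tweedie loss with a log link and variance power between 1 and 2, written in the minimization convention. CatBoost's internal sign convention is not shown in these notes and may differ, and all function names here are hypothetical.

```python
import math


def tweedie_loss(approx: float, target: float, power: float) -> float:
    """Per-object Tweedie loss to minimize (log link, 1 < power < 2)."""
    return (-target * math.exp((1 - power) * approx) / (1 - power)
            + math.exp((2 - power) * approx) / (2 - power))


def tweedie_der1(approx: float, target: float, power: float) -> float:
    """First derivative of the loss with respect to approx."""
    return (-target * math.exp((1 - power) * approx)
            + math.exp((2 - power) * approx))


def tweedie_der2(approx: float, target: float, power: float) -> float:
    """Second derivative; positive for target >= 0 and 1 < power < 2."""
    return ((power - 1) * target * math.exp((1 - power) * approx)
            + (2 - power) * math.exp((2 - power) * approx))


def numeric_der2(func, x: float, step: float = 1e-4) -> float:
    """Central finite-difference estimate of the second derivative."""
    return (func(x + step) - 2 * func(x) + func(x - step)) / step ** 2
```

Comparing `tweedie_der2(a, y, p)` with `numeric_der2(lambda x: tweedie_loss(x, y, p), a)` at a few points exposes a flipped sign immediately, since the two values would then differ by roughly twice their magnitude.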

1.2.5

New features
* \[Python-package\]: Support custom eval metrics on GPU. 1792. Thanks to pnsemyon.

Bugfixes
* \[Python-package\]: Check eval_period parameter validity for staged prediction. 2593
* \[Python-package\]: Fix _CustomLoggersStack.pop logic. 2620
* \[R-package\]: Fix Caret object: Inconsistent grid creation with documentation. 2609
* \[JVM applier\]: Fix issues with exposing undesired symbols in JNI shared libraries (including allocators) on macOS. 2606
* Fix training with embedding features on GPU. 2249, 2308, 2591
* Fix training with text features on GPU
* Use correct sample count in MultiRMSE on multiple GPUs. 2557
* Fix sign of 2nd order derivative in Huber loss
* Enable gradient walker for non-additive metrics
* Fixes for Cox objective: buffer overflow in derivatives calculation, derivatives summation, metric calculation, disable ordered boosting
* Fix text features data serialization in the model files

1.2.3

Python package
* Support Python 3.12. 2510
* \[Performance\]: Fix inefficient loops in Cython. Significant speedups (up to 3x) can be expected for dataset construction from data in C-order.
* \[Performance\]: Make features data initialization from C-order `numpy.ndarray`s with `float32` data type multithreaded. Significant speedups of 5x up to 10x (on CPUs with many cores) can be expected. 385, 2542
* Save training metrics into the model metadata. So `best_score_`, `evals_result_`, `best_iteration_` model attributes now work after model saving and loading. Can be removed by model metadata manipulation if needed. 1166
* \[Breaking change\]. Support a separate boolean target type. `Class` predictions for models trained with boolean targets are now boolean as well, instead of the `True`, `False` strings returned before. Such models are incompatible with previous versions of CatBoost appliers. If you want the old behavior, convert your target to `False`, `True` strings before training. 1954
* Restrict `jupyterlab` version for setup to 3.x for now. Fixes 2530
* `utils.read_cd`: Support CD files with non-increasing column indices.
* Make `log_cout`, `log_cerr` specification consistent, avoid reset in recursive calls.
* Late-initialize default values for `log_cout`, `log_cerr`. 2195
* Add missing generated metrics: `Cox`, `PairLogitPairwise`, `UserPerObjMetric`, `SurvivalAft`.
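
For the boolean-target breaking change above, the suggested workaround is to convert the target to strings before training. A minimal sketch of that conversion follows; the helper name is hypothetical, and passing the result to a training call is left out because it depends on your pipeline.

```python
def bool_labels_to_strings(labels):
    """Map boolean labels to the strings 'True' and 'False'.

    bool() is applied first so that boolean-like values (for example
    numpy.bool_ items) are normalized before conversion.
    """
    return [str(bool(label)) for label in labels]
```

Training on the converted labels should keep `Class` predictions as strings, as in earlier versions.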

New features
* Support boolean target/labels type during training in Python and Spark (in the latter case only when using `fit` with `Pool` arguments) and `Class` prediction in Python. 1954
* \[Spark\]: Support Spark 3.5.x.
* \[C/C++ applier\]. Add functions for getting indices of features of different types to C and C++ API. 2568. Thanks to nimusp.
* \[C/C++ applier\]. Add staged prediction functions to C API. 2584. Thanks to Mb-NextTime.
* \[JVM applier\]. Add loading CatBoostModel from a byte array to API. 2539
* \[Linux\] Support CgroupsV2 when computing default number of threads used in parallel computations. 2519. Thanks to elukey.
* \[CLI\] Support printing `Auxiliary` columns by name in evaluation result output. 1659
* Save training metrics into the model metadata. Can be removed by model metadata manipulation if needed. 1166
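
The CgroupsV2 item above concerns deriving a default thread count from container CPU limits. The sketch below shows one plausible way to read such a limit from the cgroup v2 `cpu.max` file, whose content is a quota and a period in microseconds (or `max` for no limit). It is not CatBoost's implementation; the function names and the default path are assumptions.

```python
import math
import os


def cpu_limit_from_cpu_max(cpu_max_text: str, host_cpus: int) -> int:
    """Derive a CPU count from the content of a cgroup v2 cpu.max file."""
    quota, period = cpu_max_text.split()
    if quota == "max":
        # No quota configured: fall back to the host CPU count.
        return host_cpus
    # Round a fractional CPU allowance up, and keep the result
    # between 1 and the number of CPUs on the host.
    limit = math.ceil(int(quota) / int(period))
    return max(1, min(host_cpus, limit))


def default_thread_count(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> int:
    """Return a thread count that respects a cgroup v2 CPU quota if present."""
    host_cpus = os.cpu_count() or 1
    try:
        with open(cpu_max_path) as handle:
            return cpu_limit_from_cpu_max(handle.read(), host_cpus)
    except (OSError, ValueError):
        # File missing or malformed: not running under cgroup v2 limits.
        return host_cpus
```

Without such a check, a process in a container limited to 2 CPUs on a 64-core host would start 64 threads and oversubscribe its quota.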

Build & testing
* \[Windows\]: Use `clang-cl` from Visual Studio 2022 for the build without CUDA (build with CUDA still uses standard Microsoft toolchain from Visual Studio 2019).
* \[macOS\]: Pass `os.version` to `conan` host settings to ensure version consistency.
* \[Linux aarch64\]: Set `-mno-outline-atomics` for modern versions of Clang and GCC to avoid unresolved symbol linking errors. 2527
* Added missing `CMakeLists` for unit tests for `util`. 2525

Bugfixes
* \[Performance\]: Fix a performance regression, introduced in release 1.2, that could slow down training on GPU by 50% on some datasets. Thanks to JeanPaulShapo.
* \[Python-package\]: Fix segfault on Pool(data=None). 2522
* \[Python-package\]: Fix Python exception in `Pool()` when `pairs_weight` is a numpy array. 1913
* \[Python-package\]: Fix segfault and other strange errors when specifying custom logger with `__call__` method. 2277
* \[Python-package\]: Fix returning complex params in hyperparameter search. 1741, 1833
* \[Python-package\]: Fix ignored exceptions for missed metrics descriptions on startup. This has not been visible to users but has been making debugging more difficult.
* \[Python-package\]: Fix misleading `Targets are required for YetiRank loss function.` error in Cross validation. 2083
* \[Python-package\]: Fix `Pool.get_label()` returning constant `True` for boolean labels. 2133
* \[Python-package\]: Copying models no longer loses the values of the `best_score_`, `evals_result_`, `best_iteration_` attributes. 1793
* \[Spark\]: Fix hangs at the end of the training. 2151
* `Precision` metric default value in the absence of positive samples is changed to 0 and a warning is added (similar to the behavior of the `scikit-learn` implementation). 2422
* Fix ignoring embedding features
* Try to avoid hash collisions when computing group ids for datasets with a lot of groups (may occur in datasets with around 10^9 samples).
* Fix Multiclass models export to C++ and Python code. 2549
* Fix dataset_statistics mode when no `Target` data is available.
* Fix `Error: can't proceed some features` error on GPU. 1024
* Fix `allow_const_label=True` for classification. 1933
* Add checking of approx and target dimensions for `SurvivalAft` objective/metric.
* Fix Focal loss derivatives sign. 2563
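
The `Precision` default described above can be illustrated with a small pure-Python metric. This sketch assumes that "positive samples" refers to samples predicted as positive, which is the case where precision is undefined (division by zero); the function is illustrative and not CatBoost's implementation.

```python
import warnings


def precision(y_true, y_pred, positive=1):
    """Precision that returns 0 with a warning when nothing is predicted positive."""
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if p == positive and t == positive
    )
    pred_pos = sum(1 for p in y_pred if p == positive)
    if pred_pos == 0:
        # Precision is undefined here; return 0 and warn instead of failing.
        warnings.warn(
            "No positive predictions; precision is set to 0.", RuntimeWarning
        )
        return 0.0
    return true_pos / pred_pos
```

Returning 0 with a warning keeps metric logging running on degenerate evaluation sets while still flagging that the value is a convention rather than a measurement.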

1.2.2

Bugfixes
* Fix segmentation fault when using custom `eval_metric` in binary Python packages of version 1.2.1 on PyPI. 2486
* Fix LossFunctionChange fstr with embedding features.
* Fix a segmentation fault in JVM applier when using embedding features on JVM 11+.
* Fix CTR data handling in model summation (especially for models with CTRs with multiple target quantizations).

1.2.1

New features
* Allow optimizing specific ranking loss functions with YetiRank and YetiRankPairwise by specifying the `mode` parameter. See the [Which Tricks are Important for Learning to Rank?](https://arxiv.org/abs/2204.01500) paper for details (this family of losses is called `YetiLoss` there). CPU-only for now.
* Add Kernel Gradient Boosting support (use `catboost.sample_gaussian_process` function). 2408, thanks to TakeOver. See [Gradient Boosting Performs Gaussian Process Inference](https://arxiv.org/abs/2206.05608) paper for details.
* LambdaMart loss: support new target metrics MRR, ERR and MAP.
* StochasticRank loss: support new target metrics ERR and MRR.
* Support MultiRMSE on GPU. 2264, 2390
* Load JSON model format in Java Client. 1627, thanks to timotta
* Implement exporting of Multiclass models to C++ and Python. 2283, thanks to antoninkriz

Improvements
* Speed up BM25 feature calcers by 3x
* Use `int` instead of deprecated `numpy.int`. 2378
* Add `ModelCalcerWrapper::CalcFlatTransposed`. 2413, thanks to faucct
* Update dependencies to avoid known vulnerabilities

Bugfixes
* Fix `__shfl_up_sync` mask. 2339
* TFocalMetric negative values fix. 2386, thanks to diditforlulz273
* Focal loss: Use user-defined alpha and gamma
* Fix exception propagation: Rethrow exceptions caused by user's Python code as C++ exceptions
* Fix incompatibility of models trained with a user-defined objective with ShapValues calculation
* Avoid NaNs in Newton step calculation for RMSEWithUncertainty
* Fix score method for y with shape (N, 1). 2405
* Fix scalePosWeight support for Spark. 2470
