Catboost

Latest version: v1.2.7

1.2

Major changes
CatBoost's build system has been switched from Ya Make (Yandex's build system) to [CMake](https://cmake.org/). This means more transparency in the build process and more familiar tools for Open Source developers.
For now it is possible to build CatBoost for:
* Linux on x86-64 with or without CUDA
* Linux on aarch64 with or without CUDA
* macOS on x86-64 and arm64, including creating universal binaries
* Windows on x86-64 with or without CUDA
* Android (only model applier) on [all supported ABIs](https://developer.android.com/ndk/guides/abis).

This allowed us to prepare the Python package in the source distribution form (also known as `sdist`). #830

* `msvs` subdirectory with the Microsoft Visual Studio solution has been removed. Visual Studio solutions can be generated using CMake instead.
* `make` subdirectory with Makefiles has been removed. Use `CMake` + `ninja` (recommended) or `CMake` + `make` instead.

Python package
* Switch to the standard Python build and installation method that uses `setup.py` instead of the custom `mk_wheel.py` script. All common scenarios (`sdist`, `build`, `install`, editable `install`, `bdist_wheel`) are supported.
* Switch wheel platform tag on Linux from obsolete `manylinux1` to `manylinux2014`.
* The source distribution is now available on PyPI. #830
* Support Python 3.11. #2213
* Drop support for obsolete Python 3.6.
* Make wheels [PEP427](https://peps.python.org/pep-0427/)-compliant. #2165
* Fix wrong checksums in wheels that caused problems with poetry. #2331
* Improved performance due to caching TBB local executors. #2203
* Add `fixed_binary_splits` to the regressor, classifier, and ranker.
* Compatibility with pandas 2.0. #2320
* CatBoost widget is now compatible with ipywidgets 8.x. #2266
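
The checksum fix in this list concerns the hashes that a wheel records for its files in the `RECORD` file. As a rough illustration of how such an entry is formed under the wheel format (SHA-256 digest, URL-safe base64, trailing padding removed), here is a minimal sketch. The helper names `record_hash` and `record_line` and the example path are hypothetical, and this is not CatBoost's build code.

```python
import base64
import hashlib


def record_hash(data: bytes) -> str:
    """Return a RECORD-style hash string for the given file contents."""
    digest = hashlib.sha256(data).digest()
    # RECORD uses URL-safe base64 with the trailing '=' padding removed.
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return "sha256=" + encoded


def record_line(path: str, data: bytes) -> str:
    """Build one 'path,hash,size' RECORD row (hypothetical helper)."""
    return f"{path},{record_hash(data)},{len(data)}"


# Illustrative path and contents, not taken from a real CatBoost wheel.
print(record_line("catboost/__init__.py", b"print('hello')\n"))
```

An installer or lock-file tool recomputes these values from the unpacked files, so an entry that does not match the actual file contents makes verification fail.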

Rust package
* Support CUDA applier. #1925, thanks to getumen.
* Properly forward debug/release setting to native library build.
* Passing features: switch from `String` and `Vec` types for features to `AsRef` of slices to make code more generic.
* Support text and embedding features.
* Support multidimensional output in predictions.

New features
* \[JVM applier\]: Support CUDA.
* \[Spark\]: Support Spark 3.4.x (use this version if you want to use Spark with Python 3.11).
* Static model applier library now works on Windows.
* Add `binary-classification-threshold` parameter to the CLI model applier.
* Support multi-target regression with text features (only Bag-of-Words features are generated for now). #2229
* Support `RMSEWithUncertainty` loss function on GPU.
* Support `MultiLogloss` and `MultiCrossEntropy` loss functions with numerical features on GPU.
* Support `MultiLogloss` loss function with text features on CPU and GPU. #1885
* Enable univariate metrics for models with uncertainty.
* Add `Focal` loss (CPU-only for now). #1807, thanks to diditforlulz273.
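
For readers unfamiliar with the `Focal` loss mentioned in this list, the sketch below shows the binary focal loss as it is commonly defined in the literature: cross-entropy scaled by `(1 - p_t) ** gamma` and a class-balancing factor. The function name, the argument names `gamma` and `alpha`, and their default values are illustrative assumptions. CatBoost's own parameterization and defaults may differ.

```python
import math


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example; p is the predicted probability of class 1."""
    eps = 1e-15
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    # p_t is the probability assigned to the true class.
    p_t = p if y == 1 else 1.0 - p
    # alpha_t balances the two classes.
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples that are already classified well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A confident correct prediction contributes less than an uncertain one.
print(focal_loss(0.9, 1), focal_loss(0.6, 1))
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted log loss.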

Improvements
* Removed legacy dependency on Python 2 interpreter in the build process. #2297
* Calc metrics: Throw catboost exception if column index exceeds column count.
* Speed up `MultiLogloss` on CPU by 8% per tree (110K samples, 20 targets, 480 float features, 3 cat features, 16-core CPU).
* Update .NET projects from obsolete .NET Core 2.1 to .NET Core 3.1.
* Code generation for new CUDA Compute Architectures 8.6, 8.9 and 9.0 is enabled by default (requires CUDA 11.8 to build from source).
* Check that evaluator implementation is available in `TFullModel::SetEvaluatorType` (previously, calling it for an unavailable implementation could cause a segmentation fault). Add `TFullModel::GetSupportedEvaluatorTypes`.
* Cross Validation on GPU no longer requires `allow_write_files=True`.

Bugfixes
* \[Python-package\]: Clear model params before load_model. Fixes #2205.
* \[Python-package\]: Fix CatBoostRanker score computation. #2231
* \[Python-package\]: Fix `_get_embedding_feature_indices`. #2273
* \[Python-package\]: Fix `set_feature_names` with text or embedding features. #2090
* \[Python-package\]: pandas.Categorical.categories is not necessarily a numpy.ndarray. #1965
* \[Spark\]: Pass classpath in a file to avoid hitting cmdline length limits. #1842
* \[CUDA Applier\]: Apply scale and bias.
* \[CUDA Applier\]: Fix that `libs/model_interface applier` always produced an error in CUDA mode.
* Fix CUDA error 700 in pairwise ranking.
* Fix kernel registration for distributed training on GPU.
* Fix 'floating point exception' on CPU for small datasets on GPU.
* Fix wrong log message 'There are invalid params and some of them will be ignored'. #2253
* Fix incorrect results and crashes for GPU applier on Nvidia Ampere-based GPUs.
* Fix 'CUDA error 9' in Multi-GPU training.
* Fix serialization of embedding features structures in the model.
* Fix GPU buffer overrun in distributed multi-classification training.
* Fix `catboost/cuda/cuda_util/sort.cpp:166: CUDA error 9` on Nvidia Ampere-based GPUs.
* Fix inf/nan parsing in dataset input files.
* Fix floating point exception for very small datasets on GPU.
* Fix: built static applier library lacked the part with 'global' objects. #2187
* Fix sum of models with categorical features with CTRs.
* Fix: model_interface/cmake_example failed build "‘runtime_error’ is not a member of ‘std’". #2324, thanks to Mandelag.
* Fix Segmentation fault in Cross Validation and hyperparameter search functions that use it on GPU.
* Fix Segmentation fault in `utils.eval_metrics` for groupwise metrics when group data has not been specified. #2343
* Fix errors when running Cross Validation repeatedly on GPU. #2221

1.1.1

New features
* Support building for Linux on aarch64 from sources using CMake (no prebuilt binaries or PyPI packages yet). #1981
* [C/C++ applier] Support embedding features. #2172
* [C/C++ applier] Add `GetModelUsedFeaturesNames`. #2204
* [Python] Add text features to `utils.create_cd`. #2193
* [Spark] Full support for Apache Spark 3.3.
* [Spark] Read/write PySpark's DataFrame-like API for Pool. #2030
* [Spark] Allow specifying trainingDriver and worker listening ports. #2181

Bugfixes
* Fix prediction dimension check for RMSEWithUncertainty and MultiQuantile. #2155
* [C/C++ applier] Fix segmentation fault in prediction for multiple objects for multiple dimension models.
* [JVM applier] Fix catboost-common dependency version in catboost-prediction (Fixes JVM applier on macOS). #2121
* [Python] Update for pandas 1.5.0: iteritems -> items (Fixes annoying deprecation warning). #2179
* [Python] Fix segmentation fault when target is `np.ndarray` with `dtype=object`. #2201
* [Python] Fix specifying `feature_names` in `utils.create_cd`. #2211

1.1

New features
* Multiquantile regression

It is now possible to train models with a shared tree structure and multiple predicted quantile values in each leaf. Currently this approach does not strongly guarantee that the predicted quantile values are consistent with each other, but it still provides more consistency than training an independent model for each quantile. See the [short description in the documentation](https://catboost.ai/en/docs/concepts/loss-functions-regression#MultiQuantile). Short example for Python: `loss_function='MultiQuantile:alpha=0.2,0.4'`. Supported only on CPU for now.
* Support text and embedding features for regression and ranking.
* Spark: Read/write Spark's Dataset-like API for Pool. #2030
* Support HashedCateg column type. This allows using externally prehashed categorical features in both training and prediction.
* New option `plot_file` in Python functions with a `plot` parameter allows saving plots to a file. #758
* Add `eval_fraction` parameter. #1500
* Non-symmetric trees model summation.
* `init_model` parameter now works with non-symmetric trees.
* Partial support for Apache Spark 3.3 (only for Scala 2.12 and without PySpark).
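
To make the multiquantile item above more concrete, the sketch below evaluates predictions for several quantile levels at once using the standard pinball (quantile) loss, with the same levels as in `MultiQuantile:alpha=0.2,0.4`. It is a plain-Python illustration. The function names and the toy data are hypothetical, and summing the per-quantile losses before averaging over samples is an assumption rather than a statement about CatBoost's internal objective.

```python
def pinball_loss(y_true: float, y_pred: float, alpha: float) -> float:
    """Pinball loss of one prediction for quantile level alpha."""
    diff = y_true - y_pred
    # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
    return alpha * diff if diff >= 0 else (alpha - 1.0) * diff


def multi_quantile_loss(y_true, y_pred_rows, alphas):
    """Mean over samples of the summed pinball losses, one term per quantile level."""
    total = 0.0
    for y, preds in zip(y_true, y_pred_rows):
        total += sum(pinball_loss(y, q, a) for q, a in zip(preds, alphas))
    return total / len(y_true)


alphas = [0.2, 0.4]  # same levels as in 'MultiQuantile:alpha=0.2,0.4'
y_true = [1.0, 2.0, 3.0]
# One row per sample, one predicted value per quantile level (toy numbers).
y_pred_rows = [[0.5, 0.9], [2.5, 2.0], [2.0, 3.5]]
print(multi_quantile_loss(y_true, y_pred_rows, alphas))
```

The second toy row predicts a larger value for the 0.2 quantile than for the 0.4 quantile, which is the kind of inconsistency the release note says is not strictly ruled out.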

Speedups
* 2x speedup of DCG, nDCG and FilteredDCG metric calculation for groups with >= 50 objects and with top=-1 (all objects from each group, the default value)
* Fixed 2x slowdown of PairLogit and other ranking losses on CPU introduced in release 0.23
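
As a reminder of what the DCG-family metrics above compute, here is a minimal sketch of DCG and nDCG for a single group, where `top=-1` means that all objects in the group are used. It assumes linear gain and a log2 position discount. CatBoost supports other variants, and the function names and signatures are illustrative only.

```python
import math


def dcg(relevances_in_ranked_order, top=-1):
    """DCG with linear gain and log2 position discount; top=-1 uses the whole group."""
    items = relevances_in_ranked_order if top == -1 else relevances_in_ranked_order[:top]
    # Position 0 gets discount log2(2) = 1, position 1 gets log2(3), and so on.
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(items))


def ndcg(relevances_in_ranked_order, top=-1):
    """DCG divided by the DCG of the ideal (descending relevance) ordering."""
    ideal = dcg(sorted(relevances_in_ranked_order, reverse=True), top)
    return dcg(relevances_in_ranked_order, top) / ideal if ideal > 0 else 0.0


# Relevance labels listed in the order produced by the model's scores.
print(ndcg([1, 3, 2]))
```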

Bugfixes
* Fix for pandas integer array. #2096
* Save feature names to json format. #2102
* Fix feature weights on CPU
* Use feature weights on GPU
* Fix gradient calculation for QueryRMSE on GPU
* Fix ranking metrics with group weights in calc_metrics
* Fix JVM applier on data with text features. #2132

1.0.6

New features

* Fixed splits for binary features on GPU for non-symmetric trees -- specify the set of splits to start each tree in the model with `--fixed-binary-splits` or `fixed_binary_splits` in Python package (by default, there are no fixed splits)


Documentation

* New sections on [MultiRMSEWithMissingValues](https://catboost.ai/en/docs/concepts/loss-functions-multiregression#MultiRMSEWithMissingValues) and [LogCosh](https://catboost.ai/en/docs/concepts/loss-functions-regression#LogCosh)
* New section on [get_embedding_feature_indices](https://catboost.ai/en/docs/concepts/python-reference_pool_get_embedding_feature_indices)
* Add info on GPU support for metrics


Bugfixes

* Fix warning about resetting logger when logging to sys.stdout & sys.stderr from different threads. #1855
* Fix model summation in CatBoost for Apache Spark
* Fix performance and scalability of query AUC for ranking (1M samples, query size 2, 8 CPU cores: 0.55s -> 0.04s)
* Fix support for text features and embeddings in Java applier. #2043
* Fix nan/inf split scores with yeti rank pairwise loss
* Fix nan/inf feature strengths in pair logit on CPU

1.0.5

New features

* Support Apple Darwin arm64 architecture. #1526.
* Support feature tags in feature selection.
* Support for Apache Spark 3.2.
* Model sum in Apache Spark.

Python package

* Accommodate multiple target-platform arguments used to build universal binaries.
* Add grid creation function to utils.py
* Custom multilabel eval metrics by ELitvinova
* Metrics plotter by evgenabramov
* Fbeta score by ELitvinova
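
For reference, the F-beta score named in the list above is usually defined from true positives, false positives and false negatives as sketched below. This is a generic illustration with a hypothetical function name, not the code contributed to CatBoost.

```python
def fbeta_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta from confusion counts; beta > 1 weights recall more than precision."""
    b2 = beta * beta
    denom = (1.0 + b2) * tp + b2 * fn + fp
    # Return 0.0 when there are no positive predictions or labels at all.
    return (1.0 + b2) * tp / denom if denom > 0 else 0.0


# beta=1 gives the familiar F1 score; beta=2 penalizes false negatives more.
print(fbeta_score(8, 2, 2), fbeta_score(8, 2, 4, beta=2.0))
```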

Bugfixes

* Fix group weights in metrics calculation.
* Fix `fit` for PySpark estimators. #1976.
* Fix predict on GPU. #1901, #1923.
* Disable exact leafs calculation for `MAE`, `MAPE`, `Quantile` on GPU.
* Fix counter description for plotting. #1973.
* Allow weights in `BrierScore`. #1967.
* Disable AUC calculation for learn by default on GPU as well.
* Fix `plot_tree` example in documentation.
* Fix plots in `cv`.
* Fix ui32 overflows in pairwise losses on GPU.
* Fix for multiclass in nodejs evaluator. #1903.
* Fix CatBoost R package installation on Monterey. #1912.
* Fix CUDA error 700 caused by data race in mimalloc and CUDA driver.
* Fix slow compilation with CUDA 11.2+.
* Fix 2nd derivative in RMSEWithUncertainty.
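
As an illustration of the `BrierScore` item in the list above, one common way to apply per-object weights to the Brier score is a weighted mean of squared differences between predicted probabilities and binary labels. The function name and the normalization by the sum of weights are assumptions made for this sketch. CatBoost's exact definition may differ.

```python
def weighted_brier_score(y_true, p_pred, weights=None):
    """Weighted mean of (p - y)^2; falls back to unit weights when none are given."""
    if weights is None:
        weights = [1.0] * len(y_true)
    num = sum(w * (p - y) ** 2 for y, p, w in zip(y_true, p_pred, weights))
    return num / sum(weights)


# Toy labels and probabilities; the second call triples the weight of the first object.
print(weighted_brier_score([1, 0], [0.9, 0.2]))
print(weighted_brier_score([1, 0], [0.9, 0.2], weights=[3.0, 1.0]))
```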

1.0.4

New features
* Add `sort` param to `FilteredDCG` metric.
* Add `StochasticRank` for `FilteredDCG`.

Python package
* Add is_max/minimizable methods. #1915
* Support custom metric in select_features. #1920

R package
* Register functions from libcatboostr natively in R, removing one of CRAN notes.

Bugfixes
* Fix apply for models without main `loss_function`.
* Fix text calcer options specification. #1916
* Fix `calc_feature_statistics`.
* Fix Multi-approx support in CLI `calc_metrics` mode.
* Fix processing for text options. #1930
* Fix snapshot saving in feature selection.
* Fix CatBoost models serialization inside pipeline models in PySpark. #1936
