Catboost

Latest version: v1.2.7


Page 6 of 15

0.20.2

Not secure
New features:
- String class labels are now supported for binary classification
- [CLI only] The timestamp column for datasets can now be provided in separate files.
- [CLI only] Timesplit feature evaluation.
- Process groups of any size in block processing.


Bug fixes:
- ``classes_count`` and ``class_weight`` params can now be used with user-defined loss functions. 1119
- Form correct metric descriptions on GPU when ``use_weights`` takes its default value. 1106
- Correct ``model.classes_`` attribute for binary classification (proper labels instead of always ``0`` and ``1``). 984
- Fix ``model.classes_`` attribute when the ``classes_count`` parameter was specified.
- Proper error message when categorical features are specified for MultiRMSE training. 1112
- Block processing: it is now valid for all groups in a single block to have weights equal to 0
- Fix empty asymmetric tree index calculation. 1104
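The ``classes_count``/``class_weight`` fix concerns user-defined loss functions, which the Python package accepts as an object with a `calc_ders_range` method returning per-sample first and second derivatives of the objective (which CatBoost maximizes). A minimal sketch for binary logloss, following the shape documented in CatBoost's custom-objective tutorial:

```python
import math

class LoglossObjective:
    """Sketch of a user-defined binary logloss objective. For each sample,
    calc_ders_range returns (first derivative, second derivative) of the
    log-likelihood with respect to the raw approx (log-odds)."""

    def calc_ders_range(self, approxes, targets, weights):
        ders = []
        for i, approx in enumerate(approxes):
            p = 1.0 / (1.0 + math.exp(-approx))   # predicted probability
            w = 1.0 if weights is None else weights[i]
            der1 = w * (targets[i] - p)           # gradient of log-likelihood
            der2 = -w * p * (1.0 - p)             # second derivative
            ders.append((der1, der2))
        return ders
```

Such an object would be passed as `loss_function=LoglossObjective()` to `CatBoostClassifier`; after this release, `classes_count` and `class_weight` can be combined with it.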

0.20.1

Not secure
New features:
- `leaf_estimation_method=Exact` is now the default for the MAPE loss
- Add `CatBoostClassifier.predict_log_proba()`, PR 1095
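Assuming `predict_log_proba()` follows the familiar scikit-learn convention (the release note does not spell this out), it returns the element-wise natural log of `predict_proba()`, which is numerically convenient for downstream likelihood computations. A small numpy illustration of that relationship:

```python
import numpy as np

# Hypothetical probabilities, shaped as predict_proba() would return them:
# one row per sample, one column per class, rows summing to 1.
proba = np.array([[0.9, 0.1],
                  [0.25, 0.75]])

# Under the scikit-learn convention, predict_log_proba() is simply the
# element-wise natural log of predict_proba().
log_proba = np.log(proba)

# Exponentiating recovers the original probabilities.
assert np.allclose(np.exp(log_proba), proba)
```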

Bug fixes:
- Fix usability of read-only numpy arrays, 1101
- Fix python3 compatibility for `get_feature_importance`, PR 1090
- Fix loading model from snapshot for `boost_from_average` mode

0.20

Not secure
New submodule for text processing!
It contains two classes to help you make text features ready for training:
- [Tokenizer](https://github.com/catboost/catboost/blob/afb8331a638de280ba2aee3831ac9df631e254a0/library/text_processing/tokenizer/tokenizer.pxi#L77) -- use this class to split text into tokens (automatic lowercase and punctuation removal)
- [Dictionary](https://github.com/catboost/catboost/tree/master/library/text_processing/dictionary) -- with this class you create a dictionary which maps tokens to numeric identifiers. You then use these identifiers as new features.
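The pipeline these two classes implement is: split text into normalized tokens, then map each distinct token to a numeric identifier usable as a feature. A dependency-free conceptual sketch (the real `Tokenizer` and `Dictionary` classes in the library offer many more options):

```python
import re

def tokenize(text):
    """Conceptual stand-in for Tokenizer: lowercase the text, drop
    punctuation, and split into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_dictionary(token_lists):
    """Conceptual stand-in for Dictionary: map each distinct token to a
    numeric identifier, in order of first appearance."""
    ids = {}
    for tokens in token_lists:
        for tok in tokens:
            ids.setdefault(tok, len(ids))
    return ids

docs = ["Cats sit on mats.", "Dogs sit, too!"]
token_lists = [tokenize(d) for d in docs]
vocab = build_dictionary(token_lists)
features = [[vocab[t] for t in tokens] for tokens in token_lists]
```

The resulting identifier sequences are what you would then feed into training as new features.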

New features:
- Enabled `boost_from_average` for `MAPE` loss function

Bug fixes:
- Fixed `Pool` creation from `pandas.DataFrame` with discontinuous columns, 1079
- Fixed `standalone_evaluator`, PR 1083

Speedups:
- Huge speedup of preprocessing in the python-package for datasets with many samples (>10 million)

0.19.1

Not secure
New features:
- With this release we support `Text` features for *classification on GPU*. To specify text columns use `text_features` parameter. Achieve better quality by using text information of your dataset. See more in [Learning CatBoost with text features](https://github.com/catboost/tutorials/blob/master/text_features/text_features_in_catboost.ipynb)
- `MultiRMSE` loss function is now available on CPU. Labels for the multi-regression mode should be specified in separate `Label` columns
- MonoForest framework for model analysis, based on our NeurIPS 2019 [paper](https://papers.nips.cc/paper/9530-monoforest-framework-for-tree-ensemble-analysis). Learn more in [MonoForest tutorial](https://github.com/catboost/tutorials/tree/master/model_analysis/monoforest_tutorial.ipynb)
- `boost_from_average` is now `True` by default for `Quantile` and `MAE` loss functions, which improves the resulting quality
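`boost_from_average` starts boosting from the best constant prediction for the loss instead of from zero. For `MAE` the loss-minimizing constant is the label median, and for `Quantile` with parameter alpha it is the alpha-quantile; a conceptual sketch of that starting value (unweighted labels assumed; the library's exact computation may differ):

```python
import numpy as np

def best_constant_start(labels, loss="MAE", alpha=0.5):
    """Conceptual sketch: the constant prediction minimizing the loss.
    MAE is minimized by the median; Quantile(alpha) by the alpha-quantile."""
    labels = np.asarray(labels, dtype=float)
    if loss == "MAE":
        return float(np.median(labels))
    if loss == "Quantile":
        return float(np.quantile(labels, alpha))
    raise ValueError("unsupported loss")
```

Starting from this constant rather than zero means early iterations spend their capacity on structure in the data instead of on shifting the baseline, which is why the default change improves quality.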

Speedups:
- Huge reduction of preprocessing time for datasets loaded from files and for datasets with many samples (> 10 million), which was a bottleneck for GPU training
- 3x speedup for small datasets

0.18.1

Not secure
New features:
- `datasets.msrank()` now returns the _full_ msrank dataset. Previously, it returned only the first 10k samples.
We have added the `msrank_10k()` dataset, which implements the previous behaviour.

Bug fixes:
- `get_object_importance()` now respects parameter `top_size`, 1045 by ibuda

0.18

Not secure
- The main feature of this release is a huge speedup on small datasets. We now use MVS sampling for CPU regression and binary classification training by default, together with the `Plain` boosting scheme for both small and large datasets. This change not only gives a huge speedup but also improves quality!
- The `boost_from_average` parameter is available in `CatBoostClassifier` and `CatBoostRegressor`
- We have added new formats for describing monotonic constraints. For example, `"(1,0,0,-1)"` or `"0:1,3:-1"` or `"FeatureName0:1,FeatureName3:-1"` are all valid specifications. With Python and `params-file` json, lists and dictionaries can also be used
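The three string formats listed above all denote the same thing: a +1/0/-1 constraint per feature. A hypothetical helper (not part of CatBoost) that expands each format into one per-feature list makes the equivalence concrete:

```python
def parse_monotone_constraints(spec, n_features, feature_names=None):
    """Hypothetical helper (not part of CatBoost): expand the three
    constraint string formats into a list with one +1/0/-1 per feature."""
    spec = spec.strip()
    if spec.startswith("("):                      # "(1,0,0,-1)": dense per-feature list
        return [int(v) for v in spec.strip("()").split(",")]
    result = [0] * n_features                     # sparse "index:value" or "name:value"
    for pair in spec.split(","):
        key, value = pair.split(":")
        if feature_names and key in feature_names:
            index = feature_names.index(key)
        else:
            index = int(key)
        result[index] = int(value)
    return result
```

With `feature_names = ["FeatureName0", ..., "FeatureName3"]`, all three example specifications from the release note expand to `[1, 0, 0, -1]`.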

Bug fixes:
- Error in `Multiclass` classifier training, 1040
- Unhandled exception when saving quantized pool, 1021
- Python 3.7: `RuntimeError` raised in `StagedPredictIterator`, 848


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.