- new autoML option for ML infill using CatBoost library
- requires installing CatBoost with
pip install catboost
- available by passing ML_cmnd as
ML_cmnd = {'autoML_type':'catboost'}
- uses early stopping by default for the regressor and no early stopping by default for the classifier
- (to avoid a potential classifier error when all samples of a label class land in the validation set)
- can turn on early stopping for classifier by passing
ML_cmnd = {'autoML_type':'catboost', 'MLinfill_cmnd' : {'catboost_classifier_fit' : {'eval_ratio' : <float>}}}
- where <float> is a value between 0 and 1 designating the validation ratio (defaults to 0.15 for the regressor)
- more generally, parameters can be passed to the model initialization and fit operations as
ML_cmnd = {'autoML_type':'catboost',
'MLinfill_cmnd' : {'catboost_classifier_model' : {'parameter1' : 'value' },
'catboost_classifier_fit' : {'parameter2' : 'value' },
'catboost_regressor_model' : {'parameter3' : 'value' },
'catboost_regressor_fit' : {'parameter4' : 'value' }}}
- in general, accuracy of the autoML options is expected to rank as AutoGluon > CatBoost > Random Forest
- in general, latency performance (faster first) is expected to rank as Random Forest > CatBoost > AutoGluon
- in general, memory performance (smaller footprint first) is expected to rank as Random Forest > CatBoost > AutoGluon
- Random Forest and CatBoost are also more portable than AutoGluon, since they don't require a local model repository saved to the hard drive
- for now retaining random forest as the default
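A minimal sketch of assembling the ML_cmnd structure described in these notes, assuming Automunge accepts the nested keys exactly as listed above. The helper `catboost_ml_cmnd` is hypothetical (not part of Automunge), the check that eval_ratio lies strictly between 0 and 1 is an assumption, and `iterations` appears only as an example CatBoost model initialization parameter, assuming model entries are forwarded to the CatBoost estimator.

```python
def catboost_ml_cmnd(classifier_eval_ratio=None, regressor_eval_ratio=None,
                     classifier_model=None, regressor_model=None):
    """Assemble an ML_cmnd dict for CatBoost ML infill.

    Hypothetical helper for illustration; key names follow the release notes.
    """
    mlinfill_cmnd = {}

    # Parameters assumed to be forwarded to model initialization.
    if classifier_model:
        mlinfill_cmnd['catboost_classifier_model'] = dict(classifier_model)
    if regressor_model:
        mlinfill_cmnd['catboost_regressor_model'] = dict(regressor_model)

    # Setting eval_ratio for the classifier turns on early stopping there.
    for key, ratio in (('catboost_classifier_fit', classifier_eval_ratio),
                       ('catboost_regressor_fit', regressor_eval_ratio)):
        if ratio is not None:
            # Assumed validity check: ratio strictly between 0 and 1.
            if not 0 < ratio < 1:
                raise ValueError(f'{key} eval_ratio must be between 0 and 1, got {ratio}')
            mlinfill_cmnd[key] = {'eval_ratio': ratio}

    ml_cmnd = {'autoML_type': 'catboost'}
    if mlinfill_cmnd:
        ml_cmnd['MLinfill_cmnd'] = mlinfill_cmnd
    return ml_cmnd


if __name__ == '__main__':
    # Classifier with early stopping on a 15% validation split.
    print(catboost_ml_cmnd(classifier_eval_ratio=0.15,
                           classifier_model={'iterations': 500}))
```

The returned dict would then be passed as the ML_cmnd argument together with ML infill, e.g. `ML_cmnd=catboost_ml_cmnd(classifier_eval_ratio=0.15)`.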
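The reasoning behind the classifier default can be illustrated with a small pure-Python sketch. With an unstratified random hold-out split, a label class with very few samples can land entirely in the validation set, leaving training without that class. The names `holdout_split` and `classes_missing_from_train` are hypothetical, and the split shown is an assumption, not the splitting code Automunge or CatBoost actually use.

```python
import random


def holdout_split(labels, eval_ratio, seed=0):
    """Randomly partition sample indices into (train, validation) index lists.

    Plain unstratified hold-out split; an illustrative assumption only.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)
    n_val = int(round(len(indices) * eval_ratio))
    return indices[n_val:], indices[:n_val]


def classes_missing_from_train(labels, eval_ratio, seed=0):
    """Return label classes present in the validation split but absent from training."""
    train_idx, val_idx = holdout_split(labels, eval_ratio, seed)
    train_classes = {labels[i] for i in train_idx}
    val_classes = {labels[i] for i in val_idx}
    return val_classes - train_classes


if __name__ == '__main__':
    # One rare class with a single sample among 100 labels.
    labels = ['a'] * 50 + ['b'] * 49 + ['rare']
    trials = 1000
    failures = sum(
        1 for seed in range(trials) if classes_missing_from_train(labels, 0.15, seed)
    )
    print(f'{failures}/{trials} splits left a class with no training samples')
```

With a single-sample class and an eval_ratio of 0.15, the rare sample lands in validation in about 15% of random splits, which is the situation the classifier's no-early-stopping default sidesteps.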