We're happy to announce the AutoGluon 0.8 release.
NEW: [![](https://img.shields.io/discord/1043248669505368144?logo=discord&style=flat)](https://discord.gg/wjUmjqAc2N) Join our official community discord server to ask questions and get involved!
Note: Loading models trained in different versions of AutoGluon is not supported.
This release contains 196 commits from 20 contributors!
See the full commit change-log here: https://github.com/autogluon/autogluon/compare/0.7.0...0.8.0
Special thanks to geoalgo for the joint work in generating the experimental tabular Zeroshot-HPO portfolio this release!
Full Contributor List (ordered by of commits):
shchur, Innixma, yinweisu, gradientsky, FANGAreNotGnu, zhiqiangdon, gidler, liangfu, tonyhoo, cheungdaven, cnpgs, giswqs, suzhoum, yongxinw, isunli, jjaeyeon, xiaochenbin9527, yzhliu, jsharpna, sxjscience
AutoGluon 0.8 supports Python versions 3.8, 3.9, and 3.10.
Changes
Highlights
* AutoGluon TimeSeries introduced several major improvements, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.
* AutoGluon Tabular now supports **[calibrating the decision threshold in binary classification](https://auto.gluon.ai/stable/tutorials/tabular/tabular-indepth.html#decision-threshold-calibration)** ([API](https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.calibrate_decision_threshold.html)), leading to massive improvements in metrics such as `f1` and `balanced_accuracy`. It is not uncommon to see `f1` scores improve from `0.70` to `0.73` as an example. We **strongly** encourage all users who are using these metrics to try out the new decision threshold calibration logic.
* AutoGluon MultiModal introduces two new features: 1) [**PDF document classification**](https://auto.gluon.ai/stable/tutorials/multimodal/document/pdf_classification.html), and 2) [**Open Vocabulary Object Detection**](https://auto.gluon.ai/stable/tutorials/multimodal/object_detection/quick_start/quick_start_ovd.html).
* AutoGluon MultiModal upgraded the presets for object detection, now offering `medium_quality`, `high_quality`, and `best_quality` options. The empirical results demonstrate significant ~20% relative improvements in the mAP (mean Average Precision) metric, using the same preset.
* AutoGluon Tabular has added an experimental **Zeroshot HPO config** which performs well on small datasets <10000 rows when at least an hour of training time is provided (~60% win-rate vs `best_quality`). To try it out, specify `presets="experimental_zeroshot_hpo_hybrid"` when calling `fit()`.
* AutoGluon EDA added support for [**Anomaly Detection**](https://auto.gluon.ai/stable/tutorials/eda/eda-auto-anomaly-detection.html) and [**Partial Dependence Plots**](https://auto.gluon.ai/stable/tutorials/eda/eda-auto-analyze-interaction.html#using-interaction-charts-to-learn-information-about-the-data).
* AutoGluon Tabular has added experimental support for **[TabPFN](https://github.com/automl/TabPFN)**, a pre-trained tabular transformer model. Try it out via `pip install autogluon.tabular[all,tabpfn]` (hyperparameter key is "TABPFN")! You can also try it out via specifying `presets="experimental_extreme_quality"`.
General
* General doc improvements tonyhoo Innixma yinweisu gidler cnpgs isunli giswqs (2940, 2953, 2963, 3007, 3027, 3059, 3068, 3083, 3128, 3129, 3130, 3147, 3174, 3187, 3256, 3258, 3280, 3306, 3307, 3311, 3313)
* General code fixes and improvements yinweisu Innixma (2921, 3078, 3113, 3140, 3206)
* CI improvements yinweisu gidler yzhliu liangfu gradientsky (2965, 3008, 3013, 3020, 3046, 3053, 3108, 3135, 3159, 3283, 3185)
* New AutoGluon Webpage gidler shchur (2924)
* Support sample_weight in RMSE jjaeyeon (3052)
* Move AG search space to common yinweisu (3192)
* Deprecation utils yinweisu (3206, 3209)
* Update namespace packages for PEP420 compatibility gradientsky (3228)
Multimodal
AutoGluon MultiModal (also known as AutoMM) introduces two new features: 1) PDF document classification, and 2) Open Vocabulary Object Detection. Additionally, we have upgraded the presets for object detection, now offering `medium_quality`, `high_quality`, and `best_quality` options. The empirical results demonstrate significant ~20% relative improvements in the mAP (mean Average Precision) metric, using the same preset.
New Features
* PDF Document Classification. See [tutorial](https://auto.gluon.ai/stable/tutorials/multimodal/document/pdf_classification.html) cheungdaven (#2864, 3043)
* Open Vocabulary Object Detection. See [tutorial](https://auto.gluon.ai/stable/tutorials/multimodal/object_detection/quick_start/quick_start_ovd.html) FANGAreNotGnu (#3164)
Performance Improvements
* Upgrade the detection engine from mmdet 2.x to mmdet 3.x, and upgrade our presets FANGAreNotGnu (3262)
* `medium_quality`: yolo-s -> yolox-l
* `high_quality`: yolox-l -> DINO-Res50
* `best_quality`: yolox-x -> DINO-Swin_l
* Speedup fusion model training with deepspeed strategy. liangfu (2932)
* Enable detection backbone freezing to boost finetuning speed and save GPU usage FANGAreNotGnu (3220)
Other Enhancements
* Support passing data path to the fit() API zhiqiangdon (3006)
* Upgrade TIMM to the latest v0.9.* zhiqiangdon (3282)
* Support xywh output for object detection FANGAreNotGnu (2948)
* Fusion model inference acceleration with TensorRT liangfu (2836, 2987)
* Support customizing advanced image data augmentation. Users can pass a list of [torchvision transform](https://pytorch.org/vision/stable/transforms.html#geometry) objects as image augmentation. zhiqiangdon (3022)
* Add yoloxm and yoloxtiny FangAreNotGnu (3038)
* Add MultiImageMix Dataset for Object Detection FangAreNotGnu (3094)
* Support loading specific checkpoints. Users can load the intermediate checkpoints other than model.ckpt and last.ckpt. zhiqiangdon (3244)
* Add some predictor properties for model statistics zhiqiangdon (3289)
* `trainable_parameters` returns the number of trainable parameters.
* `total_parameters` returns the number of total parameters.
* `model_size` returns the model size measured by megabytes.
Bug Fixes / Code and Doc Improvements
* General bug fixes and improvements zhiqiangdon liangfu cheungdaven xiaochenbin9527 Innixma FANGAreNotGnu gradientsky yinweisu yongxinw (2939, 2989, 2983, 2998, 3001, 3004, 3006, 3025, 3026, 3048, 3055, 3064, 3070, 3081, 3090, 3103, 3106, 3119, 3155, 3158, 3167, 3180, 3188, 3222, 3261, 3266, 3277, 3279, 3261, 3267)
* General doc improvements suzhoum (3295, 3300)
* Remove clip from fusion models liangfu (2946)
* Refactor inferring problem type and output shape zhiqiangdon (3227)
* Log GPU info including GPU total memory, free memory, GPU card name, and CUDA version during training zhiqaingdon (3291)
Tabular
New Features
* Added `calibrate_decision_threshold` ([tutorial](https://auto.gluon.ai/stable/tutorials/tabular/tabular-indepth.html#decision-threshold-calibration)), which allows to optimize a given metric's decision threshold for predictions to strongly enhance the metric score. Innixma (3298)
* We've added an experimental Zeroshot HPO config, which performs well on small datasets <10000 rows when at least an hour of training time is provided. To try it out, specify `presets="experimental_zeroshot_hpo_hybrid"` when calling `fit()` Innixma geoalgo (3312)
* The [TabPFN model](https://auto.gluon.ai/stable/api/autogluon.tabular.models.html#tabpfnmodel) is now supported as an experimental model. TabPFN is a viable model option when inference speed is not a concern, and the number of rows of training data is less than 10,000. Try it out via `pip install autogluon.tabular[all,tabpfn]`! Innixma (3270)
* Backend support for distributed training, which will be available with the next Cloud module release. yinweisu (3054, 3110, 3115, 3131, 3142, 3179, 3216)
Performance Improvements
* Accelerate boolean preprocessing Innixma (2944)
Other Enhancements
* Add quantile regression support for CatBoost shchur (3165)
* Implement quantile regression for LGBModel shchur (3168)
* Log to file support yinweisu (3232)
* Add support for `included_model_types` yinweisu (3239)
* Add enable_categorical=True support to XGBoost Innixma (3286)
Bug Fixes / Code and Doc Improvements
* Cross-OS loading of a fit TabularPredictor should now work properly yinweisu Innixma
* General bug fixes and improvements Innixma cnpgs shchur yinweisu gradientsky (2865, 2936, 2990, 3045, 3060, 3069, 3148, 3182, 3199, 3226, 3257, 3259, 3268, 3269, 3287, 3288, 3285, 3293, 3294, 3302)
* Move interpretable logic to InterpretableTabularPredictor Innixma (2981)
* Enhance drop_duplicates, enable by default Innixma (3010)
* Refactor params_aux & memory checks Innixma (3033)
* Raise regression `pred_proba` Innixma (3240)
TimeSeries
In v0.8 we introduce several major improvements to the Time Series module, including new models, upgraded presets that lead to better forecast accuracy, and optimizations that speed up training & inference.
Highlights
- New models: `PatchTST` and `DLinear` from GluonTS, and `RecursiveTabular` based on integration with the [`mlforecast`](https://github.com/Nixtla/mlforecast) library shchur (#3177, 3184, 3230)
- Improved accuracy and reduced overall training time thanks to updated presets shchur (3281, 3120)
- 3-6x faster training and inference for `AutoARIMA`, `AutoETS`, `Theta`, `DirectTabular`, `WeightedEnsemble` models shchur (3062, 3214, 3252)
New Features
- Dramatically faster repeated calls to `predict()`, `leaderboard()` and `evaluate()` thanks to prediction caching shchur (3237)
- Reduce overfitting by using multiple validation windows with the `num_val_windows` argument to `fit()` shchur (3080)
- Exclude certain models from presets with the `excluded_model_types` argument to `fit()` shchur (3231)
- New method `refit_full()` that refits models on combined train and validation data shchur (3157)
- Train multiple configurations of the same model by providing lists in the `hyperparameters` argument shchur (3183)
- Time limit set by `time_limit` is now respected by all models shchur (3214)
Enhancements
- Improvements to the `DirectTabular` model (previously called `AutoGluonTabular`): faster featurization, trained as a quantile regression model if `eval_metric` is set to `"mean_wQuantileLoss"` shchur (2973, 3211)
- Use correct seasonal period when computing the MASE metric shchur (2970)
- Check the AutoGluon version when loading `TimeSeriesPredictor` from disk shchur (3233)
Minor Improvements / Documentation / Bug Fixes
* Update documentation and tutorials shchur (2960, 2964, 3296, 3297)
* General bug fixes and improvements shchur (2977, 3058, 3066, 3160, 3193, 3202, 3236, 3255, 3275, 3290)
Exploratory Data Analysis (EDA) tools
In 0.8 we introduce a few new tools to help with data exploration and feature engineering:
* **Anomaly Detection** gradientsky (3124, 3137) - helps to identify unusual patterns or behaviors in data that deviate significantly from the norm. It's best used when finding outliers, rare events, or suspicious activities that could indicate fraud, defects, or system failures. Check the [Anomaly Detection Tutorial](https://auto.gluon.ai/stable/tutorials/eda/eda-auto-anomaly-detection.html) to explore the functionality.
* **Partial Dependence Plots** gradientsky (3071, 3079) - visualize the relationship between a feature and the model's output for each individual instance in the dataset. Two-way variant can visualize potential interactions between any two features. Please see this tutorial for more detail: [Using Interaction Charts To Learn Information About the Data](https://auto.gluon.ai/stable/tutorials/eda/eda-auto-analyze-interaction.html#using-interaction-charts-to-learn-information-about-the-data)
Bug Fixes / Code and Doc Improvements
* Switch regression analysis in `quick_fit` to use residuals plot gradientsky (3039)
* Added `explain_rows` method to `autogluon.eda.auto` - Kernel SHAP visualization gradientsky (3014)
* General improvements and fixes gradientsky (2991, 3056, 3102, 3107, 3138)