## Improved memory management in the ONNX export
Lower memory usage during the ONNX export. This is especially useful when exporting large models, or when exporting on a CUDA device. Until the PyTorch 2.1 release, we recommend using a PyTorch nightly build if memory issues are encountered, as two major bugs were fixed on the PyTorch side: https://github.com/pytorch/pytorch/pull/101134 and https://github.com/pytorch/pytorch/pull/101148. A minimal export sketch is shown after the list below.
* Run validation of exported model in no_grad mode by fxmarty in https://github.com/huggingface/optimum/pull/1111
* Load model directly on cuda device for the ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1112
* Lower GPU memory requirements at ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1115
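As a rough sketch of the CUDA export path (the checkpoint name is only a placeholder; replace it with your own model, and check that the `device` argument is available in your installed version):

```python
from optimum.exporters.onnx import main_export

# Export while loading the model directly on a CUDA device, which lowers
# host RAM usage for large checkpoints. "gpt2" is only a placeholder.
main_export(
    "gpt2",
    output="gpt2_onnx",
    device="cuda",
)
```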
## Extended ONNX export
The ONNX export now supports the SAM, LiLT, Pix2Struct, CvT and OWL-ViT architectures; a minimal export sketch is shown after the list below.
* Sam ONNX export support by michaelbenayoun in https://github.com/huggingface/optimum/pull/1025
* Add onnx exporter for Lilt model by mariababich in https://github.com/huggingface/optimum/pull/1098
* Add pix2struct to ONNX support (v2) by arvisioncode in https://github.com/huggingface/optimum/pull/1034
* Add CvT ONNX Config by rishabbala in https://github.com/huggingface/optimum/pull/1131
* Support document-question-answering ONNX export for vision-encoder-decoder by fxmarty in https://github.com/huggingface/optimum/pull/1110
* add owlvit by darwinharianto in https://github.com/huggingface/optimum/pull/1067
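For example, one of the newly supported architectures can be exported with `main_export` as usual (a minimal sketch; the SAM checkpoint name is only illustrative):

```python
from optimum.exporters.onnx import main_export

# Export a SAM checkpoint to ONNX; any SAM checkpoint on the Hub
# should work the same way.
main_export(
    "facebook/sam-vit-base",
    output="sam_onnx",
)
```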
## Support of custom ONNX configurations for export
The method [`main_export`](https://huggingface.co/docs/optimum/main/en/exporters/onnx/package_reference/export#optimum.exporters.onnx.main_export) now supports two arguments, `model_kwargs` and `custom_onnx_configs`, that allow advanced users to customize the export. [Reference](https://huggingface.co/docs/optimum/main/en/exporters/onnx/usage_guides/export_a_model#custom-export-of-transformers-models). A minimal sketch is shown after the list below.
* [ONNX export] Ability to pass arbitrary kwargs, custom ONNX configs by fxmarty in https://github.com/huggingface/optimum/pull/1143
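A minimal sketch of the new arguments, assuming a simple encoder-only model (the checkpoint name is only illustrative; `model_kwargs` is usually combined with a custom config that declares the corresponding extra inputs/outputs):

```python
from transformers import AutoConfig
from optimum.exporters.onnx import main_export
from optimum.exporters.onnx.model_configs import BertOnnxConfig

model_id = "bert-base-uncased"  # illustrative checkpoint
config = AutoConfig.from_pretrained(model_id)

# custom_onnx_configs maps submodel names (here just "model") to OnnxConfig
# instances, overriding the default export configuration.
custom_onnx_configs = {"model": BertOnnxConfig(config)}

main_export(
    model_id,
    output="bert_onnx",
    custom_onnx_configs=custom_onnx_configs,
    # model_kwargs={"output_attentions": True},  # extra kwargs forwarded to the model's forward pass
)
```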
## Extended BetterTransformer support
* Add blip-2 to bettertransformer by baskrahmer in https://github.com/huggingface/optimum/pull/1125
* Support llama bettertransformer by fxmarty in https://github.com/huggingface/optimum/pull/998
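A minimal sketch of the Llama path (the checkpoint name is only illustrative):

```python
from transformers import AutoModelForCausalLM
from optimum.bettertransformer import BetterTransformer

# Load a supported checkpoint and swap its modules for their
# BetterTransformer counterparts.
model = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
model = BetterTransformer.transform(model)
```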
## ONNX Runtime: use IO Binding by default for decoder models on CPUExecutionProvider
IO Binding is useful not only to avoid copies between host RAM and device memory, but also to avoid copies between numpy tensors and OrtValue. For autoregressive tasks, IO Binding is therefore now enabled by default on CPUExecutionProvider as well, which can bring a >10% speedup for large context lengths. A minimal usage sketch is shown after the list below.
* Enable use_io_binding = True on CPU by yihonglyu in https://github.com/huggingface/optimum/pull/1087
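A minimal usage sketch (the checkpoint name is only illustrative; `export=True` assumes on-the-fly conversion of a PyTorch checkpoint, and `use_io_binding=True` is passed explicitly only to make the new default visible):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

model_id = "gpt2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = ORTModelForCausalLM.from_pretrained(
    model_id,
    export=True,          # convert the PyTorch checkpoint to ONNX on the fly
    use_io_binding=True,  # now the default on CPUExecutionProvider
)

inputs = tokenizer("Hello, my dog is cute and", return_tensors="pt")
generated = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```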
## ORTModelForSpeechSeq2Seq supported in ORTOptimizer
* added ORTModelForSpeechSeq2Seq support to optimizer by IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1068
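A minimal sketch of graph optimization for a speech seq2seq model (the checkpoint name and optimization level are only illustrative):

```python
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

model_id = "openai/whisper-tiny.en"  # illustrative checkpoint
model = ORTModelForSpeechSeq2Seq.from_pretrained(model_id, export=True)

# Apply ONNX Runtime graph optimizations to the exported encoder/decoder.
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = OptimizationConfig(optimization_level=2)
optimizer.optimize(save_dir="whisper_optimized", optimization_config=optimization_config)
```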
## Major bugfixes
* Use mask for seq2seq ONNX decoder models by fxmarty in https://github.com/huggingface/optimum/pull/1076
## What's Changed
* Fix protobuf max allowed size by fxmarty in https://github.com/huggingface/optimum/pull/988
* Add Whisper to ORT optimizer configuration by kunal-vaishnavi in https://github.com/huggingface/optimum/pull/986
* Fix sentence-similarity task in TasksManager by fxmarty in https://github.com/huggingface/optimum/pull/996
* Simplify auto task detection by fxmarty in https://github.com/huggingface/optimum/pull/997
* Fix merged decoder usage with fp16 by fxmarty in https://github.com/huggingface/optimum/pull/1006
* Fix past key value generator used for ONNX export validation for t5/mt5 by fxmarty in https://github.com/huggingface/optimum/pull/1007
* Fix typo for custom shapes passed at ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1008
* Fix _versions.yml upload in doc build by regisss in https://github.com/huggingface/optimum/pull/1003
* ORTQuantizer supports subgraphs by fxmarty in https://github.com/huggingface/optimum/pull/1009
* fix for huggingface_hub last release by echarlaix in https://github.com/huggingface/optimum/pull/1014
* Add links to documentation to README by echarlaix in https://github.com/huggingface/optimum/pull/1013
* Update documentation by echarlaix in https://github.com/huggingface/optimum/pull/1011
* update optimum intel description by echarlaix in https://github.com/huggingface/optimum/pull/1015
* fix: ValueError offload_dir by orangetin in https://github.com/huggingface/optimum/pull/993
* Sentence transformers ONNX export fix by michaelbenayoun in https://github.com/huggingface/optimum/pull/1029
* Add OpenVINO notebooks by echarlaix in https://github.com/huggingface/optimum/pull/1030
* Fix task inference for sam by michaelbenayoun in https://github.com/huggingface/optimum/pull/1031
* fix typo by echarlaix in https://github.com/huggingface/optimum/pull/1033
* added types to new fields in `OptimizationConfig` by IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1036
* Fix some typos in the quantization guide by dcferreira in https://github.com/huggingface/optimum/pull/1041
* Optional `attention_mask` in `ORTModelForxxx` by IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1045
* ONNX SAM export - change `input_points` data type by michaelbenayoun in https://github.com/huggingface/optimum/pull/1048
* `masked-im` output name fix for transformers >= 4.29.0 by michaelbenayoun in https://github.com/huggingface/optimum/pull/1049
* remove torchvision requirement by BramVanroy in https://github.com/huggingface/optimum/pull/1052
* Update version by regisss in https://github.com/huggingface/optimum/pull/1058
* Bump package version by regisss in https://github.com/huggingface/optimum/pull/1062
* Raise MinimumVersionError when OnnxConfig.MIN_TORCH_VERSION is not satisfied by regisss in https://github.com/huggingface/optimum/pull/1070
* Remove deprecated argument from tests and examples by echarlaix in https://github.com/huggingface/optimum/pull/1072
* Detect model type for all transformers models in TasksManager by fxmarty in https://github.com/huggingface/optimum/pull/1075
* Fix HF Push to hub by JingyaHuang in https://github.com/huggingface/optimum/pull/1080
* Fix float16 ORT conversion for models > 2GB by fxmarty in https://github.com/huggingface/optimum/pull/1079
* Update doc workflows by regisss in https://github.com/huggingface/optimum/pull/1093
* Error out on `ORTQuantizer.quantize` call for static quantization when no calibration range is provided by fxmarty in https://github.com/huggingface/optimum/pull/1094
* Add mpt model_type to NormalizedTextConfig by changwangss in https://github.com/huggingface/optimum/pull/1101
* Fix doc build by regisss in https://github.com/huggingface/optimum/pull/1107
* Improve the offline support for the ONNX/TFLite export by fxmarty in https://github.com/huggingface/optimum/pull/1109
* Add ViT to ORTConfigManager by baskrahmer in https://github.com/huggingface/optimum/pull/1117
* Fix TasksManager get_model_from_task with None device by fxmarty in https://github.com/huggingface/optimum/pull/1122
* Small typos by baskrahmer in https://github.com/huggingface/optimum/pull/1124
* Refactor BetterTransformerManager requirement validation methods by baskrahmer in https://github.com/huggingface/optimum/pull/1132
* update the default block size by rui-ren in https://github.com/huggingface/optimum/pull/1137
* Update ORT training docker to 1.15 by JingyaHuang in https://github.com/huggingface/optimum/pull/1139
* Adamlouly/fix unwrap model eval by AdamLouly in https://github.com/huggingface/optimum/pull/1099
* Remove version pinning for onnx package by cody-moveworks in https://github.com/huggingface/optimum/pull/1141
## New Contributors
* orangetin made their first contribution in https://github.com/huggingface/optimum/pull/993
* dcferreira made their first contribution in https://github.com/huggingface/optimum/pull/1041
* BramVanroy made their first contribution in https://github.com/huggingface/optimum/pull/1052
* darwinharianto made their first contribution in https://github.com/huggingface/optimum/pull/1067
* mariababich made their first contribution in https://github.com/huggingface/optimum/pull/1098
* changwangss made their first contribution in https://github.com/huggingface/optimum/pull/1101
* arvisioncode made their first contribution in https://github.com/huggingface/optimum/pull/1034
* yihonglyu made their first contribution in https://github.com/huggingface/optimum/pull/1087
* rui-ren made their first contribution in https://github.com/huggingface/optimum/pull/1137
* cody-moveworks made their first contribution in https://github.com/huggingface/optimum/pull/1141
* rishabbala made their first contribution in https://github.com/huggingface/optimum/pull/1131
**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.8.0...v1.9.0