## BetterTransformer
Convert your model to the [PyTorch `BetterTransformer`](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) format with a one-liner using the new `BetterTransformer` integration, for faster inference on CPU and GPU!
```python
from optimum.bettertransformer import BetterTransformer

model = BetterTransformer.transform(model)
```
Check the full list of supported models in [the documentation](https://huggingface.co/docs/optimum/bettertransformer/overview), and check out the [Google Colab demo](https://colab.research.google.com/drive/1Lv2RCG_AT6bZNdlL1oDDNNiwBBuirwI-?usp=sharing).
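A minimal end-to-end sketch, assuming a supported checkpoint such as `bert-base-uncased` (any architecture from the supported list works the same way):

```python
from transformers import AutoModel, AutoTokenizer
from optimum.bettertransformer import BetterTransformer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

# Convert to the BetterTransformer format; the returned model keeps the same API
model = BetterTransformer.transform(model)

inputs = tokenizer("BetterTransformer makes inference faster!", return_tensors="pt")
outputs = model(**inputs)
```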
### Contributions
- `BetterTransformer` integration (423)
- ViT and Wav2Vec2 support (470)
## ONNX Runtime IOBinding support
ORT models (except for `ORTModelForCustomTasks`) now support [IOBinding](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu#reduce-memory-footprint-with-iobinding) to avoid data copying overheads between the host and device, which brings a significant inference speedup during decoding on GPU.
By default, `use_io_binding` is set to `True` when using CUDA. You can turn IOBinding off if you run into memory issues:
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
```
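Conversely, with IOBinding left enabled, GPU decoding could look like the sketch below (assuming a CUDA device, the `onnxruntime-gpu` package, and that the `optimum/t5-small` repository ships its tokenizer files):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
# use_io_binding defaults to True on CUDA, keeping inputs and outputs on the device
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small").to("cuda")

inputs = tokenizer("translate English to French: Hello, world!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```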
### Contributions
- Add IOBinding support to ONNX Runtime module (421)
## Optimum Exporters
`optimum.exporters` is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, including BERT, GPT-Neo, BLOOM, T5, ViT, Whisper and CLIP.
The export can be done via the CLI:
```bash
python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
```
For more information, check the [documentation](https://huggingface.co/docs/optimum/exporters/overview).
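The export can also be done programmatically. The sketch below follows the pattern from the exporters documentation; the exact signatures of `TasksManager.get_exporter_config_constructor` and `export` are assumptions based on that guide and may vary between versions:

```python
from pathlib import Path
from transformers import AutoModel
from optimum.exporters.onnx import export
from optimum.exporters.tasks import TasksManager

base_model = AutoModel.from_pretrained("bert-base-uncased")

# Build the ONNX config matching this architecture, then run the export
onnx_config_constructor = TasksManager.get_exporter_config_constructor("onnx", base_model)
onnx_config = onnx_config_constructor(base_model.config)
onnx_inputs, onnx_outputs = export(
    base_model, onnx_config, Path("model.onnx"), onnx_config.DEFAULT_ONNX_OPSET
)
```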
### Contributions
- `optimum.exporters` creation (403)
- Automatic task detection (445)
## Whisper
- Whisper can be exported to ONNX using `optimum.exporters`.
- Whisper can also be exported and run using `optimum.onnxruntime`; IOBinding is supported as well.
**Note**: For now, the export produced by `optimum.exporters` cannot be used by `ORTModelForSpeechSeq2Seq`. To run inference, export Whisper directly with `ORTModelForSpeechSeq2Seq`. This will be solved in the next release.
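As a sketch of that direct path (the `datasets` library and the dummy LibriSpeech split are used here only to get a sample audio clip):

```python
from datasets import load_dataset
from transformers import AutoProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-tiny.en")
# from_transformers=True exports the PyTorch checkpoint to ONNX on the fly
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny.en", from_transformers=True)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
inputs = processor(ds[0]["audio"]["array"], sampling_rate=16000, return_tensors="pt")

predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```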
### Contributions
- Whisper support with `optimum.onnxruntime` and `optimum.exporters` (420)
## Other contributions
- ONNX Runtime training now supports ORT 1.13.1 and `transformers` 4.23.1 (434)
- `ORTModel` can now load models from subfolders, in a similar fashion to `transformers` (443)
- `ORTOptimizer` has been refactored, and a factory class has been added to create common `OptimizationConfig`s (457); see the sketch after this list
- Documentation fixes and updates (411, 432, 437, 441)
- IOBinding fixes (454, 461)
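To illustrate the refactored optimizer API, here is a sketch; the factory class name `AutoOptimizationConfig` and its `O2()` preset are assumptions based on the optimization documentation:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import AutoOptimizationConfig

# Export a PyTorch checkpoint to ONNX on the fly
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)

# The refactored ORTOptimizer is created directly from an ORTModel...
optimizer = ORTOptimizer.from_pretrained(model)

# ...and the factory class provides ready-made optimization configurations
optimization_config = AutoOptimizationConfig.O2()
optimizer.optimize(save_dir="distilbert_optimized", optimization_config=optimization_config)
```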