Optimum

1.5.5

* Fix the outputs of FP16 models for dynamic input shapes (https://github.com/huggingface/optimum-intel/pull/139)
* Update the required OpenVINO version (https://github.com/huggingface/optimum-intel/pull/141)
* Improve inference latency for Seq2Seq models using the OpenVINO runtime (https://github.com/huggingface/optimum-intel/pull/131); a usage sketch follows this list
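
A hedged illustration of the OpenVINO Seq2Seq path these fixes touch; the model name and the `from_transformers` export flag below are assumptions for this release line:

```python
from transformers import AutoTokenizer
from optimum.intel.openvino import OVModelForSeq2SeqLM

# Convert a Seq2Seq checkpoint to OpenVINO IR on the fly and run generation
# (from_transformers=True triggers the export; assumed flag name for this era).
tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = OVModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

inputs = tokenizer("translate English to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```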

1.5.4

Fix the IPEX inference-mode context manager so that it returns the original model when IPEX cannot optimize it (https://github.com/huggingface/optimum-intel/pull/132)

1.5.3

* Fix the `GenerationMixin` import for `transformers` versions >= 4.25.0 (127)
* Temporarily cap the maximum supported `transformers` version until the OpenVINO export of the GPT2 model is fixed (120)

1.5.2

Temporarily pin `numpy<1.24.0` (614)

1.5.1

Deprecate PyTorch 1.12 for BetterTransformer and provide a clearer error message (513)

1.5.0

BetterTransformer

Convert your model to the [PyTorch `BetterTransformer`](https://pytorch.org/blog/a-better-transformer-for-fast-transformer-encoder-inference/) format with a one-liner, thanks to the new `BetterTransformer` integration, for faster inference on CPU and GPU!

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# Load any supported transformers model, then convert it in one line.
model = AutoModel.from_pretrained("bert-base-uncased")
model = BetterTransformer.transform(model)
```

Check the full list of supported models in [the documentation](https://huggingface.co/docs/optimum/bettertransformer/overview), and try the [Google Colab demo](https://colab.research.google.com/drive/1Lv2RCG_AT6bZNdlL1oDDNNiwBBuirwI-?usp=sharing).

Contributions

- `BetterTransformer` integration (423)
- ViT and Wav2Vec2 support (470)

ONNX Runtime IOBinding support

ORT models (except for `ORTModelForCustomTasks`) now support [IOBinding](https://huggingface.co/docs/optimum/onnxruntime/usage_guides/gpu#reduce-memory-footprint-with-iobinding) to avoid data-copying overhead between the host and device, which brings a significant inference speedup during the decoding process on GPU.

By default, `use_io_binding` is set to `True` when using CUDA. You can turn IOBinding off in case of memory issues:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
```
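
Conversely, a minimal sketch of GPU inference with IOBinding left at its default; the `provider` argument follows the usage guide linked above and is an assumption for this exact release:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
# use_io_binding defaults to True on CUDA, so decoding avoids host/device copies.
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", provider="CUDAExecutionProvider")

inputs = tokenizer("translate English to German: Hello!", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```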


Contributions

- Add IOBinding support to ONNX Runtime module (421)

Optimum Exporters

`optimum.exporters` is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, including BERT, GPT-Neo, Bloom, T5, ViT, Whisper, and CLIP.

The export can be done via the CLI:

```bash
python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
```
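
The exported folder can then be reloaded with the matching `ORTModel` class. A minimal sketch, assuming a text-classification checkpoint was exported first and that the exported `model.onnx` is compatible with `ORTModel` loading (see the Whisper caveat below for that architecture):

```python
# Assumes a prior export such as:
#   python -m optimum.exporters.onnx --model distilbert-base-uncased-finetuned-sst-2-english sst2_onnx/
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = ORTModelForSequenceClassification.from_pretrained("sst2_onnx")

inputs = tokenizer("Optimum makes ONNX export a one-liner.", return_tensors="pt")
print(model(**inputs).logits)
```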


For more information, check the [documentation](https://huggingface.co/docs/optimum/exporters/overview).

Contributions

- `optimum.exporters` creation (403)
- Automatic task detection (445)

Whisper

- Whisper can be exported to ONNX using `optimum.exporters`.
- Whisper can also be exported and run using `optimum.onnxruntime`; IOBinding is supported as well.

**Note**: For now, the export produced by `optimum.exporters` cannot be used by `ORTModelForSpeechSeq2Seq`. To run inference, export Whisper directly with `ORTModelForSpeechSeq2Seq`, as sketched below. This will be solved in the next release.
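
A minimal sketch of that workaround; `from_transformers=True` is assumed to be the export switch in this release line, and the silent audio clip only keeps the snippet self-contained:

```python
import numpy as np
from transformers import AutoProcessor
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

processor = AutoProcessor.from_pretrained("openai/whisper-tiny.en")
# Export Whisper to ONNX directly through ORTModelForSpeechSeq2Seq, as recommended above.
model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny.en", from_transformers=True)

audio = np.zeros(16000, dtype=np.float32)  # one second of silence at 16 kHz
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True))
```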

Contributions

- Whisper support with `optimum.onnxruntime` and `optimum.exporters` (420)

Other contributions

- ONNX Runtime training now supports ORT 1.13.1 and `transformers` 4.23.1 (434)
- `ORTModel` can load models from subfolders in a similar fashion as in `transformers` (443)
- `ORTOptimizer` has been refactored, and a factory class has been added to create common `OptimizationConfig`s (457); see the sketch after this list
- Fixes and updates in the documentation (411, 432, 437, 441)
- IOBinding fixes (454, 461)
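
A hedged sketch of the refactored `ORTOptimizer` flow; `AutoOptimizationConfig` as the factory's name and its `O2` preset are assumptions based on later releases:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import AutoOptimizationConfig

# Export an ONNX model on the fly, then optimize its graph with a common preset.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)
optimizer = ORTOptimizer.from_pretrained(model)
optimization_config = AutoOptimizationConfig.O2()  # assumed factory preset
optimizer.optimize(save_dir="distilbert_optimized", optimization_config=optimization_config)
```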
