Optimum

Latest version: v1.24.0


1.4.1

* Add inference with `ORTModel` to `ORTTrainer` and `ORTSeq2SeqTrainer` (#189)
* Add `InferenceSession` options and provider to `ORTModel` (#271)
* Add mT5 (#341) and Marian (#393) support to `ORTOptimizer`
* Add batchnorm folding `torch.fx` transformations (#348)
* The `torch.fx` transformations now use the marking methods `mark_as_transformed`, `mark_as_restored`, and `get_transformed_nodes` (#385)
* Update `BaseConfig` for the `transformers` `4.22.0` release (#386)
* Update `ORTTrainer` for the `transformers` `4.22.1` release (#388)
* Add extra ONNX Runtime quantization options (#398)
* Add the possibility to pass `provider_options` to `ORTModel` (#401)
* Add support for passing a specific device to `ORTModel`, as `transformers` does for pipelines (#427); see the sketch after this list
* Fixes to support onnxruntime 1.13.1 (#430)
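
The provider, `provider_options`, and device-placement additions above can be combined when loading an `ORTModel`. A minimal sketch, assuming a CUDA-capable machine (the checkpoint and option values are illustrative):

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Export a Transformers checkpoint to ONNX and pick the execution provider explicitly;
# provider_options is forwarded to the underlying onnxruntime InferenceSession.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
    provider="CUDAExecutionProvider",
    provider_options={"device_id": 0},  # illustrative CUDA provider option
)

# Alternatively, the device can be set after loading, mirroring the transformers pipelines API.
# model.to("cuda:0")
```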

1.4.0

ONNX Runtime

* Refactorization of `ORTQuantizer` (#270) and `ORTOptimizer` (#294); a usage sketch follows this list
* Add ONNX Runtime fused Adam optimizer (#295)
* Add `ORTModelForCustomTasks`, allowing ONNX Runtime inference support for custom tasks (#303)
* Add `ORTModelForMultipleChoice`, allowing ONNX Runtime inference for models with a multiple choice classification head (#358)
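
As a rough sketch of the refactored quantization flow (the checkpoint name and configuration values are illustrative), a quantizer is now built directly from an exported `ORTModel`:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Export the model to ONNX, then build the quantizer from the resulting ORTModel.
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english", from_transformers=True
)
quantizer = ORTQuantizer.from_pretrained(model)

# Dynamic quantization targeting AVX2 instructions (illustrative choice).
qconfig = AutoQuantizationConfig.avx2(is_static=False, per_channel=False)
quantizer.quantize(save_dir="quantized_model", quantization_config=qconfig)
```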

Torch FX

* Add `FuseBiasInLinear`, a transformation that fuses the weight and the bias of linear modules (#253)


Improvements and bugfixes

* Enable the possibility to disregard the precomputed `past_key_values` during ONNX Runtime inference of Seq2Seq models (#241)
* Enable node exclusion from quantization for the benchmark suite (#284)
* Enable the possibility to use token authentication when loading a calibration dataset (#289)
* Fix the optimum pipeline when no model is given (#301)

1.3.1

* Adapt the INC configuration and quantized model loading for the `transformers` 4.22 release (#27)
* Fix the loss computation when distillation is activated while the weight corresponding to the distillation loss is set to 0 (#26)

1.3.0

Torch FX
The `optimum.fx.optimization` module (#232) provides a set of `torch.fx` graph transformations, along with classes and functions to write your own transformations and compose them.

- The `Transformation` and `ReversibleTransformation` classes represent non-reversible and reversible transformations respectively; you can write your own transformations by inheriting from them
- The `compose` utility function enables transformation composition (see the sketch below)
- Two reversible transformations were added:
  - `MergeLinears`: merges linear layers that have the same input
  - `ChangeTrueDivToMulByInverse`: changes a division by a static value into a multiplication by its inverse
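
A minimal sketch of how these pieces fit together, assuming a BERT checkpoint traced with the `transformers` symbolic tracer:

```python
from transformers import BertModel
from transformers.utils.fx import symbolic_trace
from optimum.fx.optimization import ChangeTrueDivToMulByInverse, MergeLinears, compose

model = BertModel.from_pretrained("bert-base-uncased")
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask", "token_type_ids"])

# Compose the two reversible transformations and apply them to the traced graph.
transformation = compose(MergeLinears(), ChangeTrueDivToMulByInverse())
transformed_model = transformation(traced)
```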

ORTModelForSeq2SeqLM

[`ORTModelForSeq2SeqLM`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort#optimum.onnxruntime.ORTModelForSeq2SeqLM) (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.
* When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (actually consisting of the decoder with the language modeling head), and the same decoder with pre-computed key/values as additional inputs.
* This specific export comes from the fact that during the first pass the decoder has no pre-computed key/value hidden states, while during the rest of the generation past key/values are used to speed up sequential decoding.

Below is an example that downloads a T5 model from the Hugging Face Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the model from the Hub and export it through the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
output_dir = "t5_small_onnx"  # example destination directory
model.save_pretrained(output_dir)
```

ORTModelForImageClassification

[`ORTModelForImageClassification`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort#optimum.onnxruntime.ORTModelForImageClassification) (226) allows ONNX Runtime inference for models with an image classification head.

Below is an example that downloads a ViT model from the Hugging Face Hub, exports it through the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load the model from the Hub and export it through the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
output_dir = "vit_onnx"  # example destination directory
model.save_pretrained(output_dir)
```


ORTOptimizer

Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (`fp16`) to `OptimizationConfig` (#273).
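
A minimal sketch of enabling the new parameter (the optimization level shown is illustrative); the resulting configuration is then passed to the `ORTOptimizer` as usual:

```python
from optimum.onnxruntime.configuration import OptimizationConfig

# Request basic graph optimizations and fp32 -> fp16 weight conversion.
optimization_config = OptimizationConfig(optimization_level=1, fp16=True)
```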

Pipelines

Additional pipeline tasks are now supported; here is a list of the supported tasks along with the default model for each:

* Image Classification ([ViT](https://huggingface.co/google/vit-base-patch16-224))
* Text-to-Text Generation ([T5 small](https://huggingface.co/t5-small))
* Summarization ([T5 base](https://huggingface.co/t5-base))
* Translation ([T5 base](https://huggingface.co/t5-base))

Below is an example that downloads a T5 small model from the Hub and loads it with the transformers pipeline for translation:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```
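
Image classification follows the same pattern; a minimal sketch assuming the default ViT checkpoint and an image available at a local path:

```python
from transformers import AutoFeatureExtractor, pipeline
from optimum.onnxruntime import ORTModelForImageClassification

feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)
onnx_image_classifier = pipeline("image-classification", model=model, feature_extractor=feature_extractor)

pred = onnx_image_classifier("path/to/image.png")  # illustrative local image path
```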


Breaking change
The [`ORTModelForXXX`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort) execution provider now defaults to `CPUExecutionProvider` (#203). Previously, if no execution provider was specified, it was set to `CUDAExecutionProvider` when a GPU was detected, and to `CPUExecutionProvider` otherwise.

1.2.3

* Remove intel sub-package, migrating to [`optimum-intel`](https://github.com/huggingface/optimum-intel) (#212)
* Fix the loading and saving of `ORTModel` optimized and quantized models (#214)

1.2.2

* Extend `QuantizationPreprocessor` to dynamic quantization (https://github.com/huggingface/optimum/pull/196)
* Introduce a unified approach to benchmark transformers models against their optimized counterparts (https://github.com/huggingface/optimum/pull/194)
* Bump the `huggingface_hub` version and fix `protobuf` (https://github.com/huggingface/optimum/pull/205)
