Optimum

1.7.0

New models supported in the ONNX export

Additional architectures are supported in the ONNX export: PoolFormer, Pegasus, Audio Spectrogram Transformer, Hubert, SEW, Speech2Text, UniSpeech, UniSpeech-SAT, Wav2Vec2, Wav2Vec2-Conformer, WavLM, Data2Vec Audio, MPNet, stable diffusion VAE encoder, vision encoder decoder, Nystromformer, Splinter, GPT NeoX.
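
These architectures can be exported like any other supported model through the command line, for instance (checkpoint name assumed for illustration):

```
optimum-cli export onnx --model facebook/wav2vec2-base-960h wav2vec2_onnx/
```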

* Add PoolFormer support in exporters.onnx by BakingBrains in https://github.com/huggingface/optimum/pull/646
* Support pegasus exporters by mht-sharma in https://github.com/huggingface/optimum/pull/620
* Audio models support with `optimum.exporters.onnx` by michaelbenayoun in https://github.com/huggingface/optimum/pull/622
* Add MPNet ONNX export by jplu in https://github.com/huggingface/optimum/pull/691
* Add stable diffusion VAE encoder export by echarlaix in https://github.com/huggingface/optimum/pull/705
* Add vision encoder decoder model in exporters by mht-sharma in https://github.com/huggingface/optimum/pull/588
* Nystromformer ONNX export by whr778 in https://github.com/huggingface/optimum/pull/728
* Support Splinter exporters (555) by Allanbeddouk in https://github.com/huggingface/optimum/pull/736
* Add gpt-neo-x support by sidthekidder in https://github.com/huggingface/optimum/pull/745

New models supported in BetterTransformer

A few additional architectures are supported in BetterTransformer: RoCBERT, RoFormer and Marian.
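
Conversion goes through the usual `BetterTransformer.transform` API; a minimal sketch, assuming a Marian checkpoint:

```python
# Minimal sketch (checkpoint assumed): convert a supported model to its
# BetterTransformer version for faster inference.
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")  # Marian architecture
model = BetterTransformer.transform(model)
```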

* Add RoCBert support for Bettertransformer by shogohida in https://github.com/huggingface/optimum/pull/542
* Add better transformer support for RoFormer by manish-p-gupta in https://github.com/huggingface/optimum/pull/680
* added BetterTransformer support for Marian by IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/808

Additional tasks supported in the ONNX Runtime integration

New classes cover these tasks: ORTModelForMaskedLM, ORTModelForVision2Seq, ORTModelForAudioClassification, ORTModelForCTC, ORTModelForAudioXVector, ORTModelForAudioFrameClassification and ORTStableDiffusionPipeline.

References: https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort and https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/models#export-and-inference-of-stable-diffusion-models
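
For example, a minimal sketch (checkpoint assumed) running masked language modeling with the new ORTModelForMaskedLM through a Transformers pipeline:

```python
# Minimal sketch (checkpoint assumed): fill-mask inference with ONNX Runtime.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = ORTModelForMaskedLM.from_pretrained("distilbert-base-uncased", export=True)

fill_mask = pipeline("fill-mask", model=model, tokenizer=tokenizer)
print(fill_mask("Paris is the [MASK] of France."))
```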

* Add ORTModelForMaskedLM class by JingyaHuang in https://github.com/huggingface/optimum/pull/729
* Add ORTModelForVision2Seq for VisionEncoderDecoder models inference by mht-sharma in https://github.com/huggingface/optimum/pull/742
* Add ORTModelXXX for audio by mht-sharma in https://github.com/huggingface/optimum/pull/774
* Add stable diffusion onnx runtime pipeline by echarlaix in https://github.com/huggingface/optimum/pull/786
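
A minimal sketch of the new stable diffusion pipeline (checkpoint assumed; see the usage guide linked above for the reference example):

```python
# Minimal sketch (checkpoint assumed): text-to-image inference with ONNX Runtime.
from optimum.onnxruntime import ORTStableDiffusionPipeline

pipe = ORTStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", export=True)
image = pipe("sailing ship in storm by Leonardo da Vinci").images[0]
image.save("ship.png")
```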

Support for ONNX export from PyTorch in float16

In the ONNX export, the options `--fp16 --device cuda` can be passed to export in float16 when a GPU is available, directly with the native [`torch.onnx.export`](https://pytorch.org/docs/stable/onnx.html#torch.onnx.export).

Example: `optimum-cli export onnx --model gpt2 --fp16 --device cuda gpt2_onnx/`

* Support ONNX export on `torch.float16` type by fxmarty in https://github.com/huggingface/optimum/pull/749

TFLite export

TFLite export is now supported, with static shapes:


```
optimum-cli export tflite --help
optimum-cli export tflite --model bert-base-uncased --sequence_length 128 bert_tflite/
```


* `exporters.tflite` initial support by michaelbenayoun in https://github.com/huggingface/optimum/pull/716
* TFLite auto-encoder models by michaelbenayoun in https://github.com/huggingface/optimum/pull/757
* [TFLite Export] Adds support for ResNet by sayakpaul in https://github.com/huggingface/optimum/pull/813

ONNX Runtime optimization and quantization directly in the CLI

* Add optimize and quantize command CLI by jplu in https://github.com/huggingface/optimum/pull/700
* Support ONNX Runtime optimizations in exporters.onnx by fxmarty in https://github.com/huggingface/optimum/pull/807

The ONNX export optionally supports applying ONNX Runtime optimizations directly during the export, by passing `--optimize O1` up to `--optimize O4`:


```
optimum-cli export onnx --help
optimum-cli export onnx --model t5-small --optimize O3 t5small_onnx/
```


ONNX Runtime quantization is supported directly from the command line, using `optimum-cli onnxruntime quantize`:


```
optimum-cli onnxruntime quantize --help
optimum-cli onnxruntime quantize --onnx_model distilbert_onnx --avx512
```


ONNX Runtime optimization is supported directly from the command line, using `optimum-cli onnxruntime optimize`:


```
optimum-cli onnxruntime optimize --help
optimum-cli onnxruntime optimize --onnx_model distilbert_onnx -O3
```


ORTModelForCausalLM supports decoding with a single ONNX

Up to now, two ONNX files were used for decoders:
* One handling the first forward pass where no past key values have been cached yet - thus not taking them as input.
* One handling the following forward pass where past key values have been cached, thus taking them as input.

This release introduces support, in the ONNX export and in `ORTModelForCausalLM`, for a single ONNX handling both steps of the decoding. This **reduces memory usage**, as weights are not duplicated between two separate models during inference.

A single merged ONNX can be used by passing `use_merged=True` to `ORTModelForCausalLM.from_pretrained`, loading directly from a PyTorch model:

```python
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_merged=True)
```


Alternatively, exporting a single merged ONNX for decoders is the default behavior of the ONNX export, and the result can later be used for example with `ORTModelForCausalLM`. The command `optimum-cli export onnx --model gpt2 gpt2_onnx/` will produce:


```
└── gpt2_onnx
    ├── config.json
    ├── decoder_model_merged.onnx
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── merges.txt
    ├── special_tokens_map.json
    ├── tokenizer_config.json
    ├── tokenizer.json
    └── vocab.json
```


The `decoder_model.onnx` and `decoder_with_past_model.onnx` files are kept separate for backward compatibility, but during inference `decoder_model_merged.onnx` alone is enough.

* Enable inference with a merged decoder in `ORTModelForCausalLM` by JingyaHuang in https://github.com/huggingface/optimum/pull/647

Single-file ORTModel accepts NumPy arrays

ORTModel now accepts NumPy arrays as inputs and returns NumPy arrays as outputs, in addition to PyTorch tensors. This is only the case for models using a single ONNX file.
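
A minimal sketch (checkpoint assumed) of what this enables:

```python
# Minimal sketch (checkpoint assumed): NumPy in, NumPy out, no PyTorch tensors.
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = ORTModelForSequenceClassification.from_pretrained(checkpoint, export=True)

# return_tensors="np" yields numpy.ndarray inputs, now accepted directly
inputs = tokenizer("I love this movie!", return_tensors="np")
outputs = model(**inputs)
print(outputs.logits)
```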

* Accept numpy.ndarray as input and output to ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/790

ORTOptimizer support for ORTModelForCausalLM

* ORTOptimizer support ORTModelForCausalLM by fxmarty in https://github.com/huggingface/optimum/pull/794
* Support IO Binding for merged decoder by fxmarty in https://github.com/huggingface/optimum/pull/797
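
A minimal sketch of what this enables (checkpoint and optimization level assumed):

```python
# Minimal sketch (checkpoint assumed): graph optimization of a causal LM.
from optimum.onnxruntime import ORTModelForCausalLM, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)
optimizer = ORTOptimizer.from_pretrained(model)

optimization_config = OptimizationConfig(optimization_level=2)
optimizer.optimize(save_dir="gpt2_optimized", optimization_config=optimization_config)
```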

Breaking changes

* In the ONNX export, exporting models as several ONNX files (encoder, decoder) is now the default behavior: https://github.com/huggingface/optimum/pull/747. The old behavior is still accessible with `--monolith`.
* In decoders, reusing past key values is now the default in the ONNX export: https://github.com/huggingface/optimum/pull/748. The old behavior is still accessible by explicitly passing, for example, `--task causal-lm` instead of `--task causal-lm-with-past`.
* BigBird support in the ONNX export is removed, due to the `block_sparse` attention type being written in pure numpy in Transformers, and hence not exportable to ONNX: https://github.com/huggingface/optimum/pull/778
* The parameter `from_transformers` of `ORTModel.from_pretrained` will be deprecated in favor of `export`.

Bugfixes and improvements
* Fix disable shape inference for optimization by regisss in https://github.com/huggingface/optimum/pull/652
* Fix uninformative message when passing `use_cache=True` to ORTModel and no ONNX with cache is available by fxmarty in https://github.com/huggingface/optimum/pull/650
* Fix provider options when several providers are passed by fxmarty in https://github.com/huggingface/optimum/pull/653
* Add TensorRT engine to ONNX Runtime GPU documentation by fxmarty in https://github.com/huggingface/optimum/pull/657
* Improve documentation around ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/666
* minor updates on ONNX config guide by mszsorondo in https://github.com/huggingface/optimum/pull/662
* Fix FlaubertOnnxConfig by michaelbenayoun in https://github.com/huggingface/optimum/pull/669
* Use nvcr.io/nvidia/tensorrt image for GPU tests by fxmarty in https://github.com/huggingface/optimum/pull/660
* Better Transformer doc fix by HamidShojanazeri in https://github.com/huggingface/optimum/pull/670
* Add support for LongT5 optimization using ORT transformer optimizer script by kunal-vaishnavi in https://github.com/huggingface/optimum/pull/683
* Add test for missing execution providers error messages by fxmarty in https://github.com/huggingface/optimum/pull/659
* ONNX transformation to cast int64 constants to int32 when possible by fxmarty in https://github.com/huggingface/optimum/pull/655
* Add missing normalized configs by fxmarty in https://github.com/huggingface/optimum/pull/694
* Remove code duplication in ORTModel's load_model by fxmarty in https://github.com/huggingface/optimum/pull/695
* Test more architectures in ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/675
* Avoid initializing unwanted attributes for ORTModel's having several inference sessions by fxmarty in https://github.com/huggingface/optimum/pull/696
* Fix the ORTQuantizer loading from specific file by echarlaix in https://github.com/huggingface/optimum/pull/701
* Add saving of diffusion model additional components for onnx export by echarlaix in https://github.com/huggingface/optimum/pull/699
* Fix whisper export by mht-sharma in https://github.com/huggingface/optimum/pull/629
* Support trust remote code option in ONNX export and ONNX Runtime integration by fxmarty in https://github.com/huggingface/optimum/pull/702
* Add nightly tests on dependencies dev versions by fxmarty in https://github.com/huggingface/optimum/pull/703
* Fix exception condition by mht-sharma in https://github.com/huggingface/optimum/pull/706
* Add ORTModelForMultipleChoice to the documentation by fxmarty in https://github.com/huggingface/optimum/pull/712
* Fix yaml format for dev tests by fxmarty in https://github.com/huggingface/optimum/pull/710
* Add ONNX Runtime training benchmark by JingyaHuang in https://github.com/huggingface/optimum/pull/592
* Allow `from optimum.onnxruntime import QuantizationConfig` by fxmarty in https://github.com/huggingface/optimum/pull/715
* Fix documentation for doctest tests to pass by fxmarty in https://github.com/huggingface/optimum/pull/713
* Use transformers>=4.26.0 in setup.py by fxmarty in https://github.com/huggingface/optimum/pull/723
* Fix GPU tests by fxmarty in https://github.com/huggingface/optimum/pull/724
* Fix ONNX Runtime inference in `ORTTrainer` by JingyaHuang in https://github.com/huggingface/optimum/pull/709
* `onnxruntime/modeling_ort.py` refactor, part 1 by michaelbenayoun in https://github.com/huggingface/optimum/pull/698
* Update docker and doc of ORT Trainer by JingyaHuang in https://github.com/huggingface/optimum/pull/725
* Add test for code examples in the documentation and docstrings by fxmarty in https://github.com/huggingface/optimum/pull/704
* add image classification example to optimum by prathikr in https://github.com/huggingface/optimum/pull/711
* Add TensorrtExecutionProvider modeling tests by fxmarty in https://github.com/huggingface/optimum/pull/722
* Whisper shape inference fix by michaelbenayoun in https://github.com/huggingface/optimum/pull/726
* Add some redirections to Optimum Habana's documentation by regisss in https://github.com/huggingface/optimum/pull/735
* Patch `ORTTrainer` inference with ONNX Runtime backend by JingyaHuang in https://github.com/huggingface/optimum/pull/737
* Remove dead code in whisper ONNX output by fxmarty in https://github.com/huggingface/optimum/pull/741
* Unpin protobuf 3.20.1 by fxmarty in https://github.com/huggingface/optimum/pull/738
* Fix speech2text export by mht-sharma in https://github.com/huggingface/optimum/pull/746
* Raise error on double call to `BetterTransformer.transform()` by fxmarty in https://github.com/huggingface/optimum/pull/750
* `exporters.onnx` output names and dynamic axes fix by michaelbenayoun in https://github.com/huggingface/optimum/pull/731
* Fix NNCF supported quantization strategies README table by echarlaix in https://github.com/huggingface/optimum/pull/752
* Add GPU tests for BetterTransformer by fxmarty in https://github.com/huggingface/optimum/pull/751
* Fix doctest by fxmarty in https://github.com/huggingface/optimum/pull/759
* Fix ONNX Runtime cache usage for decoders, add relevant tests by fxmarty in https://github.com/huggingface/optimum/pull/756
* Fix GPU tests by fxmarty in https://github.com/huggingface/optimum/pull/758
* Update quality tooling for formatting by regisss in https://github.com/huggingface/optimum/pull/760
* Fix wrong shapes used at ONNX export and validation by fxmarty in https://github.com/huggingface/optimum/pull/764
* Change type annotation by michaelbenayoun in https://github.com/huggingface/optimum/pull/768
* Fix stable diffusion ONNX export by echarlaix in https://github.com/huggingface/optimum/pull/762
* Disable ONNX Runtime provider check on Windows by fxmarty in https://github.com/huggingface/optimum/pull/771
* Fix FusionOptions following ORT 1.14 release by fxmarty in https://github.com/huggingface/optimum/pull/772
* Unpin numpy <1.24.0 by fxmarty in https://github.com/huggingface/optimum/pull/773
* Fix flaky ONNX Runtime generation test with past key value reuse by fxmarty in https://github.com/huggingface/optimum/pull/765
* Fix output shape dimension for OnnxConfigWithPast by fxmarty in https://github.com/huggingface/optimum/pull/780
* Fix used shapes, device at ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/777
* Pin numpy only for tensorflow export by fxmarty in https://github.com/huggingface/optimum/pull/781
* Fixed broken paper space links by Muhtasham in https://github.com/huggingface/optimum/pull/766
* Temporarily disable python 3.9 + macOS test due to onnxruntime 1.14 regression by fxmarty in https://github.com/huggingface/optimum/pull/783
* Update ORT Training to 1.14.0 by JingyaHuang in https://github.com/huggingface/optimum/pull/787
* Temporarily disable segformer TensorRT test by fxmarty in https://github.com/huggingface/optimum/pull/799
* Use a stateful ordered_input_names in ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/796
* Test ORTOptimizer with IO Binding by fxmarty in https://github.com/huggingface/optimum/pull/801
* [`BT`] Add stable layer-norm Wav2vec2 by younesbelkada in https://github.com/huggingface/optimum/pull/803
* Update rules for ruff by regisss in https://github.com/huggingface/optimum/pull/806
* Improve orttrainer test by JingyaHuang in https://github.com/huggingface/optimum/pull/779
* Fix ORT quantization for TensorRT documentation by fxmarty in https://github.com/huggingface/optimum/pull/812
* Fix GPU tests by fxmarty in https://github.com/huggingface/optimum/pull/814
* Update ONNX Runtime training doc - use torchrun by JingyaHuang in https://github.com/huggingface/optimum/pull/820
* Fix ONNX export tests by fxmarty in https://github.com/huggingface/optimum/pull/822
* Add back workflow dispatch on GPU tests by fxmarty in https://github.com/huggingface/optimum/pull/823
* BetterTransformer pipeline padding issue fix by vrdn-23 in https://github.com/huggingface/optimum/pull/821
* Fix optimum pipeline initialization by fxmarty in https://github.com/huggingface/optimum/pull/824
* Fix failing GPU tests by fxmarty in https://github.com/huggingface/optimum/pull/829
* Remove feature dimension as dynamic axes for stable diffusion ONNX export by echarlaix in https://github.com/huggingface/optimum/pull/816
* Fix pipeline task dropping arguments bug by fxmarty in https://github.com/huggingface/optimum/pull/828
* Fix ORTQuantizer behavior with ORTModelForCausalLM by fxmarty in https://github.com/huggingface/optimum/pull/831
* Update tests by mht-sharma in https://github.com/huggingface/optimum/pull/826
* Fix exporters GPU CI by fxmarty in https://github.com/huggingface/optimum/pull/835
* Keep intermediary models for ONNX causal-lm by fxmarty in https://github.com/huggingface/optimum/pull/834
* Fix duplicate name merged decoder by fxmarty in https://github.com/huggingface/optimum/pull/837
* Apply lazy import for exporters by JingyaHuang in https://github.com/huggingface/optimum/pull/836

**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.6.0...v1.7.0

1.6.4

Bugfix

* Fix past key/value reuse in decoders following transformers 4.26.0 release and renaming: https://github.com/huggingface/optimum/commit/b9211d6826b92700e73f48821d6e14bd08226abc
* ONNX Runtime 1.14 support: https://github.com/huggingface/optimum/pull/772

**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.6.3...v1.6.4

1.6.3

Fixes `ORTTrainer` for the inference with the ONNX Runtime backend.

1.6.2

Hotfixes

* Support generation config in ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/651

Regressions

The export of speech-to-text architectures as a single ONNX file (handling both the encoding and decoding) fails due to a regression in the latest transformers version: https://github.com/huggingface/optimum/issues/721

**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.6.1...v1.6.2

1.6.1

Hotfixes
* Revert breaking removal of EncoderOnnxConfig, DecoderOnnxConfig, _DecoderWithLMhead by fxmarty in https://github.com/huggingface/optimum/pull/643
* Fix item access of some _TASKS_TO_AUTOMODELS by fxmarty in https://github.com/huggingface/optimum/pull/642


**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.6.0...v1.6.1

1.6.0

Optimum CLI

The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:


```
optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/
```


* Add Optimum CLI backbone by fxmarty in https://github.com/huggingface/optimum/pull/593

Stable Diffusion ONNX export

Optimum now supports the ONNX export of stable diffusion models from the [diffusers](https://github.com/huggingface/diffusers) library:


```
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
```


* Add Stable Diffusion ONNX export by echarlaix in https://github.com/huggingface/optimum/pull/570

BetterTransformer support for more architectures

The BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT and FSMT.

The complete list of supported models is available in [the documentation](https://huggingface.co/docs/optimum/main/en/bettertransformer/overview#supported-models).

* [BT] Add `Bettertransformer` support for FSMT by Sumanth077 in https://github.com/huggingface/optimum/pull/494
* [BT] add `BetterTransformer` support for ViLT architecture by ka00ri in https://github.com/huggingface/optimum/pull/508
* Add `MBart` support for `BetterTransformer` by ravenouse in https://github.com/huggingface/optimum/pull/516
* Add CLIP BetterTransformer by fxmarty in https://github.com/huggingface/optimum/pull/534
* Add BetterTransformer support for RemBERT by hchings in https://github.com/huggingface/optimum/pull/545

ONNX export for more architectures

The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.
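
For instance (checkpoint name assumed for illustration):

```
optimum-cli export onnx --model microsoft/swin-tiny-patch4-window7-224 swin_onnx/
```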

* Add Swin support in exporters.onnx by fxmarty in https://github.com/huggingface/optimum/pull/528
* [`ONNX`] add `mobilenet` support by younesbelkada in https://github.com/huggingface/optimum/pull/633

Extended ONNX export for encoder-decoder and decoder models

Encoder-decoder and decoder-only models that normally rely on the `generate()` method in transformers can now be exported as several ONNX files using the `--for-ort` argument:


```
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx
```


yielding:

```
.
└── t5_small_onnx
    ├── config.json
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json
```


When passing `--for-ort`, the exported models are expected to be directly loadable into [ORTModel](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel).

* Add ort export in exporters for encoder-decoder models by mht-sharma in https://github.com/huggingface/optimum/pull/497
* Support decoder generated with `--for-ort` from `optimum.exporters.onnx` in `ORTDecoder` by fxmarty in https://github.com/huggingface/optimum/pull/554

Support for ONNX models with external data at export, optimization, quantization

The ONNX export from PyTorch normally creates external data files in case the exported model is larger than 2 GB. This release introduces better support for exporting and using large models, writing all external data into a `.onnx_data` file if necessary.

* Handling ONNX models with external data by NouamaneTazi in https://github.com/huggingface/optimum/pull/586
* Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by JingyaHuang in https://github.com/huggingface/optimum/pull/332

ONNX Runtime API improvement

Various improvements to allow for a better user experience in the ONNX Runtime integration:

* `ORTModel`, `ORTModelDecoder` and `ORTModelForConditionalGeneration` can now load any ONNX model file regardless of its name, making it possible to load optimized and quantized models without having to specify a file name argument (see the sketch after this list).
* `ORTModel.from_pretrained()` with `from_transformers=True` now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
* `ORTQuantizer.save_pretrained()` now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
* `ORTOptimizer.save_pretrained()` now saves the preprocessor, making the exported directory usable end-to-end.
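
A minimal sketch of the first point, assuming a local directory containing an optimized model:

```python
# Minimal sketch (directory content assumed): the ONNX file inside the
# directory is found automatically, whatever it is named.
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained("./distilbert_optimized")
```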

* ONNX Runtime integration API improvement by michaelbenayoun in https://github.com/huggingface/optimum/pull/515

Custom shapes support at ONNX export

The shapes of the dummy inputs used for the export to ONNX can be overridden in case the validity of the resulting ONNX model is sensitive to the shapes used during the export.

Read more: `optimum-cli export onnx --help`
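
For instance, a sketch overriding the batch size and sequence length used for the export (exact flag names are listed by `--help`):

```
optimum-cli export onnx --model bert-base-uncased --batch_size 2 --sequence_length 64 bert_onnx/
```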

* Support custom shapes for dummy inputs by fxmarty in https://github.com/huggingface/optimum/pull/522
* Support for custom input shapes in exporters onnx by fxmarty in https://github.com/huggingface/optimum/pull/575

Enable `use_cache=True` for ORTModelForCausalLM

Reusing past key values for models using [ORTModelForCausalLM](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForCausalLM) (e.g. gpt2) is now possible using `use_cache=True`, avoiding recomputing them at each iteration of the decoding:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)
```


* Enable past_key_values for ORTModelForCausalLM by echarlaix in https://github.com/huggingface/optimum/pull/326

IO binding support for ORTModelForCustomTasks

[ORTModelForCustomTasks](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModelForCustomTasks) now supports [IO Binding](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu#reduce-memory-footprint-with-iobinding) when using CUDAExecutionProvider.
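
A minimal sketch, assuming an already-exported custom-task checkpoint:

```python
# Minimal sketch (checkpoint assumed): run a custom-task model on GPU with
# IO Binding, keeping inputs and outputs on the CUDA device.
from optimum.onnxruntime import ORTModelForCustomTasks

model = ORTModelForCustomTasks.from_pretrained(
    "optimum/sbert-all-MiniLM-L6-with-pooler",
    provider="CUDAExecutionProvider",
    use_io_binding=True,
)
```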

* Add IO binding support for custom ORTModel by JingyaHuang in https://github.com/huggingface/optimum/pull/447

Experimental support to merge ONNX decoder with/without past key values

Along with `--for-ort`, passing `--task causal-lm-with-past`, `--task seq2seq-lm-with-past` or `--task speech2seq-lm-with-past` during the ONNX export produces two models: one not using the previously computed keys/values, and one using them.

Experimental support is introduced to merge the two models into one. Example:


```
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/
```


```python
import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
```


* Merge ONNX decoder models by JingyaHuang in https://github.com/huggingface/optimum/pull/587

Major bugs fixed

* Fix BetterTransformer with padding="max_length" by fxmarty in https://github.com/huggingface/optimum/pull/543
* Fix non-nesting bug in BetterTransformer integration by younesbelkada in https://github.com/huggingface/optimum/pull/637

Other changes, bugfixes and improvements
* Fix doc-builder premission error by mishig25 in https://github.com/huggingface/optimum/pull/482
* Fix doc build pr premissions by mishig25 in https://github.com/huggingface/optimum/pull/484
* Re-order the task manager doc by michaelbenayoun in https://github.com/huggingface/optimum/pull/483
* Fix whisper device for gpu test by fxmarty in https://github.com/huggingface/optimum/pull/486
* Fix tensorflow CI by fxmarty in https://github.com/huggingface/optimum/pull/489
* Fix PR doc generation by regisss in https://github.com/huggingface/optimum/pull/495
* Fix broken links in the doc by fxmarty in https://github.com/huggingface/optimum/pull/499
* Update iobinding ORT encoder whisper by mht-sharma in https://github.com/huggingface/optimum/pull/498
* fix NormalizedConfig init error message by PaulQbFeng in https://github.com/huggingface/optimum/pull/500
* Change import structure for ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/456
* [BT] Fix failing CI tests by younesbelkada in https://github.com/huggingface/optimum/pull/501
* Remove redundant condition statement in ORTDecoder(Seq2seq) by JingyaHuang in https://github.com/huggingface/optimum/pull/504
* [BT] put decorator on the correct place by younesbelkada in https://github.com/huggingface/optimum/pull/509
* [BT] clearer error message for `norm_first` by younesbelkada in https://github.com/huggingface/optimum/pull/510
* Deprecate PyTorch 1.12. for BetterTransformer by fxmarty in https://github.com/huggingface/optimum/pull/513
* Fix ORTModelForSeq2SeqLM test by fxmarty in https://github.com/huggingface/optimum/pull/455
* Clearer error messages when initilizing the requested ONNX Runtime execution provider fails by fxmarty in https://github.com/huggingface/optimum/pull/514
* [BT] Fix doc bugs by younesbelkada in https://github.com/huggingface/optimum/pull/517
* Replace sklearn by scikit-learn by lesteve in https://github.com/huggingface/optimum/pull/502
* ORTModel uses optimum.exporters.onnx by michaelbenayoun in https://github.com/huggingface/optimum/pull/490
* Cleanup deprecated ONNX Runtime training docker files by JingyaHuang in https://github.com/huggingface/optimum/pull/523
* Added support for Tapas Model by JuheonChu in https://github.com/huggingface/optimum/pull/520
* Add benchmark results to gpu doc by JingyaHuang in https://github.com/huggingface/optimum/pull/525
* ORTModelForConditionalGeneration uses optimum.exporters.onnx by mht-sharma in https://github.com/huggingface/optimum/pull/529
* Better error message when wrong task is given to exporters by fxmarty in https://github.com/huggingface/optimum/pull/531
* Add OrtModelForSpeechSeq2Seq to doc by fxmarty in https://github.com/huggingface/optimum/pull/533
* Fold sections by default in the documentation's side-bar by regisss in https://github.com/huggingface/optimum/pull/535
* Import GenerationMixin from transformers.generation if transformers >= 4.25.0 by regisss in https://github.com/huggingface/optimum/pull/536
* Add check_if_transformers_greater to manage different versions of transformers by regisss in https://github.com/huggingface/optimum/pull/537
* Enable to push some sections to the end of the TOC in the doc by regisss in https://github.com/huggingface/optimum/pull/532
* Fix import in ONNX export CLI by fxmarty in https://github.com/huggingface/optimum/pull/553
* Update readme by echarlaix in https://github.com/huggingface/optimum/pull/550
* Refactor of 2 functions used in ORTModel by michaelbenayoun in https://github.com/huggingface/optimum/pull/551
* Update readme by echarlaix in https://github.com/huggingface/optimum/pull/556
* Fix ORTTrainer wrapper duplication / PyTorch evaluate / update with transformers 4.25.1 by JingyaHuang in https://github.com/huggingface/optimum/pull/561
* Fix flaky BetterTransformer test by fxmarty in https://github.com/huggingface/optimum/pull/564
* enable FP16Optimizer for fp16 deepspeed training. by AdamLouly in https://github.com/huggingface/optimum/pull/547
* Update documentation quick tour section by echarlaix in https://github.com/huggingface/optimum/pull/574
* Move custom IOBinding to IOBindingHelper by JingyaHuang in https://github.com/huggingface/optimum/pull/571
* Add test for exporters.onnx CLI by fxmarty in https://github.com/huggingface/optimum/pull/573
* Documentation on quantization by michaelbenayoun in https://github.com/huggingface/optimum/pull/565
* More robust tests for ORTModel using decoders and use_cache=True by fxmarty in https://github.com/huggingface/optimum/pull/576
* Fix errors in onnxruntime modeling tests by fxmarty in https://github.com/huggingface/optimum/pull/585
* [BT] fix flaky test by younesbelkada in https://github.com/huggingface/optimum/pull/591
* Fix exporters onnx shapes by fxmarty in https://github.com/huggingface/optimum/pull/581
* Fix exporters.onnx tests by fxmarty in https://github.com/huggingface/optimum/pull/584
* Update on the ONNX Runtime documentation by michaelbenayoun in https://github.com/huggingface/optimum/pull/567
* Add the ORTModelForSemanticSegmentation class by TheoMrc in https://github.com/huggingface/optimum/pull/539
* Refactor BetterTransformer to be able to raise more informative error messages by fxmarty in https://github.com/huggingface/optimum/pull/594
* Constraint temprarily NumPy version to save CIs by JingyaHuang in https://github.com/huggingface/optimum/pull/614
* Add `encoder_last_hidden_state` as an output for encoder-decoder models by fxmarty in https://github.com/huggingface/optimum/pull/601
* Update dev version by fxmarty in https://github.com/huggingface/optimum/pull/617
* Fix documentation example by echarlaix in https://github.com/huggingface/optimum/pull/603
* Documentation improvements by fxmarty in https://github.com/huggingface/optimum/pull/598
* More informative message at ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/609
* Use optimum exporter for current weight sharing test by JingyaHuang in https://github.com/huggingface/optimum/pull/616
* OnnxConfig now handle the export to encoder / decoder / decoder_with_past themselves by michaelbenayoun in https://github.com/huggingface/optimum/pull/590
* Set explictly the device index by JingyaHuang in https://github.com/huggingface/optimum/pull/613
* Fix ORT GPU test by JingyaHuang in https://github.com/huggingface/optimum/pull/624
* Add GPT-J normalized config by fxmarty in https://github.com/huggingface/optimum/pull/623
* Remove diffusers dependency in onnxruntime code by fxmarty in https://github.com/huggingface/optimum/pull/619
* Use exporters in ORTTrainer by mht-sharma in https://github.com/huggingface/optimum/pull/546
* Improve `use_io_binding` default value for different execution providers by JingyaHuang in https://github.com/huggingface/optimum/pull/604
* fixed FuseBiasInLinear by specifying device by IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/630
* Fixed GPU documentation for HF pipelines by smiraldr in https://github.com/huggingface/optimum/pull/602
* Add argument in the CLI to specify device to do the ONNX export on by fxmarty in https://github.com/huggingface/optimum/pull/634
* Allow kwargs in all generate_dummy_inputs() methods by fxmarty in https://github.com/huggingface/optimum/pull/638

**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.5.2...v1.6.0

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* TheoMrc
    * Add ORTModelForSemanticSegmentation https://github.com/huggingface/optimum/pull/539
* ravenouse
    * Add MBart support for BetterTransformer https://github.com/huggingface/optimum/pull/516
* ka00ri
    * Add BetterTransformer support for ViLT architecture https://github.com/huggingface/optimum/pull/508
* Sumanth077
    * Add Bettertransformer support for FSMT https://github.com/huggingface/optimum/pull/494
