Optimum

Latest version: v1.24.0


1.14.0

ONNX

New architectures

Falcon

* Add ONNX and ORT support for Falcon by fxmarty in https://github.com/huggingface/optimum/pull/1391

SpeechT5

* SpeechT5 ONNX support by fxmarty in https://github.com/huggingface/optimum/pull/1404

Mistral

* Add Mistral models ONNX export support by echarlaix in https://github.com/huggingface/optimum/pull/1425


TrOCR

* Enable KV cache support by fxmarty in https://github.com/huggingface/optimum/pull/1456
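As background, a KV cache lets autoregressive decoding append only the new key/value at each step instead of recomputing attention inputs for the whole prefix. A toy, pure-Python sketch of the idea (illustrative only; not the TrOCR or ONNX Runtime implementation):

```python
# Toy sketch of why a KV cache speeds up autoregressive decoding.
# Illustrative only; not the actual TrOCR/ONNX Runtime implementation.

def attention_scores(query, cached_keys):
    """Dot-product scores of one query vector against all cached keys."""
    return [sum(q * k for q, k in zip(query, key)) for key in cached_keys]

key_cache = []  # past keys, reused instead of recomputed at every step
new_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one new key per step

for key in new_keys:
    key_cache.append(key)  # append only the new key; past keys are reused
    scores = attention_scores(key, key_cache)

print(len(key_cache))  # 3
print(scores)          # last query scored against all 3 cached keys
```

Without the cache, each step would recompute keys for the full prefix, making decoding quadratic in sequence length.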


LCMs

Enable ONNX export and ORT inference for LCMs (available in `diffusers` since [`v0.22.0`](https://github.com/huggingface/diffusers/releases/tag/v0.22.0)) by echarlaix in https://github.com/huggingface/optimum/pull/1469

```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```

The ONNX export is also supported through the CLI:

```bash
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
```




Decoder refactoring

* Add position ids as input during ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1381
* Enable the export of only one decoder for decoder-only models by echarlaix in https://github.com/huggingface/optimum/pull/1257
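For context, position ids are commonly derived from the attention mask so that padding tokens do not advance positions. A minimal pure-Python sketch of that derivation (hypothetical helper; not the actual Optimum exporter code):

```python
# Minimal sketch of deriving position ids from an attention mask
# (hypothetical helper; not the actual Optimum exporter code).
# Positions advance only on real tokens, so left-padding does not
# shift the effective positions of the sequence.

def make_position_ids(attention_mask):
    position_ids = []
    for row in attention_mask:
        pos, ids = 0, []
        for m in row:
            ids.append(pos if m else 0)  # padding tokens pinned to 0
            if m:
                pos += 1
        position_ids.append(ids)
    return position_ids

# Batch of 2: the second sequence is left-padded by one token.
mask = [[1, 1, 1], [0, 1, 1]]
print(make_position_ids(mask))  # [[0, 1, 2], [0, 0, 1]]
```

Passing position ids explicitly as an ONNX input lets the runtime handle batched, padded prompts correctly instead of assuming contiguous positions.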


GPTQ

* Enable possibility to choose exllamav2 kernels for GPTQ models by SunMarc in https://github.com/huggingface/optimum/pull/1419
* Disable exllamav2 for quantization by SunMarc in https://github.com/huggingface/optimum/pull/1482
* Default to exllama when exllamav2 is disabled by SunMarc in https://github.com/huggingface/optimum/pull/1494
* Added cache_block_outputs parameter to handle models with non-regular structure such as ChatGLM by AlexKoff88 in https://github.com/huggingface/optimum/pull/1479
* Add support for CPU Inference by vivekkhandelwal1 in https://github.com/huggingface/optimum/pull/1496
* Fix minimum version of auto-gptq by fxmarty in https://github.com/huggingface/optimum/pull/1504
* Switch to `exllama_config` instead of disabling exllamav2 by SunMarc in https://github.com/huggingface/optimum/pull/1505
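The kernel-selection behavior described in the bullets above can be sketched as follows, with hypothetical names (this is not the Transformers/Optimum API): an `exllama_config` dict picks the kernel version, and exllama v1 is the fallback whenever exllamav2 is disabled.

```python
# Sketch of the kernel-selection behavior, with hypothetical names
# (not the real Transformers/Optimum API): an exllama_config dict
# selects the kernel version, falling back to exllama v1 whenever
# exllamav2 is disabled.

def select_gptq_kernel(exllama_config=None, disable_exllamav2=False):
    version = (exllama_config or {}).get("version", 2)
    if disable_exllamav2 and version == 2:
        version = 1  # default to exllama v1 when v2 is disabled
    return f"exllama_v{version}"

print(select_gptq_kernel())                        # exllama_v2
print(select_gptq_kernel(disable_exllamav2=True))  # exllama_v1
print(select_gptq_kernel({"version": 1}))          # exllama_v1
```

Expressing the choice as a config dict rather than a boolean flag leaves room for future kernel versions without further API changes.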


Other changes and bugfixes

* Fix wrong dtype in the ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1369
* Add support for loading quantization from config by aarnphm in https://github.com/huggingface/optimum/pull/1363
* Guard multiprocessing set start method by fxmarty in https://github.com/huggingface/optimum/pull/1377
* Do not output KV cache when not using `with-past` in the ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1358
* Fix provider availability check on ORT 1.16.0 release by fxmarty in https://github.com/huggingface/optimum/pull/1403
* Fix quantization for onnxruntime v1.16.0 by echarlaix in https://github.com/huggingface/optimum/pull/1405
* Fix normalized config key for models architecture by echarlaix in https://github.com/huggingface/optimum/pull/1408
* Fix arg in bettertransformer llama attention by SunMarc in https://github.com/huggingface/optimum/pull/1421
* Ignore .xml files for Stable Diffusion ORT downloads by baskrahmer in https://github.com/huggingface/optimum/pull/1428
* Falcon BetterTransformer requires transformers>=4.34 by fxmarty in https://github.com/huggingface/optimum/pull/1431
* Fix llama ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1432
* Update attention.py by DongHande in https://github.com/huggingface/optimum/pull/1416
* Remove SharedDDP as it was deprecated from Transformers by AdamLouly in https://github.com/huggingface/optimum/pull/1443
* Fix owlvit task detection by fxmarty in https://github.com/huggingface/optimum/pull/1453
* Improve ONNX quantization doc by fxmarty in https://github.com/huggingface/optimum/pull/1451
* Fix perceiver tests and dummy inputs for ONNX by baskrahmer in https://github.com/huggingface/optimum/pull/1449
* Disable bart onnx export for text-classification and question-answering by fxmarty in https://github.com/huggingface/optimum/pull/1457
* Fix ONNX exporter library_name by baskrahmer in https://github.com/huggingface/optimum/pull/1460
* [ORT Training] Some important updates of ONNX Runtime training APIs by JingyaHuang in https://github.com/huggingface/optimum/pull/1335
* Fix typo in BetterTransformer CLIP by fxmarty in https://github.com/huggingface/optimum/pull/1468
* Fix custom architecture detection in onnx export by fxmarty in https://github.com/huggingface/optimum/pull/1472
* Fix whisper export by mht-sharma in https://github.com/huggingface/optimum/pull/1503
* Update Transformers dependency for Habana extra by regisss in https://github.com/huggingface/optimum/pull/1508
* Fix argument error by ranchlai in https://github.com/huggingface/optimum/pull/1501
* Remove attention mask patching by fxmarty in https://github.com/huggingface/optimum/pull/1509
* Fix generation input by echarlaix in https://github.com/huggingface/optimum/pull/1512
* Fix tests ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/1517
* Fix BT on transformers 4.35 release by fxmarty in https://github.com/huggingface/optimum/pull/1518



New Contributors
* aarnphm made their first contribution in https://github.com/huggingface/optimum/pull/1363
* DongHande made their first contribution in https://github.com/huggingface/optimum/pull/1416
* AlexKoff88 made their first contribution in https://github.com/huggingface/optimum/pull/1479
* vivekkhandelwal1 made their first contribution in https://github.com/huggingface/optimum/pull/1496
* ranchlai made their first contribution in https://github.com/huggingface/optimum/pull/1501

1.13.3

Patch release for `transformers==4.34.1` compatibility. We will do a release next week for `transformers==4.35` compatibility and new features. Please bear with us!

* Falcon BetterTransformer requires transformers>=4.34 by fxmarty in https://github.com/huggingface/optimum/pull/1431
* Fix arg in bettertransformer llama attention by SunMarc in https://github.com/huggingface/optimum/pull/1421
* Update Transformers dependency for Habana extra by regisss in https://github.com/huggingface/optimum/pull/1508
* Temporarily pin to transformers<4.35 by fxmarty in https://github.com/huggingface/optimum/commit/616931019b9bd7546918a48d475a07efb92f51b1

1.13.2

* Fix provider availability check on ORT 1.16.0 release by fxmarty in https://github.com/huggingface/optimum/pull/1403
* Fix ONNX Runtime quantization compatibility for onnxruntime v1.16.0 by echarlaix in https://github.com/huggingface/optimum/pull/1405

1.13.1

Fix ONNX fp16 export that broke in 1.13.0.

What's Changed
* Fix wrong dtype in the ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/1369
* Fix tests collection for TFLite export and trigger TFLite tests only when relevant by fxmarty in https://github.com/huggingface/optimum/pull/1368
* upgrade min compatible optimum-intel version by echarlaix in https://github.com/huggingface/optimum/pull/1371
* Fix fp16 ONNX export test by fxmarty in https://github.com/huggingface/optimum/pull/1373

1.13

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.13.

- Upgrade to SynapseAI 1.13 (#563) by regisss

Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B

Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2 and BLOOM-7B on first-gen Gaudi.

- Enable llama2-70b LoRA finetuning (#527) by mandy-li
- Add DeepSpeed ZeRO-3 configuration to run bloom-7b on Gaudi1 (#487)
- Enable Falcon 180B (#537) by hlahkar

Llama2 fp8 inference

- Add llamav2 fp8 inference (#542) by bgoldberg-habana

Mistral

- Add Mistral support for generation (#496) by sywangyi

Optimizations

- Remove GPTJ DMA before MHA (#468) by BaihuiJin
- Enable llama attention softmax in bf16 (#521) by schoi-habana
- Add `load_meta_device` option to reduce host RAM (#529) by jiminha
- Improve llama performance and reduce memory consumption by updating the sin/cos cache when inferring beyond the maximum position embeddings (4096) (#532) by puneeshkhanna
- Add `hash_with_views` arg for Falcon inference perf (#534) by schoi-habana
- Automate `skip_hash_with_views` for text generation with Falcon (#544) by regisss

Improved text generation

- Allow multiple prompts (#479) by ssarkar2
- Growing bucket for beam search (#450) by ssarkar2
- Pad the extra inputs that some models have (#488) by ssarkar2
- Refactor run generation (#523) by bgoldberg-habana
- Fix setting of reuse cache (#553) by puneeshkhanna
- Remove unneeded unsqueeze of `input_id` in `prepare_inputs_for_generation` (#559) by sywangyi
- Add an lm-eval script (#541) by bgoldberg-habana

1.13.0

Deduplicate Embedding / LM head weight in the ONNX export

Workaround for a bug in the PyTorch ONNX export that does not deduplicate the shared Embedding and LM head weight: https://github.com/pytorch/pytorch/issues/108342. For small models, this can reduce the serialized ONNX model size by up to 50%.

* Fix PyTorch tied weights being duplicated in the exported ONNX models by fxmarty in https://github.com/huggingface/optimum/pull/1326
* Fix initializer detection for weight deduplication by fxmarty in https://github.com/huggingface/optimum/pull/1333
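As a loose analogy for why the fix matters (pure Python; not the actual ONNX/protobuf mechanics): a serializer that tracks object identity stores a tied weight matrix once, while one that copies tensors writes it twice, roughly doubling the file size.

```python
# Loose analogy for the tied-weight duplication bug (pure Python;
# not the actual ONNX/protobuf mechanics). A serializer that tracks
# object identity stores a shared matrix once, while naively copying
# it roughly doubles the serialized size.
import pickle

embedding = [[0.1] * 8 for _ in range(512)]  # toy weight matrix
tied = {"embedding": embedding, "lm_head": embedding}         # shared object
copied = {"embedding": embedding,
          "lm_head": [row[:] for row in embedding]}           # duplicated copy

shared_size = len(pickle.dumps(tied))
duplicated_size = len(pickle.dumps(copied))
print(shared_size < duplicated_size)  # True: deduplication shrinks the file
```

The PyTorch exporter was effectively in the "copied" case for tied Embedding/LM head weights; the Optimum workaround restores the "shared" case in the emitted ONNX initializers.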

Extended ONNX Runtime support

ONNX Runtime integration now supports Pix2Struct and MPT architectures. Donut now supports IO Binding. Encoder-Decoder models are now supported as well.

* Pix2Struct onnxruntime support by krathul in https://github.com/huggingface/optimum/pull/1296
* Add MPT onnx and ORT support by jiqing-feng in https://github.com/huggingface/optimum/pull/1161
* Donut iobinding by IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/1209
* Add encoder decoder model by mht-sharma in https://github.com/huggingface/optimum/pull/851

Extended ONNX export: MPT, TIMM models, Encoder-Decoder

Additionally, the SAM model is now exported by default as two files, `vision_encoder.onnx` and `prompt_encoder_mask_decoder.onnx`.

* Add MPT onnx and ORT support by jiqing-feng in https://github.com/huggingface/optimum/pull/1161
* Adds ONNX Export Support for Timm Models by mht-sharma in https://github.com/huggingface/optimum/pull/965
* Add encoder decoder model by mht-sharma in https://github.com/huggingface/optimum/pull/851
* Fix SAM ONNX export requirements with transformers 4.32, export vision encoder separately by fxmarty in https://github.com/huggingface/optimum/pull/1301

BetterTransformer supports Falcon

* [`BetterTransformer`] Add falcon to `BetterTransformer` by younesbelkada in https://github.com/huggingface/optimum/pull/1343

Major bugfix: ability to set GPTQ Exllama kernel maximum length in the transformers integration

The function `exllama_set_max_input_length` from `auto-gptq` can now be used with Transformers GPTQ models.

* Version bump + add max_input_length to gptq by SunMarc in https://github.com/huggingface/optimum/pull/1329

Other changes and bugfixes
* Update version to 1.12.1.dev0 following release by fxmarty in https://github.com/huggingface/optimum/pull/1312
* Add GPTQ prefill benchmark by fxmarty in https://github.com/huggingface/optimum/pull/1313
* Precise ORTModel documentation by fxmarty in https://github.com/huggingface/optimum/pull/1268
* Improve BetterTransformer backward compatibility by fxmarty in https://github.com/huggingface/optimum/pull/1314
* Improve ORTModel documentation by fxmarty in https://github.com/huggingface/optimum/pull/1245
* Add bitsandbytes benchmark by fxmarty in https://github.com/huggingface/optimum/pull/1320
* fix typo in log message by AAnirudh07 in https://github.com/huggingface/optimum/pull/1322
* Support customize dtype for dummy generators by JingyaHuang in https://github.com/huggingface/optimum/pull/1307
* Fix opset custom onnx export by mht-sharma in https://github.com/huggingface/optimum/pull/1331
* Replace mpt to ernie custom export by mht-sharma in https://github.com/huggingface/optimum/pull/1332
* Fix BT benchmark script by fxmarty in https://github.com/huggingface/optimum/pull/1344
* Add name_or_path for donut generation by fxmarty in https://github.com/huggingface/optimum/pull/1345
* send both negative prompt embeds to ORT SDXL by ssube in https://github.com/huggingface/optimum/pull/1339
* add vae image processor by echarlaix in https://github.com/huggingface/optimum/pull/1219
* add negative prompt test by echarlaix in https://github.com/huggingface/optimum/pull/1347
* Add GPT BigCode to the BT documentation by fxmarty in https://github.com/huggingface/optimum/pull/1356
* Add BT dummy objects by fxmarty in https://github.com/huggingface/optimum/pull/1355
* Add text2text-generation-with-past test for encoder-decoder model by mht-sharma in https://github.com/huggingface/optimum/pull/1338
* Fix sentence transformer export by mht-sharma in https://github.com/huggingface/optimum/pull/1366

New Contributors
* krathul made their first contribution in https://github.com/huggingface/optimum/pull/1296
* AAnirudh07 made their first contribution in https://github.com/huggingface/optimum/pull/1322
* jiqing-feng made their first contribution in https://github.com/huggingface/optimum/pull/1161
* ssube made their first contribution in https://github.com/huggingface/optimum/pull/1339

**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.12.0...v1.13.0
