This release is fully compatible with the recently released [Transformers v4.28](https://github.com/huggingface/transformers/releases/tag/v4.28.0) and [Diffusers v0.15](https://github.com/huggingface/diffusers/releases/tag/v0.15.0).
- Upgrade to Diffusers 0.15.0 #201 by @regisss
- Upgrade to Transformers 4.28 #202 by @regisss
Improved data sampling for training in lazy mode
This release ensures that all batches have the same size in lazy mode, which prevents extra graph compilations.
- Improve data sampling for training in lazy mode #152 by @regisss
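The idea can be illustrated with a minimal, stdlib-only sketch (hypothetical, not optimum-habana's actual sampler): when the dataset size is not a multiple of the batch size, the last batch is padded by wrapping around to the first indices, so the accelerator always sees one batch shape and compiles its graph only once.

```python
def equal_size_batches(num_samples, batch_size):
    """Yield index batches that all contain exactly batch_size elements.

    The last batch is padded by wrapping around to the first indices, so
    a lazy-mode device sees a single batch shape and triggers no extra
    graph compilation. (Illustrative sketch only.)
    """
    indices = list(range(num_samples))
    for start in range(0, num_samples, batch_size):
        batch = indices[start:start + batch_size]
        if len(batch) < batch_size:
            # Wrap around: reuse indices from the beginning of the dataset.
            batch += indices[:batch_size - len(batch)]
        yield batch
```

For example, 10 samples with a batch size of 4 yield three batches of 4, the last one reusing the first two indices.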
HPU graphs for distributed runs and generation
This release enables HPU graphs for distributed runs and text generation.
- Enable HPU graphs for distributed runs and generation #179 by @regisss
Recommend `dataloader_num_workers` for CV model training
The ViT and Swin examples have been updated to set `dataloader_num_workers`, which speeds up training.
- Adding dataloader_num_workers into example command for better performance #188 by @ZhaiFeiyue
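A hedged illustration of how the argument is passed on the command line (the model, dataset, and output paths below are placeholders, and the best worker count depends on your machine; only `--dataloader_num_workers` is the flag this release recommends):

```shell
python run_image_classification.py \
  --model_name_or_path google/vit-base-patch16-224-in21k \
  --dataset_name cifar10 \
  --do_train \
  --dataloader_num_workers 1 \
  --output_dir /tmp/vit-output
```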
Enable pipelining of forward and backward passes
The `pipelining_fwd_bwd` argument triggers the HPU computation of the forward pass while the CPU interprets the backward pass. This speeds up CV models.
- Add mark_step between fwd and bwd for better performance #189 by @ZhaiFeiyue
More information in [the documentation](https://huggingface.co/docs/optimum/habana/usage_guides/accelerate_training#pipelining-forward-and-backward-passes).
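A hedged sketch of how the flag is enabled through the training arguments (assuming optimum-habana's `GaudiTrainingArguments`, which the linked documentation describes; the output directory is a placeholder):

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="/tmp/output",
    use_habana=True,
    use_lazy_mode=True,
    # Overlap the HPU forward computation with the CPU's
    # interpretation of the backward pass.
    pipelining_fwd_bwd=True,
)
```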
0.2.6
What's Changed
* Add hip support by Disty0 in https://github.com/huggingface/optimum-quanto/pull/330
* Switched linters, black -> ruff by ishandeva in https://github.com/huggingface/optimum-quanto/pull/334
* Add marlin int4 kernel by dacorvo and shcho1118 in https://github.com/huggingface/optimum-quanto/pull/333
* fix: use reshape instead of view by dacorvo in https://github.com/huggingface/optimum-quanto/pull/338
* Support QLayerNorm without weights by dacorvo in https://github.com/huggingface/optimum-quanto/pull/341
New Contributors

* ishandeva made their first contribution in https://github.com/huggingface/optimum-quanto/pull/334
* Disty0 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/330
* shcho1118 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/333
- Load and save models from the Hugging Face hub #263 by sayakpaul
- Add support for float8 e4m3fnuz #310 (from #281) by maktukmak
- Faster and less memory-intensive requantization #290 by latentCall145
- Support torch.equal for QTensor #294 by dacorvo
- Add Marlin Float8 kernel #296 (from #241) by fxmarty
- Add Whisper for speech recognition example #298 (from #242) by mattiadg
- Add ViT classification example #308 by shovan777
Bug fixes
- Fix include patterns in quantize #271 by kaibioinfo
- Enable non-strict loading of state dicts #295 by BenjaminBossan
- Fix transformers forward error #303 by dacorvo
- Fix missing call in transformers models #325 by dacorvo
- Fix 8-bit mm calls for 4D inputs #326 by dacorvo
* Use new int8 torch kernels by dacorvo in https://github.com/huggingface/optimum-quanto/pull/222
* Rebuild extension when pytorch is updated by dacorvo in https://github.com/huggingface/optimum-quanto/pull/223
* Use tinygemm bfloat16 / int4 kernel whenever possible by dacorvo in https://github.com/huggingface/optimum-quanto/pull/234
* Add HQQ optimizer by dacorvo in https://github.com/huggingface/optimum-quanto/pull/235
* Add QuantizedModelForCausalLM by dacorvo in https://github.com/huggingface/optimum-quanto/pull/243
* Integrate quanto commands to optimum-cli by dacorvo in https://github.com/huggingface/optimum-quanto/pull/244
* Add pixart-sigma test to image example by dacorvo in https://github.com/huggingface/optimum-quanto/pull/247
* Support diffusion models by sayakpaul in https://github.com/huggingface/optimum-quanto/pull/255
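A hedged usage sketch of the `QuantizedModelForCausalLM` API added in https://github.com/huggingface/optimum-quanto/pull/243 (the model name and save path below are illustrative placeholders):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

# Load a regular transformers causal LM, then quantize its weights to int4.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4)

# The quantized model can be saved and reloaded like a transformers model.
qmodel.save_pretrained("./opt-125m-quantized")
```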
Bug fixes
* Fix: align extension on max arch by dacorvo in https://github.com/huggingface/optimum-quanto/pull/227
* Fix TinyGemmQBitsTensor move by dacorvo in https://github.com/huggingface/optimum-quanto/pull/246
* Fix stream-lining bug by dacorvo in https://github.com/huggingface/optimum-quanto/pull/249
* Fix float/int8 matrix multiplication latency regression by dacorvo in https://github.com/huggingface/optimum-quanto/pull/250
* Fix serialization issues by dacorvo in https://github.com/huggingface/optimum-quanto/pull/258
New Contributors

* sayakpaul made their first contribution in https://github.com/huggingface/optimum-quanto/pull/255
- Add OWLv2 detection example by dacorvo
- Use new torch quantization kernels by dacorvo
Bug fixes
- Avoid CUDA compilation errors on older Nvidia cards (pre-Ampere) by dacorvo
- Recompile extensions when pytorch is updated and prevent segfault by dacorvo