Optimum


1.1.1

Habana

- Installation details added for [Optimum-Habana](https://github.com/huggingface/optimum-habana) which provides optimized transformers integration for [Intel's Habana Gaudi Processor (HPU)](https://habana.ai/training/).

ONNX Runtime

- Add the possibility to specify the execution provider in `ORTModel` (see the sketch after this list).
- Add the `IncludeFullyConnectedNodes` class to find the nodes composing the fully connected layers, in order to target only the latter for quantization and limit the accuracy drop.
- Update `QuantizationPreprocessor` so that the intersection of the set of nodes to quantize and the set of nodes to exclude from quantization is empty.
- Rename `Seq2SeqORTTrainer` to `ORTSeq2SeqTrainer` for clarity and consistency.
- Add `ORTOptimizer` support for [ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra) models.
- Fix the loading of pretrained `ORTConfig` objects that contain optimization and quantization configs.
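
For example, the execution provider can now be chosen when loading an `ORTModel`. Below is a minimal sketch following the current `optimum.onnxruntime` API; the model name is illustrative, and older releases may use slightly different argument names.

```python
# Minimal sketch: selecting the ONNX Runtime execution provider at load time.
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,                       # export the PyTorch checkpoint to ONNX
    provider="CUDAExecutionProvider",  # run inference on GPU via ONNX Runtime
)
```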

1.1.0

ORTTrainer and Seq2SeqORTTrainer

The `ORTTrainer` and `Seq2SeqORTTrainer` are two new experimental classes (a usage sketch follows the list below).
- Both `ORTTrainer` and `Seq2SeqORTTrainer` were created to have a similar user-facing API as the [`Trainer`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Trainer) and [`Seq2SeqTrainer`](https://huggingface.co/docs/transformers/main_classes/trainer#transformers.Seq2SeqTrainer) of the Transformers library.
- `ORTTrainer` allows the usage of the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized automatically-exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
- `ORTTrainer` also allows ONNX Runtime inference to be used during both the evaluation and prediction steps.
- For `Seq2SeqORTTrainer`, ONNX Runtime inferencing is incompatible with `--predict_with_generate`, as the generate method is not supported yet.
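
As a rough usage sketch, training with `ORTTrainer` mirrors the Transformers `Trainer`. The dataset handling below is illustrative, and the `feature` argument and exact signatures may differ across releases:

```python
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.onnxruntime import ORTTrainer

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Small tokenized dataset for demonstration purposes.
ds = load_dataset("glue", "sst2")
ds = ds.map(lambda ex: tokenizer(ex["sentence"], truncation=True, padding="max_length"), batched=True)

trainer = ORTTrainer(
    model=model,
    args=TrainingArguments(output_dir="ort_out", num_train_epochs=1),
    train_dataset=ds["train"].select(range(128)),
    eval_dataset=ds["validation"].select(range(64)),
    feature="sequence-classification",  # ONNX export task (assumed argument)
)
trainer.train()     # forward/backward passes run on the ONNX Runtime backend
trainer.evaluate()  # evaluation can also run with ONNX Runtime inference
```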

ONNX Runtime optimization and quantization API improvements

The `ORTQuantizer` and `ORTOptimizer` classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.

- Added the possibility to iteratively compute the quantization activation ranges when applying static quantization, using the `ORTQuantizer` method `partial_fit`. This is especially useful with memory-hungry calibration methods such as the Entropy and Percentile methods (see the sketch after this list).
- When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activations quantization ranges instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
- The `OptimizationConfig`, `QuantizationConfig` and `CalibrationConfig` classes were added in order to better segment the different ONNX Runtime-related parameters, instead of having one monolithic `ORTConfig` configuration.
- The `QuantizationPreprocessor` class was added in order to find the nodes to include and/or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming a LayerNorm). This is particularly useful in the context of static quantization, where the quantization of modules such as LayerNorm or GELU is responsible for a significant drop in accuracy.
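
A sketch of iterative calibration with `partial_fit` is shown below. It follows the present-day `optimum.onnxruntime` API, so the exact v1.1.0 signatures may differ slightly; model and dataset names are illustrative.

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTQuantizer
from optimum.onnxruntime.configuration import AutoCalibrationConfig, AutoQuantizationConfig

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = ORTModelForSequenceClassification.from_pretrained(model_name, export=True)

quantizer = ORTQuantizer.from_pretrained(model)
qconfig = AutoQuantizationConfig.avx512(is_static=True, per_channel=False)

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=lambda ex: tokenizer(ex["sentence"], truncation=True, padding="max_length"),
    num_samples=256,
)
# MinMax calibration; moving_average=True needs onnxruntime >= 1.11.
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset, moving_average=True)

# Feed the calibration set in shards to bound memory usage.
for i in range(4):
    quantizer.partial_fit(
        dataset=calibration_dataset.shard(num_shards=4, index=i),
        calibration_config=calibration_config,
        operators_to_quantize=qconfig.operators_to_quantize,
    )
ranges = quantizer.compute_ranges()

quantizer.quantize(
    save_dir="quantized_model",
    quantization_config=qconfig,
    calibration_tensors_range=ranges,
)
```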

1.0.1

With this release, we enable easy and fast deployment of models from the Transformers library on Habana Gaudi Processors (HPU).

- The `GaudiTrainer` class is built on top of the original `Trainer` class and enables training and evaluating models from the Transformers library on HPUs.
- The `GaudiTrainingArguments` class is built on top of the original `TrainingArguments` class and adds 3 new arguments (see the sketch after this list):
  - `use_habana` to deploy on HPU
  - `use_lazy_mode` to use lazy mode instead of eager mode
  - `gaudi_config_name` to specify the name of or the path to the Gaudi configuration file
- The `GaudiConfig` class makes it possible to specify a configuration for deployment on HPU, such as the use of Habana Mixed Precision or custom ops.
- Multi-card deployment is enabled
- Examples are provided for *question answering* and *text classification* in both single- and multi-card settings.
- The following models have been validated:
  - BERT base/large
  - RoBERTa base/large
  - ALBERT large/XXL
  - DistilBERT
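
As a minimal sketch (the Gaudi configuration name is illustrative), enabling HPU training only requires swapping in the Gaudi classes:

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,     # deploy on HPU
    use_lazy_mode=True,  # lazy mode instead of eager mode
    gaudi_config_name="Habana/bert-base-uncased",  # name of or path to the Gaudi config
)
# `training_args` is then passed to GaudiTrainer exactly as
# transformers.TrainingArguments is passed to transformers.Trainer.
```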

1.0.0

ONNX Runtime support

* An `ORTConfig` class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
* The `ORTOptimizer` class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. In order to create an instance of `ORTOptimizer`, the user needs to provide an `ORTConfig` object defining the export and graph-level transformation information. Optimization can then be performed by calling the `ORTOptimizer.fit` method.
* ONNX Runtime static and dynamic quantization can also be applied to a model by using the newly added `ORTQuantizer` class. In order to create an instance of `ORTQuantizer`, the user needs to provide an `ORTConfig` object defining the export and quantization information, such as the quantization approach to use or the activation and weight data types. Quantization can then be applied by calling the `ORTQuantizer.fit` method (a hypothetical sketch follows this list).
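
A hypothetical sketch of this workflow is given below; the `ORTConfig` parameters and the `fit` signature are assumptions inferred from the description above, not verified v1.0.0 signatures.

```python
from optimum.onnxruntime import ORTConfig, ORTQuantizer

# Assumed parameters, for illustration only.
ort_config = ORTConfig(
    opset=12,                         # ONNX export opset
    quantization_approach="dynamic",  # quantization strategy
)

quantizer = ORTQuantizer(ort_config)
# Export the model to ONNX, then apply dynamic quantization.
quantizer.fit("bert-base-uncased", output_dir="quantized", feature="sequence-classification")
```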

Additional features for Intel Neural Compressor

We have also added a new class called `IncOptimizer`, which takes care of combining the pruning and quantization processes (sketched below).
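
A hedged sketch of how the pieces could fit together; the configuration objects and the evaluation/training functions are placeholders, and the exact signatures may differ:

```python
from optimum.intel.neural_compressor import IncOptimizer, IncPruner, IncQuantizer

# quantization_config, pruning_config, eval_func and train_func are
# placeholders assumed to be defined as in the Intel Neural Compressor docs.
quantizer = IncQuantizer(quantization_config, eval_func=eval_func)
pruner = IncPruner(pruning_config, eval_func=eval_func, train_func=train_func)

# IncOptimizer chains the pruning and quantization processes on the model.
optimizer = IncOptimizer(model, quantizer=quantizer, pruner=pruner)
optimized_model = optimizer.fit()
```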

0.29

- Upgrade optimum-habana diffusers dependency from 0.26.3 to 0.29.2 #1150 @dsocek


Stable Diffusion 3

- SD3 #1153 @dsocek
- Refactor SD3 #1199 @dsocek


Training with Sentence Transformers

- Enable Sentence Transformer Trainer with Gaudi #1111 @ZhengHongming888


Model optimizations

- Fix starcoder2 accuracy issue and optimize performance with fused rope #1095 @mandy-li
- Enable FusedRoPE using float32 for gpt-neox model #1104 @yeonsily
- Mamba initial enablement #1122 @libinta
- Adding fused qkv support along with config #1102 @bhargaveede
- Enhance Qwen2 with fastsoftmax and bf16 RoPE and cache optimization #1087 @Zhiwei35
- Enable fp8 inference for Llava-Next and add Fused_SDPA #1120 @tthakkal
- Support bucket_internal for MPT #1137 @pk1d3v
- Enable Flash Attention (Fused SDPA) for Starcoder #1114 @abhilash1910
- gpt_bigcode: added FusedSDPA kernel #1138 @mgonchar
- Enable torch.compile for Granite20B #1185 @dvarshney-habana
- Refine use cache for mpt model #1158 @Jing1Ling
- GPT-J support reuse_cache #1094 @atakaha
- Use fast softmax only on prefill #1159 @jaygala223
- Starcoder2: KVCache and flash attention (FusedSDPA) enablement #1149 @abhatkal
- Gpt bigcode fused sdpa #1260 @yeonsily


SAM, FastViT, VideoMAE, OpenCLIP, DETR, Table Transformer, DeciLM

- Add an example of Segment Anything Model (inference) #814 @cfgfung
- Add an example of FastViT model (inference) #826 @cfgfung
- VideoMAE model enabling and examples #922 @pi314ever
- OpenCLIP sample for visual question answering #977 @vidyasiv
- Enabled DETR (object detection) model #1046 @cfgfung
- Table Transformer enabling #978 @pi314ever
- DeciLM support #1133 @sywangyi


Stable Diffusion inpainting, unconditional image generation

- Add Stable Diffusion inpainting support #869 @yuanwu2017
- Enable unconditional image generation on Gaudi 2 #859 @cfgfung


Text feature extraction example

- Feature extraction enabling #994 @pi314ever


Tensor parallelism

- Tensor parallel distributed strategy without using deepspeed #1121 @kalyanjk
- Disable torch.compile for all_reduce when parallel_strategy is set to "tp" #1174 @kalyanjk


Kubernetes cluster example

- Add a Helm chart, Dockerfile, and instructions for running examples using a Kubernetes cluster #1099 @dmsuehir
- Fix PyTorch version in the Kubernetes docker-compose to match the image #1246 @dmsuehir


FP8 training

- TE FP8 integration #1096 @SanjuCSudhakaran


Other

- Updates run_lora_clm.py with enhanced dataset support #955 @dmsuehir
- Fix prefix tuning finetune issue and update test #975 @sywangyi
- Fix throughput calculation in image-to-text example #1070 @regisss
- SDXL training: fixed CI, changed gated dataset, fixes for non-square datasets #1038 @imangohari1
- Updating batch_size of Albert-XXL in README #1063 @vineethanandh
- Fix the error when running run_pipeline.py in the text_generation example #1055 @yuanwu2017
- Add a test for Llama fine-tuning with FP8 precision #1106 @SanjuCSudhakaran
- Beam-search fix #1113 @ssarkar2
- Add chat format dataset support in SFT #1066 @libinta
- Fix NaN loss of Gemma and crash if dataset_concatenation is not set #1088 @sywangyi
- torch.compile: keep input mutation in graph, which avoids unnecessary memcpy #1069 @sushildubey171
- Updated langchain text-generation pipeline to work with latest release 0.2.5 #1084 @rbrugaro
- Add the MC example #891 @yuanwu2017
- Fix recompiles if limit_hpu_graph is False #1129 @ssarkar2
- Update examples' batch size in README #1123 @shepark
- Fix OOM error in SDXL fine-tuning validation stage #1134 @dsocek
- Added example code to demonstrate how to use deterministic image generation #878 @cfgfung
- SD image variation / InstructPix2Pix / StableDiffusionXLImg2ImgPipeline pipelines #988 @sywangyi
- Add CI test for TRL rewarding and PPO, fix backward failure in PPO caused by rmsfusion #1020 @sywangyi
- Llama adapter #983 @sywangyi
- torch.flip issue is fixed in SynapseAI 1.16, so remove the workaround #1092 @sywangyi
- Fix test CausalLanguageModelingLORAExampleTester KeyError #1139 @dmsuehir
- fix(ci): new runs-on #1136 @XciD
- Add trust_remote_code for loading datasets in the audio classification example #1074 @regisss
- Generation example: print number of warmup iterations #1145 @mgonchar
- CI updates: text-gen to receive ranks/bs, updated bs/metric for baselines #1140 @imangohari1
- Support for custom files for run_lora_clm.py #1039 @vidyasiv
- Change the device_id for FSDP plugin #1086 @ckvermaAI
- Set KV cache update as static method #1160 @ulivne
- Fix CPU tensor issue #1157 @mkumargarg
- Add missing `__init__.py` to mistral and mixtral test packages #1188 @rkumar2patel
- Add example of multitask_prompt/poly tuning #915 @sywangyi
- Fix data-type mismatch for mlperf_inference accuracy test #1146 @kalyanjk
- Fix spawn MP context, limit CPU and download data #1131 @polisettyvarma
- T5 multi-card #1222 @yafshar
- Add trust_remote_code for T5 poly-tuning test #1220 @yafshar
- Resolve "empty tensor optional" error with hpu_graphs + KV cache for StarCoder #1181 @vidyasiv
- Fix ViT, add wav2vec comment #1223 @ssarkar2
- RoBERTa tests were running on CPU #1229 @ssarkar2
- Fix bert/roberta contrastive search tests #1226 @skavulya
- Remove the default env variable that trusts remote code by default #1225 @yafshar
- Improve style check workflow #1230 @regisss
- Added scheduler selection for SDXL fine-tuning #867 @kplau1128
- Clear help message for ignore_eos to avoid misunderstanding @sywangyi
- Support loading Hugging Face checkpoint #1165 @ulivne
- Change triggering event for code style check #1238 @regisss
- gptj: fix missing token_idx #1234 @envsp
- fix(nltk): fixed the version to a working one #1247 @imangohari1
- Updating to avoid hardcoding tests in CI framework #1221 @vidyasiv
- Fix FSDP graph error due to Transformers 4.43 update #1251 @jiminha
- Fix SD README commands #1250 @imangohari1
- Fix spelling errors #1252 @changwangss
- Set HLS_MODULE_ID only if it wasn't set previously #1254 @astachowiczhabana
- Fix overflow of steps in SDXL for default diffusers scheduler @dsocek
- fix(test_diffusers): automated the checking for tests without upstream HF #1232 @imangohari1
- fix(nltk): revert #1247, updated the version and added the punkt_tab download #1258 @imangohari1
- Set input_embeds before it gets used #1261 @tthakkal
- Update README and more changes, rebase to main #1259 @shepark


Known limitations

- For Llama, some large batch sizes lead to out-of-memory errors although they used to work in previous releases

0.23

This version has been validated for Transformers v4.34 and Diffusers v0.23.

- Upgrade to Transformers 4.34 #475 @regisss
- Upgrade to Diffusers 0.23 #516 @regisss
- Pin Diffusers #565 @regisss

TGI

- Add link to TGI license #517 @regisss
- TGI sharded feature #485 @libinta

Dynamic shape support

- Add infra to enable/disable the dynamic shapes feature through gaudi_config #513 @vivekgoe

Habana Mixed Precision was removed in favor of Torch Autocast

- Remove HMP from optimum-habana #349 @jwieczorekhabana
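
With HMP gone, mixed precision on Gaudi goes through native `torch.autocast`. A minimal sketch, assuming the Habana PyTorch bridge (`habana_frameworks.torch`) is installed so that the `hpu` device is available:

```python
import torch
import habana_frameworks.torch.core  # registers the "hpu" device

model = torch.nn.Linear(8, 8).to("hpu")
x = torch.randn(2, 8).to("hpu")

# bf16 mixed precision via native Torch Autocast instead of HMP
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    y = model(x)
```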

Various fixes

- Fix for SegFault during FT #483 @MohitIntel
- Enable/disable gradient_checkpointing as per the training_args.gradient_checkpointing value #484 @vivekgoe
- Fix split validation dataset problem #489 @mandy-li
- Fix validation dataset problem for openassistant-guanaco #498 @mandy-li
- Fix for Accelerate #500 @regisss
- Fix DeepSpeed init issue when using an external launcher #497 @yuanwu2017
- Update Transformers dependency in setup.py #504 @regisss
- Fix token transmission in text-generation example #509 @regisss
- Merge LoRA model before initializing DS inference in text-generation example #515 @regisss
- Fix for Falcon-40b inference with DeepSpeed #502 @schoi-habana
- Fix FusedSDPA recompute bug #512 @skaulintel
- Fix update method: avoid copying idx to CPU, which splits the graph #524 @bgoldberg-habana
- Fix missing max_position_embeddings in model config in run_clm.py #530 @regisss
- Fix for attn_softmax_bf16 when generation_config is None #531 @schoi-habana
- Fix loading on meta device for PEFT models with DS-inference #528 @regisss
- Fix splitting by whitespaces instead of a single space #540 @oelayan7
- Fix stable diffusion pipelines #548 @regisss
- Update trainer.py #549 @skaulintel
- Add fallback for PEFT when the base model doesn't exist #557 @regisss

Others

- Update GaudiNIC multi-node training dockerfile and setup #477 @yeonsily
- Add ignore_eos flag to use in generation #469 @bhargaveede
- Add maximum hpugraphs and disable_tensor_cache arguments to GaudiTrainer #493 @skaulintel
- Update BridgeTower example #561 @regisss
- Remove mention of eager mode in README, set use_lazy_mode to True by default #486 @skaulintel
- Add another tokenizer to multilingual list #550 @ssarkar2
- Specify problem type for classification #551 @ssarkar2

The regression tests associated with this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/7085551714
