Optimum-habana

Latest version: v1.16.0

1.13.0

1.12.1

Fix 1st token latency time measure

- Fix 1st token latency time #1091 @libinta


Fix for Mixtral

- Mixtral typo fix #1107 @schoi-habana


Other

- Fix for selective seq length test with batch size 1 #1110 @libinta

**Full Changelog**: https://github.com/huggingface/optimum-habana/compare/v1.12.0...v1.12.1

1.12

- Switch to SynapseAI v1.12.0 #453 @regisss


Various model optimizations

- Fix graph compilation error from Falcon when batch size > 1 #356 @schoi-habana
- Add MPT optimization for Gaudi #363 @sywangyi
- Improve MPT inference performance #377 @schoi-habana
- Allocate KV cache in contiguous memory for HPU performance #394 @puneeshkhanna
- Add support for attention softmax in BF16, such as for Llama #396 @puneeshkhanna
- Add trim logit logic to reduce maximum memory usage for Llama inference #395 @BaihuiJin
- Skip HPU graph usage for first token to save memory #397 @polisettyvarma
- Llama inference: add reuse_cache to save memory #409 @regisss
- GPT2 contiguous fix #421 @ZhaiFeiyue
- Improve perf and memory usage with reuse_cache by slicing inputs till token_idx for 1st token generation #422 @puneeshkhanna
- GPT-J/NeoX contiguous #454 @BaihuiJin


TGI

- Fix GPT-J incorrect output issue in TGI #340 @sywangyi
- Enable HPU graphs #330 @htang2012
- Upgrade to TGI v1.0.3 #373 @regisss
- Accelerate inference when the input prompt length changes in TGI #386 @sywangyi
- Support static shapes in concatenate and filter in TGI #389 @sywangyi
- Fix BLOOM concatenate and filter issue #401 @sywangyi
- Fix error in logits processing with HPU graphs #404 @sywangyi
- Fix first token #408 @regisss
- Temporary fix in TGI for max total tokens #443 @hsubramony


Check min version in examples

A utility method was added to check that the installed version of Optimum Habana is recent enough to run the examples, as sketched below.

- Add check_optimum_habana_min_version #335 @regisss
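
Each example script calls this helper near its imports, roughly as in the sketch below. It assumes the helper is exposed from `optimum.habana.utils`, which is how the examples import it; the version string here is illustrative, each example pins its own minimum.

```python
# Minimal sketch of the min-version guard used in the example scripts.
from optimum.habana.utils import check_optimum_habana_min_version

# Raises an exception if the installed optimum-habana is older than required.
check_optimum_habana_min_version("1.12.0")
```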


Others

- Add support for autocast custom ops in GaudiTrainer #308 @regisss
- Add warmup argument and move stats printing to the end #390 @polisettyvarma
- Add a configurable max input tokens parameter #426 @puneeshkhanna
- Add Transformers model tests for Gaudi #427 @ankurneog
- Modify LoRA fine-tuning for Llama and Falcon #415 @libinta
- Option to not crop in dataset run #444 @ssarkar2
- Enable auto tensor parallelism for Falcon #451 @mandy-li


Various fixes

- Fixes for streaming dataset mode #324 @MohitIntel
- Fix beam search output #360 @puneeshkhanna
- Fix DDP for LoRA #368 @sywangyi
- Load Llama checkpoint to the meta device to work around an OOM issue on CPU #359 @mandy-li
- Fix gradient checkpointing in LoRA example #398 @regisss
- No need to wrap DDP when using Fast DDP #430 @ikurtchen
- Fix Falcon-40B error when DeepSpeed is enabled #434 @schoi-habana
- Revert "Fix T5 DeepSpeed ZeRO-3 (#393)" #466 @sywangyi


Regression tests for this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/6580186897

1.12.0

1.11.1

Llama3 has been validated on Gaudi

- Llama3 test and README changes #905 @ssarkar2


Fix issue with `pytest`

The latest SynapseAI Docker images come with Pytest v8 preinstalled, which is incompatible with the Transformers library and leads to errors in a few non-test cases. As a temporary workaround, Pytest is pinned to a compatible version and moved to a hard dependency.

- Move pytest dependency #883 @regisss
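
As a rough illustration of that kind of pin, a hard dependency on a pre-v8 Pytest can be declared as below; the package metadata and the exact version specifier are assumptions, not the actual setup of this release.

```python
# Hypothetical setup.py excerpt showing the kind of pin described above;
# the actual specifier used by the release may differ.
from setuptools import setup

setup(
    name="example-project",
    install_requires=[
        "optimum-habana==1.11.1",
        "pytest<8.0.0",  # Pytest v8 breaks a few Transformers code paths
    ],
)
```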


Other

- FP8 merge fix #863 @libinta
- Fix "reuse_cache" bug #888 @Danielohayon
- Remove deprecated AOT_HPU_TRAINING_BACKEND #877 @astachowiczhabana
- Add mark_step and in-place residual add in Llama model code #833 @puneeshkhanna
- Enable Flash Attention in recompute and causal modes #862 @wszczurekhabana
- Add mark_step for Llama inference #875 @libinta

**Full Changelog**: https://github.com/huggingface/optimum-habana/compare/v1.11.0...v1.11.1

1.11

SynapseAI v1.11 (latest stable release) is fully supported.

- Upgrade to Synapse 1.11 #333 @regisss


Optimizations for Llama 2, Falcon, StarCoder, OPT, GPT-NeoX, CodeGen

- Added support for OPT-66B #285 @ZhaiFeiyue
- Llama #296 @yeonsily
- Improve Llama 2 and GPT-NeoX performance with Habana fused RoPE and RMSNorm #321 @mandy-li
- Enable Falcon-7B #326 @schoi-habana
- Fix inference with Llama-2-70B #342 @regisss
- Add model optimizations for CodeGen and GPT-BigCode #322 @PhillipHoward


Torch Autocast

:warning: **Habana Mixed Precision is deprecated and will be removed in SynapseAI v1.12.**
Torch Autocast is becoming the default for managing mixed-precision runs.

- Fix autocast for BERT-like models #287 @ANSHUMAN87
- Add support for autocast in gradient checkpointing #307 @regisss
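
For reference, a BF16 autocast region on HPU looks roughly like the sketch below. It assumes a working habana_frameworks installation; the model, shapes, and the lazy-mode mark_step call are illustrative, and GaudiTrainer normally manages autocast for you when mixed precision is enabled.

```python
# Minimal sketch of a Torch Autocast region on Gaudi; names and shapes are placeholders.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

model = torch.nn.Linear(16, 16).to("hpu")
inputs = torch.randn(4, 16).to("hpu")

with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    outputs = model(inputs)

htcore.mark_step()  # flush the lazily accumulated graph
```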


Improved text-generation example

- Added constrained beam search #281 @vivekgoe
- Fix padding error #282 @sywangyi
- Various improvements for faster checkpoint downloading #284 #286 #294 @regisss
- Add DeepSpeed TP policy for Llama #303 @sywangyi
- Add token and model_revision args for the text-generation example #331 @regisss


LoRA examples

Two new LoRA examples were added: one for [fine-tuning](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#peft) and one for [inference](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#use-peft-models-for-generation).

- Add LoRA example for CLM and text generation #305 @sywangyi
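
At their core, the examples wrap a causal LM with a PEFT LoRA adapter before handing it to the trainer. A minimal sketch follows; the model name and hyperparameters are placeholders rather than the examples' defaults.

```python
# Illustrative LoRA setup with PEFT; values are placeholders, see the linked
# examples for the actual training and generation scripts.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable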


LDM3D

A new Stable Diffusion pipeline that enables generating images together with their depth maps.

- Support for LDM3D #304 @estelleafl
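
A hedged sketch of how the pipeline can be used on Gaudi follows; the class name, checkpoint, and Gaudi configuration below are assumptions based on the upstream diffusers LDM3D integration, not verified against this release.

```python
# Assumed usage of the LDM3D pipeline on Gaudi; class name, checkpoint and
# gaudi_config are illustrative and should be checked against the docs.
from optimum.habana.diffusers import GaudiStableDiffusionLDM3DPipeline

pipeline = GaudiStableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

output = pipeline(prompt="A castle on a hill at sunset")
rgb_image, depth_map = output.rgb[0], output.depth[0]  # image and its depth estimate
```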


Added support for Text Generation Inference (TGI)

[TGI](https://github.com/huggingface/text-generation-inference) is now supported on Gaudi.

- Add support for TGI on Gaudi #297 @regisss


`GaudiGenerationConfig`

Transformers' `GenerationConfig` has been extended to be fully compatible with Gaudi. It adds two fields to better control generation with static shapes.

- Add GaudiGenerationConfig #293 @regisss
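
A sketch of what this looks like in use; the import path and the two field names (`static_shapes`, `ignore_eos`) are assumptions and should be checked against the current `GaudiGenerationConfig` definition.

```python
# Assumed Gaudi-specific generation settings; field names are assumptions
# based on the static-shape generation flow described above.
from optimum.habana.transformers.generation import GaudiGenerationConfig

generation_config = GaudiGenerationConfig(
    max_new_tokens=128,
    use_cache=True,
    static_shapes=True,  # pad inputs and KV cache to fixed shapes to avoid recompilations
    ignore_eos=True,     # generate exactly max_new_tokens for predictable latency
)
```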


Various fixes and improvements

- Fix generation sampling when using `repetition_penalty` #301 @sywangyi
- Remove KV cache workaround #302 @ZhaiFeiyue
- Fix T5 inference performance regression #310 @libinta
- Fix GPT-J HCCL issue that occurred with DDP #318 @sywangyi
- Partially revert "Enable/Optimize flan-t5-xxl on DeepSpeed ZeRO-3" #320 @hsubramony
- Modify flan-t5 DeepSpeed configuration #328 @yeonsily
- Add commands for GPT-J and GPT-NeoX #325 @ankurhabana
- Disable FusedRMSNorm for training #343 @hsubramony
- Enable HPU RMS fused kernel for T5 #344 @ZhaiFeiyue
- Remove two workarounds on ESMFold #334 @bzhu-habana
