Optimum Habana

Latest version: v1.11.1


4.38

The codebase is fully validated for Transformers v4.38.

- Upgrade to Transformers 4.38 788 regisss


Model optimizations

- Add optimization for BLIP text model generation 653 sywangyi
- Enable internal KV bucket in Llama 720 xt574chen
- Enable Mixtral-8x7B 739 jychen-habana
- Update Mixtral-8x7B fp8 hqt example 756 jychen-habana
- Further fixes for performance with internal bucketing 781 puneeshkhanna
- SpeechT5 optimization 722 sywangyi
- Move img_mask get_attn_mask() to HPU 795 hsubramony
- Mistral optimizations 804 ssarkar2


Image-to-text and VQA examples

- Add image-to-text and visual question answering example 738 sywangyi
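
For orientation, the task the new example exercises looks roughly like this with the upstream transformers pipeline API; the checkpoint and image URL below are illustrative placeholders, not necessarily what the example script uses:

```python
# Sketch of image-to-text generation via the upstream transformers
# pipeline; checkpoint and image URL are illustrative placeholders.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
result = captioner("https://huggingface.co/datasets/Narsil/image_dummy/raw/main/parrots.png")
print(result[0]["generated_text"])
```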


torch.compile

- Enable torch_compile mode for distributed 659 kalyanjk
- Fix graph breaks in torch compile mode 806 hlahkar
- Fix torch.compile for text generation 811 regisss
- Add Llama7b FSDP test for torch.compile mode 818 pankd
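
As a rough illustration of the torch.compile path (not the exact example code), here is a minimal sketch; the `hpu_backend` backend string follows the backend renaming in PR 708 and should be treated as an assumption for your SynapseAI version:

```python
# Minimal sketch of compiling a model for Gaudi with torch.compile.
# The "hpu_backend" backend string follows PR 708 and is an assumption
# that may vary across SynapseAI releases.
import torch
import habana_frameworks.torch.core  # registers the HPU device and backend
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2").to("hpu")
model = torch.compile(model, backend="hpu_backend")
```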


Bug fixes

- Fix beam search crash and incorrect output in decoder-only and encoder-decoder models 627 sywangyi
- Fix translation models 710 vidyasiv
- Fix throughput calculation for diffusion models 715 skavulya
- Fix crash in Llama model in LLaVA image-to-text generation 755 sywangyi
- Fix backward error in DDP when running reward model finetune in RLHF 507 sywangyi
- Fix get_dtype and convert_into_dtypes 769 regisss
- Override sdpa option in Gaudi 771 jiminha
- Fix Llama-70B-FSDP model loading issue 752 hlahkar
- Fix FSDP in Transformers 4.38 812 libinta
- Delay importing deepspeed comm for perf 810 jiminha
- Fix Llama rotary pos emb issue for Transformers 4.38 813 libinta
- Fix torch.full issue when running DeepSpeed ZeRO-3 for Llama 820 libinta
- Fix profile issue with 1st step 837 libinta
- Fix mistral after syn1.15 update 858 ssarkar2


Others

- Small test_text_generation_example.py refactor 725 regisss
- Update README, add PPO support 721 sywangyi
- Update the Mistral model naming 726 yafshar
- Changing backend name 708 vivekgoe
- Update ppo_trainer.py 718 skaulintel
- Add seed in SFT example to make results reproducible 735 sywangyi
- Add a flag in run_lora_clm.py to control whether to save checkpoints 736 yeonsily
- Refactor and update CI for encoder-decoders 742 regisss
- Expose Llama Fused OPs control from run_lora_clm.py 751 hlahkar
- Fix tests by setting static_shapes to False 778 bhargaveede
- Fix ControlNet README 785 regisss
- Workaround for RoPE computed in bf16 for GPT-NeoX 746 regisss
- Add Whisper and SpeechT5 to model table 790 regisss
- Update summarization example README 791 srajabos
- Block TorchScript pytest because of a segfault issue 793 yeonsily
- Fix test_encoder_decoder.py for opus-mt-zh-en 798 regisss
- Replacing obsolete API for mediapipe 796 MohitIntel
- Add --distribution_strategy fast_ddp in contrastive-image-text README and BridgeTower test 799 regisss
- Fix redundant internal bucket and HPU graph settings 797 puneeshkhanna
- Add Llama test for fsdp 761 hlahkar
- Enable dynamic shapes for esmfold 803 hsubramony
- Add Llama/Llama2 support in Question-Answering 745 kplau1128
- Update MLM example 830 regisss
- Revert Wav2Vec2 TDNNLayer forward function to match Transformers v4.37.2 827 yeonsily
- Save CI test output image 835 MohitIntel
- Update ckpt loading 773 schoi-habana
- Skip SDXL test in CI 840 regisss
- Fix FSDP test on Gaudi1 841 regisss
- Remove installation from source for Diffusers in CI 846 regisss
- Fix fp8 ci 852 regisss
- Fix PR 848 853 regisss
- Disable safe loading tests in CI 854 regisss
- Add warmup for eval 855 libinta


Known issue

- A crash may occur with [unify_measurements.py](https://github.com/huggingface/optimum-habana/blob/main/examples/text-generation/quantization_tools/unify_measurements.py)

4.37

- Upgrade to Transformers 4.37 651

**Full Changelog**: https://github.com/huggingface/optimum-habana/compare/v1.10.0...v1.10.2

4.31

Transformers v4.31 (latest stable release) is fully supported.

- Upgrade to Transformers v4.31 312 regisss

1.15

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.15.0.

- Upgrade to SynapseAI 1.15.0 831 regisss


SDXL fine-tuning

- SDXL fine tuning 667 dsocek
- MediaPipe SDXL 787 ssarkar2


Whisper

- Support speech recognition with Whisper models and seq2seq 704 emascarenhas
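
For reference, speech recognition with a Whisper checkpoint through the upstream pipeline API looks roughly like this; the checkpoint and audio path are placeholders:

```python
# Rough sketch of Whisper speech recognition with the upstream
# transformers pipeline; checkpoint and audio path are placeholders.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
print(asr("sample.flac")["text"])
```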


Phi

- Enable Phi series models 732 lkk12014402


ControlNet

- ControlNet training 650 vidyasiv

1.14

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.14.0.

- Upgrade to SynapseAI 1.14 664 regisss


Stable Diffusion XL

SDXL is now supported and optimized for Gaudi.

- Stable Diffusion XL for Gaudi 619 dsocek
- Update for SDXL Turbo support 634 atakaha
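
A hedged usage sketch, assuming `GaudiStableDiffusionXLPipeline` takes the same keyword arguments as the existing Gaudi diffusers pipelines:

```python
# Hedged sketch, assuming GaudiStableDiffusionXLPipeline mirrors the
# kwargs of the existing Gaudi diffusers pipelines.
from optimum.habana.diffusers import GaudiStableDiffusionXLPipeline

pipeline = GaudiStableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
image = pipeline(prompt="A castle in the clouds").images[0]
image.save("castle.png")
```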


Textual inversion fine-tuning

An example of textual-inversion fine-tuning has been added.

- Add Textual Inversion fine-tuning script 243 regisss


TRL

The 🤗 [TRL library](https://github.com/huggingface/trl) is now supported on Gaudi for performing DPO and SFT.

- Add TRL support for DPO and SFT on Gaudi, with an example 601
- Restructure example/trl/stack_llama_2 for generic DPO 635 libinta
- Add DPO of TRL in README.md 652 libinta
- Add seed in DPO to make training results reproducible 646 sywangyi
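
As a rough sketch of the SFT flow (the Gaudi-enabled scripts live under examples/trl; this uses upstream TRL's SFTTrainer signature of that era, which is an assumption):

```python
# Rough sketch with upstream TRL's SFTTrainer (signature of that era,
# an assumption); the Gaudi-enabled scripts live under examples/trl.
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    model="facebook/opt-350m",
    train_dataset=dataset,
    dataset_text_field="text",  # dataset column containing the raw text
    max_seq_length=512,
)
trainer.train()
```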


Full bf16 evaluation

Full bf16 evaluation inside the trainer can now be performed like in Transformers.

- Adding support for bf16_full_eval 610 bhargaveede
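
A minimal sketch of turning it on, assuming `GaudiTrainingArguments` exposes the same `bf16_full_eval` flag as the upstream `TrainingArguments`:

```python
# Minimal sketch, assuming GaudiTrainingArguments exposes the same
# bf16_full_eval flag as transformers' TrainingArguments.
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="./results",
    use_habana=True,
    use_lazy_mode=True,
    bf16_full_eval=True,  # run the whole evaluation loop in bf16
)
```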


Text-generation pipeline

A text-generation pipeline fully optimized for Gaudi has been added.

- Text-Generation Pipeline Example 526 sjagtap1803
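
For comparison, the plain upstream pipeline call looks like this; the Gaudi-optimized pipeline referenced above lives in examples/text-generation and exposes a similar call interface:

```python
# Rough sketch with the upstream transformers pipeline; the
# Gaudi-optimized variant in examples/text-generation exposes a
# similar call interface.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("Once upon a time", max_new_tokens=32)[0]["generated_text"])
```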


Model optimizations

- Enhance Llama performance by removing the 'cast_f32_to_bf16' operation 564 kalyanjk
- Refactor Llama attention and MLP layers 589 bgoldberg-habana
- Support FlashAttention in Llama2 584 wszczurekhabana
- Integrate Habana flash attention into Llama2-70B fine-tuning 596 mandy-li
- Enable T5ForConditionalGeneration inference using static shapes 425 bhargaveede
- Avoid Falcon perf drop from PR 607 when BS=1 schoi-habana
- Enable fused RMSNorm in bf16 for Llama 621 puneeshkhanna
- Flash attention enhancement of repeat KV 626 puneeshkhanna
- Update repeat KV Llama logic for better TP-4 performance 639 puneeshkhanna
- Falcon changes for v1.14.0 release 654 schoi-habana


TGI

TGI on Gaudi has been moved to a dedicated repo: https://github.com/huggingface/tgi-gaudi

- Update tokenizer for TGI 572 hsubramony
- Remove redundant requirements 575 hsubramony
- Change next_token_chooser to HeterogeneousNextTokenChooser for TGI 574 yeonsily
- Remove TGI folder from Optimum Habana 597 regisss


Various fixes

- Fix broken README for Llama2-70B 571 mandy-li
- Fix Diffusers tests 570 ssarkar2
- Fix fp8 command in text-generation README 586 regisss
- Fix wav2vec inference bug 588 skaulintel
- Fix hash_with_views error 587 bgoldberg-habana
- Add dataset disposal of b-mc2/sql-create-context for CodeGen and fix ZeRO-3 LoRA save issue 552 sywangyi
- Fix GPT-J training issue 594 BaihuiJin
- Fix DataLoaderDispatcher issue in Gaudi 600 sywangyi
- Fix for Falcon error from PR 587 608 schoi-habana
- Fix Falcon graph compilation error when bs > 1 607 regisss
- Fix crash if gaudi_config is not passed to GaudiTrainer 613 sywangyi
- Fix flash attention output for llama for padded batched inputs 623 puneeshkhanna
- Fix backward error in DDP when running reward model finetune in RLHF 507 sywangyi
- Fix DPO graph compile error in evaluation 630 sywangyi
- Fix error in run_image_classification.py 631 regisss
- Fix RLHF Llama reward modeling backward issue 612 sywangyi
- Fix SD example so that custom bf16 ops can be used 642 regisss
- Fix SD2 test 647 regisss
- Fix typo in README 656 yeonsily
- Fix error in PR654 661 schoi-habana
- Fix compile error with torch.compile for Llama 662 jiminha
- Fix SDXL test 666 regisss


Others

- Remove red crosses in model table 577 regisss
- Misc changes for transformers tests 581 ankurneog
- Remove delete_doc_comment workflows 582 regisss
- Pin PEFT for the language-modeling example 591 regisss
- Remove workarounds to have causal_mask in uint8 for GPT2, GPT-J and CodeGen 592 regisss
- Change Synapse validated version in README 603 regisss
- Dynamic prompt after refactor 543 ssarkar2
- In PEFT, only the trainable parameters need to be saved 576 sywangyi
- Add inheritance in Diffusers pipelines 611 regisss
- Update generation config to enable flash attention for inference 609 puneeshkhanna
- Remove setting of PT_HPU_LAZY_MODE=2 in training_args.py 625 vivekgoe
- Remove hpu:X notation until fully supported by bridge 637 hsubramony
- Add use_flash_attention to Llama2-70B finetuning command in README 640 mandy-li
- Enable master_port selecting for DeepSpeed and MPI 641 yangulei
- Enable graphs in Wav2Vec2 AC training 622 bhargaveede
- Add changes to support FSDP 598 vivekgoe
- Run Llama2 with torch.compile on Gaudi2 616 kausikmaiti
- HQT 648 bgoldberg-habana

1.13

The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.13.

- Upgrade to SynapseAI 1.13 563 regisss

Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B

Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2 and BLOOM-7B on first-gen Gaudi.

- Enable llama2-70b LoRA finetuning 527 mandy-li
- Add DeepSpeed ZeRO-3 configuration to run BLOOM-7B on Gaudi1 487
- Enable Falcon 180B 537 hlahkar

Llama2 fp8 inference

- Add Llama2 fp8 inference 542 bgoldberg-habana

Mistral

- Add Mistral support for generation 496 sywangyi

Optimizations

- Remove GPT-J DMA before MHA 468 BaihuiJin
- Enable Llama attention softmax in bf16 521 schoi-habana
- Add load_meta_device option to reduce host RAM 529 jiminha
- Improve Llama performance and reduce memory consumption by updating the sin/cos cache when inferring beyond max position embeddings (4096) 532 puneeshkhanna
- Add hash_with_views arg for Falcon inference perf 534 schoi-habana
- Automate skip_hash_with_views for text generation with Falcon 544 regisss

Improved text generation

- Allow multiple prompts 479 ssarkar2
- Growing bucket for beam 450 ssarkar2
- Some models have extra inputs, pad them too 488 ssarkar2
- Refactor run generation 523 bgoldberg-habana
- Fix setting of reuse cache 553 puneeshkhanna
- No need to unsqueeze input_id in prepare_inputs_for_generation 559 sywangyi
- Add LM evaluation script 541 bgoldberg-habana
