The codebase is fully validated for Transformers v4.38.
- Upgrade to Transformers 4.38 #788 @regisss
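Since this release is validated against Transformers v4.38, a quick environment check can confirm the installed version before running the examples. This is a generic sketch using only the standard library (the helper names are illustrative, not part of the release):

```python
from importlib.metadata import version, PackageNotFoundError


def transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return version("transformers")
    except PackageNotFoundError:
        return None


def is_validated(v, target="4.38"):
    """Check whether a version string matches the validated major.minor series."""
    return v is not None and v.startswith(target)
```

For instance, `is_validated("4.38.2")` is true while `is_validated("4.37.0")` is not.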
## Model optimizations
- Add optimization for BLIP text model generation #653 @sywangyi
- Enable internal KV bucket in Llama #720 @xt574chen
- Enable Mixtral-8x7B #739 @jychen-habana
- Update Mixtral-8x7B FP8 HQT example #756 @jychen-habana
- Further fixes for performance with internal bucketing #781 @puneeshkhanna
- SpeechT5 optimization #722 @sywangyi
- Move img_mask get_attn_mask() to HPU #795 @hsubramony
- Mistral optimizations #804 @ssarkar2
## Image-to-text and VQA examples
- Add image-to-text and visual question answering examples #738 @sywangyi
## torch.compile
- Enable torch_compile mode for distributed runs #659 @kalyanjk
- Fix graph breaks in torch.compile mode #806 @hlahkar
- Fix torch.compile for text generation #811 @regisss
- Add Llama-7B FSDP test for torch.compile mode #818 @pankd
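The torch.compile entries above all build on the same mechanism: wrapping a function or model in `torch.compile` so PyTorch traces and optimizes it. A minimal sketch is shown below; it uses the `"eager"` backend so it runs without a compiler toolchain, whereas Gaudi runs go through Habana's own compile backend rather than the default one assumed here:

```python
import torch


def scaled_add(x, y):
    # Simple elementwise op standing in for a model's forward pass.
    return 2 * x + y


# backend="eager" keeps this sketch self-contained; real Gaudi runs use
# the Habana backend instead of the CPU default.
compiled = torch.compile(scaled_add, backend="eager")

a = torch.ones(4)
b = torch.arange(4, dtype=torch.float32)
out = compiled(a, b)  # same result as the eager function
```

The compiled callable is a drop-in replacement for the original function, which is why "graph breaks" (points where tracing falls back to eager execution) matter for performance, as fixed in #806.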
## Bug fixes
- Fix beam-search crash and incorrect output in decoder-only and encoder-decoder models #627 @sywangyi
- Fix translation models #710 @vidyasiv
- Fix throughput calculation for diffusion models #715 @skavulya
- Fix crash in Llama model for LLaVA image-to-text generation #755 @sywangyi
- Fix backward error in DDP when running reward model fine-tuning in RLHF #507 @sywangyi
- Fix get_dtype and convert_into_dtypes #769 @regisss
- Override SDPA option on Gaudi #771 @jiminha
- Fix Llama-70B FSDP model loading issue #752 @hlahkar
- Fix FSDP in Transformers 4.38 #812 @libinta
- Delay importing DeepSpeed comm for performance #810 @jiminha
- Fix Llama rotary position embedding issue for Transformers 4.38 #813 @libinta
- Fix torch.full issue when running DeepSpeed ZeRO-3 for Llama #820 @libinta
- Fix profiling issue with the first step #837 @libinta
- Fix Mistral after SynapseAI 1.15 update #858 @ssarkar2
## Others
- Small test_text_generation_example.py refactoring #725 @regisss
- Update README, add PPO support #721 @sywangyi
- Update the Mistral model naming #726 @yafshar
- Change backend name #708 @vivekgoe
- Update ppo_trainer.py #718 @skaulintel
- Add seed in SFT example to make results reproducible #735 @sywangyi
- Add a flag to control checkpoint saving in run_lora_clm.py #736 @yeonsily
- Refactor and update CI for encoder-decoders #742 @regisss
- Expose Llama fused ops control from run_lora_clm.py #751 @hlahkar
- Fix tests by setting static_shapes to False #778 @bhargaveede
- Fix ControlNet README #785 @regisss
- Workaround for RoPE computed in bf16 for GPT-NeoX #746 @regisss
- Add Whisper and SpeechT5 to the model table #790 @regisss
- Update summarization example README #791 @srajabos
- Block TorchScript pytest because of a segmentation fault issue #793 @yeonsily
- Fix test_encoder_decoder.py for opus-mt-zh-en #798 @regisss
- Replace obsolete API for MediaPipe #796 @MohitIntel
- Add --distribution_strategy fast_ddp in contrastive-image-text README and BridgeTower test #799 @regisss
- Fix redundant internal bucket and HPU graph settings #797 @puneeshkhanna
- Add Llama test for FSDP #761 @hlahkar
- Enable dynamic shapes for ESMFold #803 @hsubramony
- Add Llama/Llama2 support in question answering #745 @kplau1128
- Update MLM example #830 @regisss
- Revert Wav2Vec2 TDNNLayer forward function to match Transformers v4.37.2 #827 @yeonsily
- Save CI test output images #835 @MohitIntel
- Update checkpoint loading #773 @schoi-habana
- Skip SDXL test in CI #840 @regisss
- Fix FSDP test on Gaudi1 #841 @regisss
- Remove installation from source for Diffusers in CI #846 @regisss
- Fix FP8 CI #852 @regisss
- Fix PR #848 #853 @regisss
- Disable safe loading tests in CI #854 @regisss
- Add warmup for evaluation #855 @libinta
## Known issue
- A crash may occur with [unify_measurements.py](https://github.com/huggingface/optimum-habana/blob/main/examples/text-generation/quantization_tools/unify_measurements.py)