- Upgrade to Transformers 4.40 1027 regisss
Speculative Sampling
- Speculative sampling on Gaudi using Optimum-Habana 973 nraste
- Fix assisted decoding generation error 1080 libinta
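The entries above enable speculative (assisted) sampling, where a cheap draft model proposes tokens that the target model then verifies. As a rough illustration of the accept/reject rule only (not the Gaudi or Transformers implementation), here is a toy sketch over a fixed-vocabulary unigram "model"; all names (`speculative_sample`, `target_p`, `draft_p`) are hypothetical:

```python
import random

def speculative_sample(target_p, draft_p, vocab, k, rng):
    """Toy speculative sampling step over a unigram distribution.

    The draft model proposes k tokens cheaply; each proposal is accepted
    with probability min(1, p_target/q_draft), otherwise a token is
    resampled from the normalized residual max(0, p - q) and the loop
    stops, as in the speculative sampling scheme.
    """
    out = []
    # Draft phase: propose k tokens from the cheap model.
    proposals = rng.choices(vocab, weights=[draft_p[t] for t in vocab], k=k)
    for tok in proposals:
        p, q = target_p[tok], draft_p[tok]
        if rng.random() < min(1.0, p / q):
            out.append(tok)  # accepted: target agrees often enough
        else:
            # Rejected: sample from the residual distribution instead.
            residual = {t: max(0.0, target_p[t] - draft_p[t]) for t in vocab}
            z = sum(residual.values())
            out.append(rng.choices(vocab, weights=[residual[t] / z for t in vocab], k=1)[0])
            break  # stop at the first rejection
    return out

rng = random.Random(0)
vocab = ["a", "b", "c"]
target = {"a": 0.6, "b": 0.3, "c": 0.1}
draft = {"a": 0.5, "b": 0.3, "c": 0.2}
print(speculative_sample(target, draft, vocab, k=4, rng=rng))
```

A full implementation would additionally sample one bonus token from the target model when every proposal is accepted; this sketch omits that step for brevity.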
Model optimizations
- Add --bucket_size support for gpt_bigcode 802 jiminha
- Optimize StableLM model inference 805 XinyuYe-Intel
- Enable google/gemma-7b 747 lkk12014402
- Enable llava static generation 767 lkk12014402
- Fix perf drop in flan-t5 summarization 908 MohitIntel
- Enable Qwen2 model 774 XinyuYe-Intel
- Extend bucket_internal to SAMPLE generation mode 819 xt574chen
- SpeechT5 static consistent dropout 824 Spycsh
- Optimize inference of Persimmon model 822 XinyuYe-Intel
- Enable OWL-ViT graph mode on Gaudi platform 783 cfgfung
- Support mixtral kvcache reuse and remove kv_cache_fp8 898 jychen21
- Add fp8 related changes to mistral for text-generation 918 skaulintel
- Optimization for phi series models: support fp8 kv cache and reuse kv cache 902 yuwenzho
- Support Mistral 32K input token 931 jiminha
- Support mixtral long sequence 32k with bs 4 903 jychen21
- Adapt Mixtral long sequence handling for Mistral 985 jiminha
- Fix performance issue in mistral 1030 jiminha
- Optimize inference of Starcoder2 model 829 XinyuYe-Intel
- Add support for IBM Granite 1045 regisss
- Enable fp8 inference for Llava-hf 7B and 13B in 1.16 release 951 Luca-Calabria
- Use bf16 inputs for FusedRoPE 1026 ssarkar2
- Enhance Qwen2 model with FSDPA and bucket 1033 Zhiwei35
- Optimize seamless-m4t/vits model for text-to-speech generation 825 sywangyi
- Cache optimization 1028 ssarkar2
- Ensure KV cache is not returned as output tensor during decode phase for Falcon 993 schoi-habana
- Fast softmax 972 wszczurekhabana
- Falcon optimization 974 libinta
- Quantization for FSDPA 976 dudilester
- Falcon update 1052 ssarkar2
- Add the Llava_next support 1041 yuanwu2017
- Improve torch compile performance 1082 libinta
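Several of the optimizations above rely on bucketing (`--bucket_size`, `bucket_internal`): static-shape accelerators such as Gaudi compile one graph per input shape, so padding sequence lengths up to a small set of bucket boundaries bounds the number of recompilations during generation. A minimal sketch of that idea, with hypothetical helper names (`bucket_length`, `pad_to_bucket`):

```python
def bucket_length(seq_len: int, bucket_size: int) -> int:
    """Round seq_len up to the next multiple of bucket_size.

    Padding to bucketed lengths keeps the set of distinct input
    shapes (and thus compiled graphs) small during generation.
    """
    return ((seq_len + bucket_size - 1) // bucket_size) * bucket_size

def pad_to_bucket(tokens, bucket_size, pad_id=0):
    """Pad a token list with pad_id up to its bucketed length."""
    target = bucket_length(len(tokens), bucket_size)
    return tokens + [pad_id] * (target - len(tokens))

print(bucket_length(37, 16))        # → 48
print(pad_to_bucket([5, 6, 7], 4))  # → [5, 6, 7, 0]
```

The actual bucketing in Optimum-Habana is applied inside the generation loop (including to KV cache shapes with `bucket_internal`); this only illustrates the rounding scheme.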
Stable Video Diffusion
- Add SVD pipeline 743 dsocek
PEFT
- Add ia3 and adalora support 809 sywangyi
- Enable prompt tuning/prefix tuning/p tuning clm and example 758 sywangyi
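Prompt tuning (and its prefix/p-tuning variants enabled above) trains a small set of "virtual token" embeddings that are prepended to the input embeddings while the base model stays frozen. A toy sketch of that prepending step, with hypothetical names and made-up values (not the PEFT library API):

```python
import random

EMB_DIM = 4
NUM_VIRTUAL_TOKENS = 2

rng = random.Random(0)
# Trainable virtual-token embeddings (randomly initialized here).
virtual_tokens = [[rng.uniform(-1, 1) for _ in range(EMB_DIM)]
                  for _ in range(NUM_VIRTUAL_TOKENS)]

def embed(token_ids, table):
    """Look up embeddings in a frozen base-model table."""
    return [table[t] for t in token_ids]

# Frozen embedding table of the base model (toy values).
base_table = {0: [0.0] * EMB_DIM, 1: [1.0] * EMB_DIM}

inputs = embed([0, 1], base_table)
# The model consumes the virtual prompts followed by the real inputs;
# only virtual_tokens would receive gradient updates.
model_inputs = virtual_tokens + inputs
print(len(model_inputs))  # → 4
```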
TRL
- Finetuning stable diffusion with DDPO 733 skavulya
Object Segmentation Example
- Add an example of object segmentation (ClipSeg) 801 cfgfung
Dreambooth
- Diffusers DreamBooth full/LoRA/LoKr/LoHa/OFT finetuning, and DreamBooth XL LoRA finetuning 881 sywangyi
Others
- Text generation pipeline: Extended functionality to align with run_generation script 782 mgonchar
- Enable clip mediapipe and update G2 baseline 856 MohitIntel
- Add ci test for SFT and DPO 857 sywangyi
- Fix SFT, DPO CI on Gaudi1 893 regisss
- Add SDXL in README 894 regisss
- Fix falcon 180b oom issue if peft > 0.6.2 895 sywangyi
- Enabled additional models in CI 879 MohitIntel
- Add static shape support for vision_encoder_decoder generation if decoder supports static shape 834 sywangyi
- Add HabanaProfile to Stable Diffusion and XL 828 atakaha
- Pytest accuracy updates for Falcon, T5, GPT2 916 Luca-Calabria
- Update text-generation readme with torch.compile info. 884 libinta
- Update Wav2Vec2ModelTest::test_initialization 919 malkomes
- Add linear and dynamic RoPE to Mistral and Mixtral 892 regisss
- Fix for wav2vec2 test cases 923 lqnguyen
- Add no_grad() to prevent backward graph construction 897 astachowiczhabana
- Assisted decoding not implemented 910 tjs-intel
- Disable wav2vec2 symbolic tracing test 904 tjs-intel
- Add support for symbolic tracing of GPT2 models 913 tjs-intel
- Utils: return more reasonable error in case of attempt of non-PyTorch model loading 921 mgonchar
- Pytest accuracy updates for Bridgetower, Swin, Vit 927 Luca-Calabria
- Text generation: added langchain pipeline script 887 mgonchar
- Fix for AST models 914 vidyasiv
- Fix AttributeError for wav2vec test 929 Jianhong-Zhang
- Fix ValueError for test_summarization 939 Jianhong-Zhang
- Grad norm tensor fix 938 yeonsily
- Add information to the audio-classification examples README about --ddp_find_unused_parameters parameter 941 Alberto-Villarreal
- Add leaderboard link 947 echarlaix
- Fix formatting of arg parse help strings in the PEFT example 944 dmsuehir
- Use new Habana llama and falcon model configs 940 skaulintel
- Update based on legal requirements. 900 libinta
- Update test generation config to raise ValueError 949 malkomes
- Add --trust_remote_code for text generation examples 870 yangulei
- Added Llama-2 fp8 text-generation test cases 934 yeonsily
- Upgrade SD output image verification with CLIP score 920 MohitIntel
- Llama Guard for text classification example 871 dsmertin
- Update README logo 950 regisss
- Add Gaudi CI for Sentence Transformers 928 regisss
- Get iteration times through generate() 899 hsubramony
- Update speech recognition seq2seq example 953 regisss
- Fix wrongly all_gather for mixtral finetune 965 ccrhx4
- Add intel-mila protST example 860 sywangyi
- Small CI refacto 968 regisss
- Llama-70b single-card: infer device map with max memory limitation 963 Yantom1
- Map list to tensors 926 ssarkar2
- Fix fsdp lora torch compile issue 971 sywangyi
- Fix for the simulate_dyn_prompt flag assertion 984 alekseyfa
- Initial enablement with FP8 Training (port from OHF 91) 936 libinta
- Warn user when using --disk_offload without hqt 964 Yantom1
- Assign grad_norm for logging only if it's a single element tensor 992 yeonsily
- Update examples 998 regisss
- Fix warmup for diffusers when batch size < throughput_warmup_steps 960 dsocek
- Add torch.compile instructions for Roberta-Large 981 MohitIntel
- Fix gpt_neox, stablelm inference regression caused by RoPE dtype 999 mandy-li
- Update example READMEs with requirements.txt installation instructions 1000 imangohari1
- Initial commit for fp8 CI 995 yeonsily
- Fixed 'MixtralConfig' object has no attribute 'rope_scaling' 1009 aslanxie
- Use the length of timesteps as the number of inference steps 986 yuanwu2017
- Fix bug with output_type=np or latent 996 yuanwu2017
- Fix wav2vec test load adapter 937 malkomes
- Mark scale as const and remove --fp8 flag usage 962 Yantom1
- Add per step time collection to other methods 1004 ssarkar2
- Fix first token time 1019 ssarkar2
- Fix text-generation example 1025 regisss
- Updates test_beam_search to transformers_4.40 1017 malkomes
- Fix eos problem 1034 sywangyi
- fp8 textgen ci structure update 1029 jiminha
- Fix a return value issue caused by PR 973 1040 yafshar
- Add no_checks for sub dataset in lvwerra/stack-exchange-paired since it does not contain test split 1003 sywangyi
- README update for FSDP 980 hlahkar
- Add unifier script and disk offload flag usages to README. 1023 libinta
- Add mixtral for meta device load due to mixtral-8x22b model size 909 libinta
- Update unifier script 1010 Yantom1
- Update text-generation CI configuration for falcon and Mixtral 1044 yeonsily
- Update multi-node README to check ssh connection issue 1048 yeonsily
- Infra upgrade workflows 480 glegendre01
- Update test_text_generation_example.py 1051 ssarkar2
- BERT training migrated to torch.compile 990 ANSHUMAN87
- Update test_examples.py 1053 ssarkar2
- Update modeling_llama.py: deepspeed fix for codellama 1054 ssarkar2
- No shapes in profilings by default 1050 astachowiczhabana
- Change the way to unset environment variable for gpt-neox CI 1060 yeonsily
- Update README for Albert torch.compile mode 1061 MohitIntel
- Pin lm_evaluation_harness to a specific commit (240) 1064 astachowiczhabana
- Fix text-generation example README.md 1081 shepark