The codebase is fully validated for the latest version of the Habana SDK, SynapseAI v1.13.
- Upgrade to SynapseAI 1.13 #563 @regisss
## Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B
Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2, and BLOOM-7B on first-gen Gaudi.
- Enable llama2-70b LoRA finetuning #527 @mandy-li
- Add DeepSpeed ZeRO-3 configuration to run bloom-7b on Gaudi1 #487
- Enable Falcon 180B #537 @hlahkar
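Running a 7B-parameter model on a single first-gen Gaudi is made possible by DeepSpeed ZeRO-3, which shards optimizer states, gradients and parameters across workers. A minimal sketch of such a configuration (the values below are illustrative, not the ones shipped in the repository's example):

```json
{
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": true
  },
  "bf16": { "enabled": true },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

The `"auto"` values are resolved by the Hugging Face Trainer integration from the training arguments at launch time.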
## Llama2 FP8 inference
- Add llamav2 fp8 inference #542 @bgoldberg-habana
## Mistral
- Add Mistral support for generation #496 @sywangyi
## Optimizations
- Remove GPTJ DMA before MHA #468 @BaihuiJin
- Enable Llama attention softmax in bf16 #521 @schoi-habana
- Add load_meta_device option to reduce host RAM #529 @jiminha
- Improve Llama performance and reduce memory consumption by updating the sin/cos cache when inferring beyond the maximum position embeddings (4096) #532 @puneeshkhanna
- Add hash_with_views arg for Falcon inference perf #534 @schoi-habana
- Automate skip_hash_with_views for text generation with Falcon #544 @regisss
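The sin/cos cache update above addresses generation past the model's original maximum position embeddings: instead of failing or truncating, the rotary-embedding tables are rebuilt to cover the longer sequence. A minimal, framework-free sketch of the idea (hypothetical helper names, not the repository's implementation):

```python
import math

def rope_cache(seq_len: int, dim: int = 8, base: float = 10000.0):
    """Build RoPE sin/cos tables for positions [0, seq_len)."""
    inv_freq = [base ** (-2 * i / dim) for i in range(dim // 2)]
    sin = [[math.sin(p * f) for f in inv_freq] for p in range(seq_len)]
    cos = [[math.cos(p * f) for f in inv_freq] for p in range(seq_len)]
    return sin, cos

class RopeState:
    """Lazily grow the cache only when a longer sequence is seen."""

    def __init__(self, max_len: int = 4096):
        self.max_len = max_len
        self.sin, self.cos = rope_cache(max_len)

    def get(self, seq_len: int):
        if seq_len > self.max_len:
            # Inference went beyond the cached range: rebuild the
            # tables once at the new length instead of erroring out.
            self.max_len = seq_len
            self.sin, self.cos = rope_cache(seq_len)
        return self.sin[:seq_len], self.cos[:seq_len]
```

Rebuilding only on growth keeps the common case (sequences within the cached range) allocation-free.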
## Improved text generation
- Allow multiple prompts #479 @ssarkar2
- Growing bucket for beam search #450 @ssarkar2
- Pad the extra inputs that some models require #488 @ssarkar2
- Refactor run_generation #523 @bgoldberg-habana
- Fix setting of reuse cache #553 @puneeshkhanna
- No need to unsqueeze input_id in prepare_inputs_for_generation #559 @sywangyi
- Add LM eval script #541 @bgoldberg-habana
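The growing-bucket item above pads sequences up to the next bucket boundary rather than using their exact lengths, so the set of tensor shapes seen by the device stays small and graph recompilations are rare. A minimal illustration of the idea (hypothetical helpers, not the repository's implementation):

```python
def next_bucket(length: int, bucket_step: int = 128) -> int:
    """Round a sequence length up to the next bucket boundary.

    Keeping lengths on a coarse grid means a new graph is compiled
    only when a bucket boundary is crossed, not for every length.
    """
    return -(-length // bucket_step) * bucket_step  # ceiling division

def pad_ids(ids: list, pad_token_id: int, bucket_step: int = 128) -> list:
    """Left-pad token ids so their length lands on a bucket boundary."""
    target = next_bucket(len(ids), bucket_step)
    return [pad_token_id] * (target - len(ids)) + ids
```

For beam search the bucket grows as decoding proceeds: once generated tokens cross a boundary, the next bucket size is used, trading a bounded amount of padding compute for far fewer recompilations.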