SynapseAI v1.11 (latest stable release) is fully supported.
- Upgrade to Synapse 1.11 333 regisss
Optimizations for Llama 2, Falcon, StarCoder, OPT, GPT-NeoX, CodeGen
- Added support for OPT-66B 285 ZhaiFeiyue
- Llama 296 yeonsily
- Improve Llama2 and gpt_neox performance with Habana fused RoPE and RMSNorm 321 mandy-li
- Enable Falcon-7b 326 schoi-habana
- Fix inference with Llama-2-70B 342 regisss
- Add model optimizations for codegen and gpt_bigcode 322 PhillipHoward
Torch Autocast
:warning: **Habana Mixed Precision is deprecated and will be removed in SynapseAI v1.12.**
Torch Autocast is becoming the default for managing mixed-precision runs.
- Fix autocast for BERT-like models 287 ANSHUMAN87
- Add support for autocast in gradient checkpointing 307 regisss
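To illustrate the idea behind the switch to Torch Autocast, here is a minimal pure-Python sketch (not PyTorch's actual implementation; the op lists are hypothetical): an autocast-style context decides per operation whether to run in bf16 or keep fp32.

```python
from contextlib import contextmanager

# Hypothetical example lists: matmul-like ops are fast and safe in bf16,
# while reductions and normalizations stay in fp32 for numerical stability.
LOWER_PRECISION_OPS = {"matmul", "linear", "conv"}
FP32_OPS = {"softmax", "layer_norm", "sum"}

_autocast_enabled = False

@contextmanager
def autocast(enabled=True):
    """Toy autocast context: toggles a flag that op dispatch consults."""
    global _autocast_enabled
    prev = _autocast_enabled
    _autocast_enabled = enabled
    try:
        yield
    finally:
        _autocast_enabled = prev

def compute_dtype(op_name):
    """Return the dtype an op would run in under the current context."""
    if _autocast_enabled and op_name in LOWER_PRECISION_OPS:
        return "bf16"
    return "fp32"
```

On actual Gaudi runs, the equivalent is PyTorch's `torch.autocast` context manager with a bf16 dtype, which applies this per-op dtype selection automatically.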
Improved text-generation example
- Added constrained beam search 281 vivekgoe
- Fix padding error 282 sywangyi
- Various improvements for faster checkpoint downloading 284 286 294 regisss
- Add deepspeed TP policy for llama 303 sywangyi
- Add token and model_revision args for the text-generation example 331 regisss
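Beam search, which the constrained variant above builds on, keeps only the highest-scoring hypotheses at each step. A toy sketch of the core beam-expansion step, with context-independent made-up token probabilities and no constraint handling:

```python
def beam_step(beams, vocab_log_probs, beam_size):
    """Expand each hypothesis with every token, keep the top `beam_size`.

    beams: list of (token_list, cumulative_log_prob) pairs
    vocab_log_probs: dict mapping token -> log-probability (toy model:
        probabilities do not depend on the prefix)
    """
    candidates = []
    for tokens, score in beams:
        for tok, lp in vocab_log_probs.items():
            candidates.append((tokens + [tok], score + lp))
    # Highest cumulative log-probability first
    candidates.sort(key=lambda c: c[1], reverse=True)
    return candidates[:beam_size]
```

Constrained beam search adds one more filter to this step: candidates that can no longer satisfy the required constraint (e.g. a forced phrase) are banked or dropped before truncating to `beam_size`.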
LoRA examples
Two new LoRA examples for [fine-tuning](https://github.com/huggingface/optimum-habana/tree/main/examples/language-modeling#peft) and [inference](https://github.com/huggingface/optimum-habana/tree/main/examples/text-generation#use-peft-models-for-generation).
- Add lora example for clm and text generation 305 sywangyi
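LoRA trains a small low-rank update on top of frozen weights: the effective weight is W + (alpha/r)·B·A, where A and B are the only trained matrices. A minimal pure-Python sketch of the forward pass (names and shapes are illustrative, not the PEFT API):

```python
def matvec(M, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]

def lora_forward(x, W, A, B, alpha, r):
    """y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight; A (r x d_in) and B (d_out x r)
    are the low-rank adapters, so only r*(d_in + d_out) extra
    parameters are trained instead of d_out*d_in.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]
```

Because the update factors through rank r, checkpoints produced by the fine-tuning example above only need to store the small A and B matrices.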
LDM3D
New Stable Diffusion pipeline that enables generating both images and depth maps.
- Support for Ldm3d 304 estelleafl
Added support for Text Generation Inference (TGI)
[TGI](https://github.com/huggingface/text-generation-inference) is now supported on Gaudi.
- Add support for TGI on Gaudi 297 regisss
`GaudiGenerationConfig`
Transformers' `GenerationConfig` has been extended to be fully compatible with Gaudi. It adds two fields to better control generation with static shapes.
- Add GaudiGenerationConfig 293 regisss
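Static shapes matter on Gaudi because each new tensor shape triggers a graph recompilation. A hedged, pure-Python sketch of the underlying idea (the helper name and bucketing scheme here are illustrative, not the actual optimum-habana implementation): prompts are padded up to a small, fixed set of lengths so the compiled graph can be reused.

```python
def pad_to_static_shape(input_ids, bucket_size, pad_token_id):
    """Left-pad token ids to the next multiple of `bucket_size`.

    With bucket_size=128, prompts of length 1..128 all share one shape,
    129..256 share another, and so on, so the device graph is compiled
    only once per bucket instead of once per prompt length.
    """
    target = -(-len(input_ids) // bucket_size) * bucket_size  # ceil to multiple
    padding = target - len(input_ids)
    return [pad_token_id] * padding + list(input_ids)
```

The extra fields in `GaudiGenerationConfig` expose this kind of static-shape control to `generate()` without requiring users to pad inputs manually.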
Various fixes and improvements
- Fix generation sampling when using `repetition_penalty` 301 sywangyi
- Remove KV cache workaround 302 ZhaiFeiyue
- Fix T5 inference performance regression 310 libinta
- Fix gptj HCCL issue occurring in DDP 318 sywangyi
- Partially revert "Enable/Optimize flan t5 xxl on deepspeed z3" 320 hsubramony
- Modify flan-t5 deepspeed configuration 328 yeonsily
- Add commands for gptj and gptneox 325 ankurhabana
- Disable FusedRMSNorm for training 343 hsubramony
- Enable HPU fused RMSNorm kernel for T5 344 ZhaiFeiyue
- Remove two workarounds on esmfold 334 bzhu-habana