This release is fully compatible with [SynapseAI v1.10.0](https://docs.habana.ai/en/v1.10.0/).
- Upgrade to SynapseAI v1.10.0 #255 @regisss
## HPU graphs for training
You can now use HPU graphs for training your models.
- Improve performance and scalability of BERT FT training #200 @mlapinski-habana
Check out the [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/accelerate_training#hpu-graphs) for more information.
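As a minimal sketch (assuming the `use_hpu_graphs` training argument mirrors the CLI flag described in the documentation linked above), enabling HPU graphs from a script could look like:

```python
from optimum.habana import GaudiTrainingArguments

# `use_hpu_graphs=True` is assumed to mirror the `--use_hpu_graphs` CLI flag
training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    use_hpu_graphs=True,  # capture and replay HPU graphs during training
)
```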
## Various model optimizations
- Update BLOOM modeling for SynapseAI 1.10 #277
- Optimize conv1d forward #231 @ZhaiFeiyue
- Add static key-value cache for OPT, GPT-J, GPT-NeoX #246 #248 #249 @ZhaiFeiyue
- Optimizations for running FLAN-T5 with DeepSpeed ZeRO-3 #257 @libinta
## Asynchronous data copy
You can now enable asynchronous data copy between the host and devices during training using `--non_blocking_data_copy`.
- Enable asynchronous data copy to get a better performance #211 @jychen-habana
Check out the [documentation](https://huggingface.co/docs/optimum/habana/usage_guides/accelerate_training#nonblocking-data-copy) for more information.
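A short sketch, assuming `non_blocking_data_copy` is also exposed as a `GaudiTrainingArguments` field mirroring the `--non_blocking_data_copy` CLI flag:

```python
from optimum.habana import GaudiTrainingArguments

# `non_blocking_data_copy` is assumed to mirror the CLI flag of the same name
training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    non_blocking_data_copy=True,  # overlap host-to-device copies with compute
)
```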
## Profiling
You can now profile your training runs with the `GaudiTrainer` by passing [`--profiling_steps N`](https://huggingface.co/docs/optimum/habana/package_reference/trainer#optimum.habana.GaudiTrainingArguments.profiling_steps) and [`--profiling_warmup_steps K`](https://huggingface.co/docs/optimum/habana/package_reference/trainer#optimum.habana.GaudiTrainingArguments.profiling_warmup_steps).
- Enable profiling #250 @ZhaiFeiyue
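For instance, using the `profiling_steps` and `profiling_warmup_steps` fields of `GaudiTrainingArguments` (the counterparts of the CLI flags above), the following sketch would skip 2 warm-up steps and then capture 5 steps:

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    profiling_warmup_steps=2,  # skip the first 2 steps
    profiling_steps=5,         # then profile the next 5 steps
)
```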
## Adjusted throughput calculation
You can now let the `GaudiTrainer` compute the real throughput of your run (i.e. excluding the time spent on logging, evaluation and checkpoint saving) with `--adjust_throughput`.
- Added an option to remove save checkpoint time from throughput calculation #237 @libinta
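A minimal sketch, assuming `adjust_throughput` is exposed as a `GaudiTrainingArguments` field mirroring the `--adjust_throughput` CLI flag:

```python
from optimum.habana import GaudiTrainingArguments

# `adjust_throughput` is assumed to mirror the CLI flag of the same name
training_args = GaudiTrainingArguments(
    output_dir="./out",
    use_habana=True,
    use_lazy_mode=True,
    adjust_throughput=True,  # exclude logging/evaluation/saving time from throughput
)
```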
## Check SynapseAI version at import
A check is now performed when importing `optimum.habana` to let you know whether the installed version of SynapseAI is the one Optimum Habana has been tested with.
- Check Synapse version when `optimum.habana` is used #225 @regisss
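The mechanics of such a check can be sketched in plain Python. The helper below is hypothetical (it is not Optimum Habana's actual implementation): it compares the installed SynapseAI version against the version the library was validated on and emits a warning on mismatch.

```python
import warnings

# Version of SynapseAI this hypothetical package was validated against
TESTED_SYNAPSE_VERSION = "1.10.0"

def check_synapse_version(installed_version: str) -> bool:
    """Warn if the installed SynapseAI version differs from the tested one.

    Returns True when the versions match, False otherwise.
    Only major.minor.patch are compared; build suffixes (e.g. "-563") are ignored.
    """
    installed = tuple(installed_version.split("-")[0].split(".")[:3])
    tested = tuple(TESTED_SYNAPSE_VERSION.split(".")[:3])
    if installed != tested:
        warnings.warn(
            f"optimum-habana was tested with SynapseAI {TESTED_SYNAPSE_VERSION} "
            f"but SynapseAI {installed_version} was found; you may run into issues."
        )
        return False
    return True
```

Running the check at import time (e.g. from the package's `__init__`) means users see the warning once, as soon as they use the library.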
## Enhanced examples
Several examples have been added or improved. You can find them [here](https://github.com/huggingface/optimum-habana/tree/main/examples).
- The text-generation example now supports sampling, beam search decoding and full bf16 generation #218 #229 #238 #251 #258 #271
- The contrastive image-text example now supports HPU-accelerated data loading #256
- New Seq2Seq QA example #221
- New protein folding example with ESMFold #235 #276