This release is fully compatible with the recently released [Transformers v4.28](https://github.com/huggingface/transformers/releases/tag/v4.28.0) and [Diffusers v0.15](https://github.com/huggingface/diffusers/releases/tag/v0.15.0).
- Upgrade to Diffusers 0.15.0 #201 by @regisss
- Upgrade to Transformers 4.28 #202 by @regisss
Improved data sampling for training in lazy mode
This release ensures that all batches have the same size in lazy mode, which prevents extra graph compilations.
- Improve data sampling for training in lazy mode #152 by @regisss
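The idea can be illustrated with a minimal, stdlib-only sketch (hypothetical, not optimum-habana's actual sampler): when the dataset size is not a multiple of the batch size, the last batch is padded by wrapping around to the first indices, so the accelerator always sees one batch shape and compiles its graph only once.

```python
def equal_size_batches(num_samples, batch_size):
    """Yield index batches that all contain exactly batch_size elements.

    The last batch is padded by wrapping around to the first indices, so
    a lazy-mode device sees a single batch shape and triggers no extra
    graph compilation. (Illustrative sketch only.)
    """
    indices = list(range(num_samples))
    for start in range(0, num_samples, batch_size):
        batch = indices[start:start + batch_size]
        if len(batch) < batch_size:
            # Wrap around: reuse indices from the beginning of the dataset.
            batch += indices[:batch_size - len(batch)]
        yield batch
```

For example, 10 samples with a batch size of 4 yield three batches of 4, the last one reusing the first two indices.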
HPU graphs for distributed runs and generation
This release enables HPU graphs for distributed runs and text generation.
- Enable HPU graphs for distributed runs and generation #179 by @regisss
Recommend `dataloader_num_workers` for CV model training
The ViT and Swin examples have been updated to set `dataloader_num_workers`, which speeds up training.
- Adding dataloader_num_workers into example command for better performance #188 by @ZhaiFeiyue
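A hedged illustration of how the argument is passed on the command line (the model, dataset, and output paths below are placeholders, and the best worker count depends on your machine; only `--dataloader_num_workers` is the flag this release recommends):

```shell
python run_image_classification.py \
  --model_name_or_path google/vit-base-patch16-224-in21k \
  --dataset_name cifar10 \
  --do_train \
  --dataloader_num_workers 1 \
  --output_dir /tmp/vit-output
```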
Enable pipelining of forward and backward passes
The `pipelining_fwd_bwd` argument triggers the HPU computation of the forward pass while the CPU interprets the backward pass. This speeds up CV models.
- Add mark_step between fwd and bwd for better performance #189 by @ZhaiFeiyue
More information in [the documentation](https://huggingface.co/docs/optimum/habana/usage_guides/accelerate_training#pipelining-forward-and-backward-passes).
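A hedged sketch of how the flag is enabled through the training arguments (assuming optimum-habana's `GaudiTrainingArguments`, which the linked documentation describes; the output directory is a placeholder):

```python
from optimum.habana import GaudiTrainingArguments

training_args = GaudiTrainingArguments(
    output_dir="/tmp/output",
    use_habana=True,
    use_lazy_mode=True,
    # Overlap the HPU forward computation with the CPU's
    # interpretation of the backward pass.
    pipelining_fwd_bwd=True,
)
```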
0.2.6
What's Changed
* Add hip support by Disty0 in https://github.com/huggingface/optimum-quanto/pull/330
* Switched linters, black -> ruff by ishandeva in https://github.com/huggingface/optimum-quanto/pull/334
* Add marlin int4 kernel by dacorvo and shcho1118 in https://github.com/huggingface/optimum-quanto/pull/333
* fix: use reshape instead of view by dacorvo in https://github.com/huggingface/optimum-quanto/pull/338
* Support QLayerNorm without weights by dacorvo in https://github.com/huggingface/optimum-quanto/pull/341
New Contributors

* ishandeva made their first contribution in https://github.com/huggingface/optimum-quanto/pull/334
* Disty0 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/330
* shcho1118 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/333
- Load and save models from the Hugging Face hub #263 by sayakpaul
- Add support for float8 e4m3fnuz #310 (from #281) by maktukmak
- Faster and less memory-intensive requantization #290 by latentCall145
- Support torch.equal for QTensor #294 by dacorvo
- Add Marlin Float8 kernel #296 (from #241) by fxmarty
- Add Whisper for speech recognition example #298 (from #242) by mattiadg
- Add ViT classification example #308 by shovan777
Bug fixes
- Fix include patterns in quantize #271 by kaibioinfo
- Enable non-strict loading of state dicts #295 by BenjaminBossan
- Fix transformers forward error #303 by dacorvo
- Fix missing call in transformers models #325 by dacorvo
- Fix 8-bit mm calls for 4D inputs #326 by dacorvo
* Use new int8 torch kernels by dacorvo in https://github.com/huggingface/optimum-quanto/pull/222
* Rebuild extension when pytorch is updated by dacorvo in https://github.com/huggingface/optimum-quanto/pull/223
* Use tinygemm bfloat16 / int4 kernel whenever possible by dacorvo in https://github.com/huggingface/optimum-quanto/pull/234
* Add HQQ optimizer by dacorvo in https://github.com/huggingface/optimum-quanto/pull/235
* Add QuantizedModelForCausalLM by dacorvo in https://github.com/huggingface/optimum-quanto/pull/243
* Integrate quanto commands to optimum-cli by dacorvo in https://github.com/huggingface/optimum-quanto/pull/244
* Add pixart-sigma test to image example by dacorvo in https://github.com/huggingface/optimum-quanto/pull/247
* Support diffusion models by sayakpaul in https://github.com/huggingface/optimum-quanto/pull/255
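A hedged usage sketch of the `QuantizedModelForCausalLM` API added in https://github.com/huggingface/optimum-quanto/pull/243 (the model name and save path below are illustrative placeholders):

```python
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

# Load a regular transformers causal LM, then quantize its weights to int4.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4)

# The quantized model can be saved and reloaded like a transformers model.
qmodel.save_pretrained("./opt-125m-quantized")
```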
Bug fixes
* Fix: align extension on max arch by dacorvo in https://github.com/huggingface/optimum-quanto/pull/227
* Fix TinyGemmQBitsTensor move by dacorvo in https://github.com/huggingface/optimum-quanto/pull/246
* Fix stream-lining bug by dacorvo in https://github.com/huggingface/optimum-quanto/pull/249
* Fix float/int8 matrix multiplication latency regression by dacorvo in https://github.com/huggingface/optimum-quanto/pull/250
* Fix serialization issues by dacorvo in https://github.com/huggingface/optimum-quanto/pull/258
New Contributors

* sayakpaul made their first contribution in https://github.com/huggingface/optimum-quanto/pull/255
- Add OWLv2 detection example by dacorvo
- Use new torch quantization kernels by dacorvo
Bug fixes
- Avoid CUDA compilation errors on older Nvidia cards (pre-Ampere) by dacorvo
- Recompile extensions when pytorch is updated and prevent segfault by dacorvo