What's new
* Add repr for QuantizedTransformersModel by imba-tjd in https://github.com/huggingface/optimum-quanto/pull/357
* Bump minimal pytorch version to 2.6 by dacorvo in https://github.com/huggingface/optimum-quanto/pull/373
Bug fixes
* [tests] enable testing for xpu (rebased) by dacorvo in https://github.com/huggingface/optimum-quanto/pull/349
* enable qbitstensor test on xpu by dacorvo in https://github.com/huggingface/optimum-quanto/pull/350
* fix(library): only compile CUDA extension on Linux by dacorvo in https://github.com/huggingface/optimum-quanto/pull/365
* Fix error when trying to access `state_dict` after activation quantization by DN6 in https://github.com/huggingface/optimum-quanto/pull/371
New Contributors
* imba-tjd made their first contribution in https://github.com/huggingface/optimum-quanto/pull/357
* DN6 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/371
* Add hip support by Disty0 in https://github.com/huggingface/optimum-quanto/pull/330
* Switched linters, black -> ruff by ishandeva in https://github.com/huggingface/optimum-quanto/pull/334
* Add marlin int4 kernel by dacorvo and shcho1118 in https://github.com/huggingface/optimum-quanto/pull/333
* fix: use reshape instead of view by dacorvo in https://github.com/huggingface/optimum-quanto/pull/338
* Support QLayerNorm without weights by dacorvo in https://github.com/huggingface/optimum-quanto/pull/341
New Contributors
* ishandeva made their first contribution in https://github.com/huggingface/optimum-quanto/pull/334
* Disty0 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/330
* shcho1118 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/333
- Load and save models from the Hugging Face hub #263 by sayakpaul
- Add support for float8 e4m3fnuz #310 (from #281) by maktukmak
- Faster and less memory-intensive requantization #290 by latentCall145
- Support torch.equal for QTensor #294 by dacorvo
- Add Marlin Float8 kernel #296 (from #241) by fxmarty
- Add Whisper for speech recognition example #298 (from #242) by mattiadg
- Add ViT classification example #308 by shovan777
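For readers unfamiliar with the float8 e4m3fnuz format named in the list above, the following is a minimal pure-Python sketch of how one byte in that format maps to a value. It is not quanto's implementation: the function name `decode_float8_e4m3fnuz` is hypothetical, and the layout assumed here (1 sign bit, 4 exponent bits, 3 mantissa bits, exponent bias 8, no infinities, a single NaN at `0x80` in place of negative zero) follows the format as exposed by PyTorch's `torch.float8_e4m3fnuz` dtype.

```python
import math


def decode_float8_e4m3fnuz(byte: int) -> float:
    """Decode one byte as float8 e4m3fnuz (illustrative helper, not a quanto API)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a value in range 0..255")
    if byte == 0x80:
        # "nuz": there is no negative zero; this bit pattern is the only NaN.
        return math.nan
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0x0F  # 4 exponent bits
    mantissa = byte & 0x07         # 3 mantissa bits
    bias = 8                       # assumed bias for the "fnuz" variant
    if exponent == 0:
        # Subnormal values have no implicit leading one.
        return sign * (mantissa / 8.0) * 2.0 ** (1 - bias)
    # "fn": finite only, so the top exponent encodes ordinary values, not inf.
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - bias)


# Largest finite value, the value one, and the single NaN encoding.
print(decode_float8_e4m3fnuz(0x7F))
print(decode_float8_e4m3fnuz(0x40))
print(decode_float8_e4m3fnuz(0x80))
```

Under these assumptions the representable range is roughly ±240, which is narrower than the ±448 of the more common e4m3fn variant, so scales computed for one format should not be reused for the other.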
Bug fixes
- Fix include patterns in quantize #271 by kaibioinfo
- Enable non-strict loading of state dicts #295 by BenjaminBossan
- Fix transformers forward error #303 by dacorvo
- Fix missing call in transformers models #325 by dacorvo
- Fix 8-bit mm calls for 4D inputs #326 by dacorvo
* Use new int8 torch kernels by dacorvo in https://github.com/huggingface/optimum-quanto/pull/222
* Rebuild extension when pytorch is updated by dacorvo in https://github.com/huggingface/optimum-quanto/pull/223
* Use tinygemm bfloat16 / int4 kernel whenever possible by dacorvo in https://github.com/huggingface/optimum-quanto/pull/234
* Add HQQ optimizer by dacorvo in https://github.com/huggingface/optimum-quanto/pull/235
* Add QuantizedModelForCausalLM by dacorvo in https://github.com/huggingface/optimum-quanto/pull/243
* Integrate quanto commands to optimum-cli by dacorvo in https://github.com/huggingface/optimum-quanto/pull/244
* Add pixart-sigma test to image example by dacorvo in https://github.com/huggingface/optimum-quanto/pull/247
* Support diffusion models. by sayakpaul in https://github.com/huggingface/optimum-quanto/pull/255
Bug fixes
* Fix: align extension on max arch by dacorvo in https://github.com/huggingface/optimum-quanto/pull/227
* Fix TinyGemmQBitsTensor move by dacorvo in https://github.com/huggingface/optimum-quanto/pull/246
* Fix stream-lining bug by dacorvo in https://github.com/huggingface/optimum-quanto/pull/249
* Fix float/int8 matrix multiplication latency regression by dacorvo in https://github.com/huggingface/optimum-quanto/pull/250
* Fix serialization issues by dacorvo in https://github.com/huggingface/optimum-quanto/pull/258
New Contributors
* sayakpaul made their first contribution in https://github.com/huggingface/optimum-quanto/pull/255
- add OWLv2 detection example by dacorvo,
- use new torch quantization kernels by dacorvo.
Bug fixes
- avoid CUDA compilation errors on older Nvidia cards (pre Ampere) by dacorvo,
- recompile extensions when pytorch is updated and prevent segfault by dacorvo.