* Add hip support by Disty0 in https://github.com/huggingface/optimum-quanto/pull/330
* Switched linters, black -> ruff by ishandeva in https://github.com/huggingface/optimum-quanto/pull/334
* Add marlin int4 kernel by dacorvo and shcho1118 in https://github.com/huggingface/optimum-quanto/pull/333
* fix: use reshape instead of view by dacorvo in https://github.com/huggingface/optimum-quanto/pull/338
* Support QLayerNorm without weights by dacorvo in https://github.com/huggingface/optimum-quanto/pull/341
New Contributors
* ishandeva made their first contribution in https://github.com/huggingface/optimum-quanto/pull/334
* Disty0 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/330
* shcho1118 made their first contribution in https://github.com/huggingface/optimum-quanto/pull/333
- Load and save models from the Hugging Face Hub #263 by sayakpaul
- Add support for float8 e4m3fnuz #310 (from #281) by maktukmak
- Faster and less memory-intensive requantization #290 by latentCall145
- Support torch.equal for QTensor #294 by dacorvo
- Add Marlin Float8 kernel #296 (from #241) by fxmarty
- Add Whisper for speech recognition example #298 (from #242) by mattiadg
- Add ViT classification example #308 by shovan777
Bug fixes
- Fix include patterns in quantize #271 by kaibioinfo
- Enable non-strict loading of state dicts #295 by BenjaminBossan
- Fix transformers forward error #303 by dacorvo
- Fix missing call in transformers models #325 by dacorvo
- Fix 8-bit mm calls for 4D inputs #326 by dacorvo
* Use new int8 torch kernels by dacorvo in https://github.com/huggingface/optimum-quanto/pull/222
* Rebuild extension when pytorch is updated by dacorvo in https://github.com/huggingface/optimum-quanto/pull/223
* Use tinygemm bfloat16 / int4 kernel whenever possible by dacorvo in https://github.com/huggingface/optimum-quanto/pull/234
* Add HQQ optimizer by dacorvo in https://github.com/huggingface/optimum-quanto/pull/235
* Add QuantizedModelForCausalLM by dacorvo in https://github.com/huggingface/optimum-quanto/pull/243
* Integrate quanto commands to optimum-cli by dacorvo in https://github.com/huggingface/optimum-quanto/pull/244
* Add pixart-sigma test to image example by dacorvo in https://github.com/huggingface/optimum-quanto/pull/247
* Support diffusion models by sayakpaul in https://github.com/huggingface/optimum-quanto/pull/255
Bug fixes
* Fix: align extension on max arch by dacorvo in https://github.com/huggingface/optimum-quanto/pull/227
* Fix TinyGemmQBitsTensor move by dacorvo in https://github.com/huggingface/optimum-quanto/pull/246
* Fix stream-lining bug by dacorvo in https://github.com/huggingface/optimum-quanto/pull/249
* Fix float/int8 matrix multiplication latency regression by dacorvo in https://github.com/huggingface/optimum-quanto/pull/250
* Fix serialization issues by dacorvo in https://github.com/huggingface/optimum-quanto/pull/258
New Contributors
* sayakpaul made their first contribution in https://github.com/huggingface/optimum-quanto/pull/255
- Add OWLv2 detection example by dacorvo
- Use new torch quantization kernels by dacorvo
Bug fixes
- Avoid CUDA compilation errors on older NVIDIA cards (pre-Ampere) by dacorvo
- Recompile extensions when PyTorch is updated to prevent segfaults by dacorvo
0.2.1
This release contains no new features, but it is the first one published under the new package name.