* Use new int8 torch kernels by dacorvo in https://github.com/huggingface/optimum-quanto/pull/222
* Rebuild extension when pytorch is updated by dacorvo in https://github.com/huggingface/optimum-quanto/pull/223
* Use tinygemm bfloat16 / int4 kernel whenever possible by dacorvo in https://github.com/huggingface/optimum-quanto/pull/234
* Add HQQ optimizer by dacorvo in https://github.com/huggingface/optimum-quanto/pull/235
* Add QuantizedModelForCausalLM by dacorvo in https://github.com/huggingface/optimum-quanto/pull/243 (see the sketch after this list)
* Integrate quanto commands into optimum-cli by dacorvo in https://github.com/huggingface/optimum-quanto/pull/244
* Add pixart-sigma test to image example by dacorvo in https://github.com/huggingface/optimum-quanto/pull/247
* Support diffusion models by sayakpaul in https://github.com/huggingface/optimum-quanto/pull/255
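As a quick illustration of the new `QuantizedModelForCausalLM` API, here is a minimal sketch of quantizing, saving, and reloading a causal language model. The checkpoint name and output directory are placeholders, and argument details may vary between releases:

```python
import torch
from transformers import AutoModelForCausalLM
from optimum.quanto import QuantizedModelForCausalLM, qint4

# Load a regular transformers model (the checkpoint name is just an example).
model = AutoModelForCausalLM.from_pretrained("gpt2", torch_dtype=torch.bfloat16)

# Quantize its weights to int4, keeping the lm_head in full precision.
qmodel = QuantizedModelForCausalLM.quantize(model, weights=qint4, exclude="lm_head")

# Save the quantized model and reload it later without re-quantizing.
qmodel.save_pretrained("./gpt2-quantized")
reloaded = QuantizedModelForCausalLM.from_pretrained("./gpt2-quantized")
```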
Bug fixes
* Fix: align extension on max arch by dacorvo in https://github.com/huggingface/optimum-quanto/pull/227
* Fix TinyGemmQBitsTensor move by dacorvo in https://github.com/huggingface/optimum-quanto/pull/246
* Fix stream-lining bug by dacorvo in https://github.com/huggingface/optimum-quanto/pull/249
* Fix float/int8 matrix multiplication latency regression by dacorvo in https://github.com/huggingface/optimum-quanto/pull/250
* Fix serialization issues by dacorvo in https://github.com/huggingface/optimum-quanto/pull/258
New Contributors

* sayakpaul made their first contribution in https://github.com/huggingface/optimum-quanto/pull/255
- add OWLv2 detection example by dacorvo,
- use new torch quantization kernels by dacorvo.
Bug fixes
- avoid CUDA compilation errors on older Nvidia cards (pre-Ampere) by dacorvo,
- recompile extensions when pytorch is updated and prevent a segfault by dacorvo.
0.2.1
This release does not contain any new features, but it is the first one published under the new package name; the import path changes as shown below.
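For users upgrading from the old quanto package, the visible change is the import path. A minimal before/after sketch, assuming the rename from `quanto` to `optimum.quanto`:

```python
# Before the rename (quanto <= 0.2.0):
# from quanto import quantize, freeze

# After the rename (optimum-quanto >= 0.2.1):
from optimum.quanto import quantize, freeze
```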
0.2.0
New features
- requantize helper by calmitchell617 (see the sketch after this list),
- StableDiffusion example by thliang01,
- improved linear backward path by dacorvo,
- AWQ int4 kernels by dacorvo.
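As a rough sketch of the requantize helper, the snippet below serializes a quantized model and then restores it into a freshly instantiated copy. It uses the current `optimum.quanto` import path (the package was still named `quanto` in 0.2.0), and the checkpoint and file names are placeholders:

```python
import json
import torch
from safetensors.torch import save_file, load_file
from transformers import AutoConfig, AutoModelForCausalLM
from optimum.quanto import quantize, freeze, quantization_map, requantize, qint8

# Quantize and freeze a model, then serialize its weights and quantization map.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # example checkpoint
quantize(model, weights=qint8)
freeze(model)
save_file(model.state_dict(), "model.safetensors")
with open("quantization_map.json", "w") as f:
    json.dump(quantization_map(model), f)

# Later: instantiate an empty copy of the model and restore the quantized weights.
state_dict = load_file("model.safetensors")
with open("quantization_map.json") as f:
    qmap = json.load(f)
with torch.device("meta"):
    new_model = AutoModelForCausalLM.from_config(AutoConfig.from_pretrained("gpt2"))
requantize(new_model, state_dict, qmap, device=torch.device("cpu"))
```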
0.1.2