New features
- `meta_offloading` support: allows offloading the quantization meta-data to the CPU, achieving true n-bit storage on the GPU.
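The idea can be sketched in plain Python (illustrative only — the function names and the assumption that "meta-data" means the scale/zero-point are ours, and `"gpu"`/`"cpu"` are just tags, not real devices): the packed integer weights stay on the GPU, while the meta-data lives on the CPU and is only consulted at dequantization time.

```python
# Sketch of meta-data offloading for a quantized tensor (hypothetical names).

def quantize(weights, nbits=4):
    """Affine-quantize a list of floats to nbits-wide integers."""
    qmax = (1 << nbits) - 1
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / qmax or 1.0
    zero = w_min
    q = [round((w - zero) / scale) for w in weights]
    # Quantized ints stay "on the GPU"; meta-data is offloaded "to the CPU".
    return {"data": q, "device": "gpu"}, {"scale": scale, "zero": zero, "device": "cpu"}

def dequantize(q_tensor, meta):
    # Meta-data is fetched from the CPU only when dequantizing.
    scale, zero = meta["scale"], meta["zero"]
    return [qi * scale + zero for qi in q_tensor["data"]]

q, meta = quantize([0.0, 0.5, 1.0, 1.5])
print(dequantize(q, meta))  # close to the original values
```

Only `q["data"]` occupies GPU memory in this picture, which is what makes the storage truly n-bit.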
0.1.3
New features
- Added CUDA kernels for dequantization (up to 2-3x inference speed-up vs. PyTorch)
- Added support for the `compute_dtype` parameter (useful for float32/bfloat16 LoRA training)
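What a configurable compute dtype buys you can be sketched in NumPy (illustrative names, not the library's API; bfloat16 is stood in for by float16 since NumPy has no native bfloat16): dequantization emits the tensor in whatever dtype downstream ops, such as a LoRA adapter, expect.

```python
import numpy as np

# Sketch: a dequantize step that honors a compute_dtype setting,
# so downstream matmuls run in float32 or a half-precision dtype.
def dequantize(q, scale, zero, compute_dtype=np.float16):
    return q.astype(compute_dtype) * compute_dtype(scale) + compute_dtype(zero)

q = np.array([0, 5, 10, 15], dtype=np.uint8)
out = dequantize(q, scale=0.1, zero=0.0, compute_dtype=np.float32)
print(out.dtype)  # float32
```

Choosing float32 here trades memory for numerical headroom during training; half precision keeps activations small at inference.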
0.1.2.post1
Bug fixes
- Fixed LoRA adapter loading.
0.1.2
Improvements
- Added LoRA support
- Added LoRA with fake quantization support (experimental)
- Optimizer V2 with scale update support
- Some code refactoring in `quantize.py`
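How a LoRA adapter composes with a frozen quantized base layer can be sketched as follows (shapes and variable names are ours, not the library's; the base weight is shown already dequantized for brevity): the output is the base path `W @ x` plus the low-rank update `B @ (A @ x)`, and only `A` and `B` would receive gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen (quantized, here already-dequantized) base weight.
W = rng.standard_normal((8, 8)).astype(np.float32)

# Low-rank LoRA factors: only these would be trained.
r = 2
A = rng.standard_normal((r, 8)).astype(np.float32) * 0.01
B = np.zeros((8, r), dtype=np.float32)  # B starts at zero: adapter is a no-op at init

x = rng.standard_normal(8).astype(np.float32)
y = W @ x + B @ (A @ x)  # base path + low-rank LoRA path

# With B zero-initialized, the adapted layer matches the base layer exactly.
print(np.allclose(y, W @ x))  # True
```

Zero-initializing `B` is the standard LoRA trick that makes training start from the unmodified base model.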
0.1.1.post1
No improvements over v0.1.1. Just removed PyTorch from the dependencies and updated the README.
0.1.1
Improvements:
- Added Mixtral support for Hugging Face.
- Added support for layer-wise custom quantization configs.
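Layer-wise custom configs can be pictured as a global default plus per-layer overrides. A pure-Python sketch (the key names and layer paths are made up for illustration, not the real schema):

```python
# Sketch: resolve a per-layer quantization config against a global default.
default_cfg = {"nbits": 4, "group_size": 64}
layer_overrides = {
    "model.layers.0.self_attn": {"nbits": 8},  # keep early attention at 8-bit
    "model.layers.31.mlp": {"nbits": 2},       # compress the last MLP harder
}

def config_for(layer_name):
    # Overrides win over the default, field by field.
    return {**default_cfg, **layer_overrides.get(layer_name, {})}

print(config_for("model.layers.0.self_attn"))  # {'nbits': 8, 'group_size': 64}
print(config_for("model.layers.5.mlp"))        # {'nbits': 4, 'group_size': 64}
```

This lets sensitive layers keep more bits while the rest of the model is compressed aggressively.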