- Support static cache compilation without using `HFGenerator` (sketch below)
- Fix various issues related to `torch.compile`
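A minimal sketch of what this enables, assuming transformers' static-cache `generate()` path works with an hqq-quantized model (the model id and quantization settings below are illustrative): quantize with hqq, set `cache_implementation="static"`, and compile the forward pass directly, with no `HFGenerator` wrapper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from hqq.core.quantize import BaseQuantizeConfig
from hqq.models.hf.base import AutoHQQHFModel

model_id  = "meta-llama/Llama-2-7b-hf"  # illustrative choice
model     = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize in place with hqq
AutoHQQHFModel.quantize_model(
    model,
    quant_config=BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float16,
    device="cuda",
)

# Static cache + torch.compile via plain transformers generate(),
# no HFGenerator wrapper needed
model.generation_config.cache_implementation = "static"
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
out    = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```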
0.2.1
- `HQQLinear.state_dict()` for non-initialized layers (sketch below). Mainly used for https://github.com/huggingface/transformers/pull/33141
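A minimal sketch, assuming `linear_layer=None` yields a shell layer whose quantized weights get loaded later, as in the transformers integration (settings are illustrative):

```python
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

quant_config = BaseQuantizeConfig(nbits=4, group_size=64)

# Shell layer: no weights have been quantized yet
layer = HQQLinear(None, quant_config, compute_dtype=torch.float16, device="cpu")

# state_dict() now works even though the layer was never initialized,
# which the safetensors loading path relies on
sd = layer.state_dict()
```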
0.2.0
- Bug fixes
- Safetensors support for transformers via https://github.com/huggingface/transformers/pull/33141
- `quant_scale`, `quant_zero`, and `offload_meta` are now deprecated. You can still use them with the hqq lib, but not with the transformers lib (example below)
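For reference, a config that avoids the deprecated flags by simply omitting them; a plain config like this works with both the hqq lib and the transformers lib (the values are illustrative):

```python
from hqq.core.quantize import BaseQuantizeConfig

# quant_zero, quant_scale, and offload_meta are simply left out
quant_config = BaseQuantizeConfig(nbits=4, group_size=64)
```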
0.1.8
- Add BitBlas backend support
- Simpler `HQQLinear` creation from weights: `HQQLinear.from_weights(W, bias, etc.)` (sketch below)
- Fix memory leak while swapping layers for the TorchAO backend
- Add `HQQLinear.unpack()` call
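A minimal sketch of the new calls, assuming `from_weights` takes the weight tensor and an optional bias alongside the usual quantization arguments, and that `unpack()` needs no arguments (shapes and settings are illustrative):

```python
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

W    = torch.randn(4096, 4096, dtype=torch.float16)
bias = None

# Build a quantized layer directly from a raw weight tensor
layer = HQQLinear.from_weights(
    W, bias,
    quant_config=BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float16,
    device="cuda",
)

# Recover the unpacked quantized weights
W_q = layer.unpack()
```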
0.1.7.post3
- Enable CPU quantization and runtime (sketch below)
- Fix `_load_state_dict`
- Fix `extra_repr` in `HQQLinear`
- Fix `from_quantized` bugs
- Fix `|` typing
- Fix 3-bit `axis=1` slicing bug
- Add 5/6-bit quantization for testing
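A minimal sketch of CPU quantization plus inference, assuming `device="cpu"` is accepted for both steps (layer size and settings are illustrative):

```python
import torch
import torch.nn as nn
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

linear    = nn.Linear(1024, 1024, bias=False)
hqq_layer = HQQLinear(
    linear,
    quant_config=BaseQuantizeConfig(nbits=4, group_size=64),
    compute_dtype=torch.float32,
    device="cpu",
)

x = torch.randn(1, 1024, dtype=torch.float32)
y = hqq_layer(x)  # both quantization and the forward pass run on CPU
```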
0.1.7.post2
- Various bug fixes, especially with `AutoHQQHFModel` and the patching logic, to make it work with any transformers model
- Readme refactoring
- Whisper example