With this release, we enable Intel Neural Compressor v1.7 PyTorch dynamic, post-training, and quantization-aware training quantization for a variety of NLP tasks. This support covers the full workflow, from applying quantization to loading the resulting quantized model, the latter enabled by the introduction of the `IncQuantizedModel` class.
0.1.0
New features
- Group-wise quantization
- Safe serialization
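Group-wise quantization assigns a separate scale to each small group of weights, so an outlier in one group does not degrade the precision of the others. A minimal pure-Python sketch of symmetric int8 group-wise quantization (illustrative only, not the library's actual implementation; the function names are hypothetical):

```python
# Illustrative sketch of group-wise symmetric int8 quantization.
# Not the actual library code; names are hypothetical.

def quantize_groupwise(weights, group_size=4):
    """Quantize a flat list of floats, one scale per group of `group_size`."""
    qdata, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Per-group scale: map the largest magnitude in the group to 127.
        scale = max(abs(w) for w in group) / 127 or 1.0
        scales.append(scale)
        qdata.extend(max(-127, min(127, round(w / scale))) for w in group)
    return qdata, scales

def dequantize_groupwise(qdata, scales, group_size=4):
    """Recover approximate floats using each group's own scale."""
    return [q * scales[i // group_size] for i, q in enumerate(qdata)]
```

Because each group of four values gets its own scale, small weights sitting next to a large outlier in a different group keep most of their precision.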
0.0.13
New features
- New `QConv2d` quantized module
- Official support for `float8` weights
Bug fixes
- Fix `QbitsTensor.to()`, which was not moving the inner tensors
- Prevent shallow `QTensor` copies when loading weights, since such copies do not move the inner tensors
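Both fixes address the same pitfall: a quantized tensor is a wrapper around inner tensors (e.g. packed data and scales), and any copy or device move must recurse into those inner tensors. A pure-Python analogy (the class and attribute names are hypothetical, not the library's actual code):

```python
# Pure-Python analogy of the pitfall: a wrapper's `to()` must move its
# inner tensors too, not just produce a new wrapper. Names are hypothetical.

class FakeTensor:
    """Stand-in for a real tensor, tracking only values and a device tag."""
    def __init__(self, values, device="cpu"):
        self.values, self.device = values, device

    def to(self, device):
        # Returns a new tensor on the target device.
        return FakeTensor(self.values, device)

class QTensorSketch:
    """Wrapper holding inner tensors, like a quantized tensor does."""
    def __init__(self, data, scale):
        self._data, self._scale = data, scale  # inner tensors

    def to(self, device):
        # The bug was the equivalent of copying the wrapper while keeping
        # references to the old inner tensors; the fix moves each of them.
        return QTensorSketch(self._data.to(device), self._scale.to(device))
```

A shallow copy of the wrapper would share `_data` and `_scale` with the original, so a subsequent device move would silently leave the actual payload behind.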
0.0.12
New features
- Quanto kernels library (not yet used in `quantize`)