- Highlights
- Features
- Improvement
- Productivity
- Bug Fixes
- Examples
- Validated Configurations
**Highlights**
- Supported [layer-wise](https://github.com/intel/neural-compressor/blob/master/docs/source/quantization_layer_wise.md) quantization for PyTorch RTN/GPTQ Weight-Only Quantization and ONNX Runtime W8A8 quantization (see the sketch after this list).
- Supported Weight-Only Quantization tuning for ONNX Runtime backend.
- Supported GGML double quant on RTN/GPTQ Weight-Only Quantization with the FW extension API.
- Supported SmoothQuant for large SavedModel models with the TensorFlow backend.
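
A minimal sketch of the layer-wise Weight-Only Quantization highlight, using RTN with the 2.x PostTrainingQuantConfig API. The load_empty_model helper and the layer_wise_quant recipe key follow the layer-wise quantization doc linked above; the checkpoint name is a placeholder.

```python
# Sketch: PyTorch layer-wise RTN Weight-Only Quantization (2.x API).
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.adaptor.torch_utils.layer_wise_quant import load_empty_model

# Build the model on the meta device so weights are loaded, quantized, and
# released one layer at a time instead of holding the full model in memory.
fp32_model = load_empty_model("facebook/opt-125m", torchscript=True)  # placeholder checkpoint

conf = PostTrainingQuantConfig(
    approach="weight_only",              # RTN is the default weight-only algorithm
    recipes={"layer_wise_quant": True},  # enable layer-wise mode
)
# RTN needs no calibration data, so no calib_dataloader is passed.
q_model = quantization.fit(fp32_model, conf)
q_model.save("./saved_results")
```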
**Features**
- [Quantization] Support GGML double quant in Weight-Only Quantization for RTN and GPTQ ([05c15a](https://github.com/intel/neural-compressor/commit/05c15a4a66699a32ba30a47efbc457b5abb40983))
- [Quantization] Support Weight-Only Quantization tuning for ONNX Runtime backend ([6d4ea5](https://github.com/intel/neural-compressor/commit/6d4ea5b114d7af4030626702da7b515a3f7771a9), [934ba0](https://github.com/intel/neural-compressor/commit/934ba032386d73ca85e03b891ae05757fe4faac0), [4fcfdf](https://github.com/intel/neural-compressor/commit/4fcfdf8f2960b1e5409d3e5c835d166ec127089b))
- [Quantization] Support SmoothQuant block-wise alpha-tuning ([ee6bc2](https://github.com/intel/neural-compressor/commit/ee6bc284b1e65c37b2f46857da9cd315019b208a))
- [Quantization] Support SmoothQuant for large SavedModel models with the TensorFlow backend ([3b2925](https://github.com/intel/neural-compressor/commit/3b2925263845ea6e38df5b50912fbfd4bd6b85ee), [4f2c35](https://github.com/intel/neural-compressor/commit/4f2c35d1ffda4384245339ddf30f3a3dc0ed1772))
- [Quantization] Support PyTorch layer-wise quantization for GPTQ ([ee5450](https://github.com/intel/neural-compressor/commit/ee5450707bf16cceff9de8048a8fdddf968ca936))
- [Quantization] Support PyTorch layer-wise quantization for RTN ([ebd1e2](https://github.com/intel/neural-compressor/commit/ebd1e240d006566e373f2b4854d6f3157ae33c36))
- [Quantization] Support ONNX Runtime layer-wise W8A8 quantization ([6142e4](https://github.com/intel/neural-compressor/commit/6142e48dda5c5023a69df56008a73aad54cfea58), [5d33a5](https://github.com/intel/neural-compressor/commit/5d33a536440ec69e56da2ee3834ed21d6c077862))
- [Common] [Experimental] Implement the FW extension API ([76b8b3](https://github.com/intel/neural-compressor/commit/76b8b3f718e0533b2be01c87987ba807da2a108a), [8447d7](https://github.com/intel/neural-compressor/commit/8447d7097fa33231b8a6e4a9e26e526d191787de), [258236](https://github.com/intel/neural-compressor/commit/2582361aee60be11517e7b2985293e2c8e75d004))
- [Quantization] [Experimental] Support Weight-Only Quantization through the FW extension API for the PyTorch backend (a sketch follows this list) ([915018](https://github.com/intel/neural-compressor/commit/9150181bb2ab71201fbdb052fbcaa2aba18a090a), [dc9328](https://github.com/intel/neural-compressor/commit/dc9328c09b243d7df3bccc0a35a8a12feaabb40a))
- [Quantization] [Experimental] Support Keras quantization through the FW extension API for the TensorFlow backend ([2627d3](https://github.com/intel/neural-compressor/commit/2627d33b9ff900697184972575969ecc55da8923))
- [Quantization] Support IPEX 2.1 XPU (CPU+GPU) ([af0b50](https://github.com/intel/neural-compressor/commit/af0b50fd201c672c141c2c0508cdd4e4ae62684f), [cf847c](https://github.com/intel/neural-compressor/commit/cf847c1abf8f09ce4ae6e21f62aeb5ec7fa2f358))
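
A rough sketch of weight-only RTN quantization through the experimental FW extension API for the PyTorch backend. Since this API is experimental in this release, the module path and names below (RTNConfig, quantize) are assumptions drawn from the later stabilized 3.x interface and may differ here.

```python
# Assumed names from the later 3.x interface; experimental and subject to change.
import torch
from neural_compressor.torch.quantization import RTNConfig, quantize

model = torch.nn.Sequential(torch.nn.Linear(64, 64))  # toy model for illustration
quant_config = RTNConfig(bits=4, group_size=32)        # 4-bit weights, group size 32
q_model = quantize(model, quant_config)
```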
**Improvement**
- [Quantization] Add a use_optimum_format option to export_compressed_model in Weight-Only Quantization ([5179da](https://github.com/intel/neural-compressor/commit/5179da10dd6540660f320206723e2651b4d6a292), [0a0644](https://github.com/intel/neural-compressor/commit/0a06448384699aa1e51bd2cd986c226382b9cd94))
- [Quantization] Enhance ONNX Runtime quantization with DirectML EP ([db0fef](https://github.com/intel/neural-compressor/commit/db0fef72137e0ec6a5ffbdc0e470e84ea4cefeb1), [d13183](https://github.com/intel/neural-compressor/commit/d1318320632dea889a3eb5287dea07ca320673aa), [098401](https://github.com/intel/neural-compressor/commit/098401d029e33741cfe42011544bf10e64323e46), [6cad50](https://github.com/intel/neural-compressor/commit/6cad5008cc4e1880ba5d9e01538e4a9dd3be962e))
- [Quantization] Support restoring IPEX models from JSON ([c3214c](https://github.com/intel/neural-compressor/commit/c3214c9ec007301896c8a5df587be81fb047b541))
- [Quantization] Add an attribute to the MatMulNBits operator for ONNX Runtime ([7057e3](https://github.com/intel/neural-compressor/commit/7057e3b2a3ae1f9911516fb173287e1832f77afd))
- [Quantization] Speed up SmoothQuant auto-alpha tuning ([173c18](https://github.com/intel/neural-compressor/commit/173c1881ae4b4a04868b66a2253fc44807d63349))
- [Quantization] Add the SmoothQuant alpha search space as a config argument (see the sketch after this list) ([f9663d](https://github.com/intel/neural-compressor/commit/f9663d0e03867ba28af31b332bc42f614c994fd9))
- [Quantization] Add SmoothQuant weight clipping as an option that is enabled by default ([1f4aec](https://github.com/intel/neural-compressor/commit/1f4aec3c7149d93568106326e98a91933826475e))
- [Quantization] Support SmoothQuant with MinMaxObserver ([45b496](https://github.com/intel/neural-compressor/commit/45b4966d3f4198d0fe799488a4ec4c9b844bc07d))
- [Quantization] Support Weight-Only Quantization with fp16 for PyTorch backend ([d5cb56](https://github.com/intel/neural-compressor/commit/d5cb567f0a86ecec3ef561ee69fc44c0cb6cb248))
- [Quantization] Support tracing with dictionary-type example_inputs ([afe315](https://github.com/intel/neural-compressor/commit/afe3159c4872db654465602c4b237d28cbbf7610))
- [Quantization] Support Falcon Weight-Only Quantization ([595d3a](https://github.com/intel/neural-compressor/commit/595d3a1987c77542cd8b904dd16dc1c95ba9bdc7))
- [Common] Add a deprecation decorator to the experimental folder ([aeb3ed](https://github.com/intel/neural-compressor/commit/aeb3ed292b30b941bd69ef40dd99b0031b2171e0))
- [Common] Remove 1.x API dependency ([ee617a](https://github.com/intel/neural-compressor/commit/ee617a44aaef3d2c6860d850a384db3c4abe80fa))
- [Mixed Precision] Support PyTorch eager mode BF16 mixed precision ([3bfb76](https://github.com/intel/neural-compressor/commit/3bfb76dca9271c48940ca4af26cabfa48f50ca56))
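
Several of the SmoothQuant improvements above surface as recipe arguments on PostTrainingQuantConfig. A minimal sketch, assuming the smooth_quant_args keys documented for SmoothQuant (alpha set to "auto" plus an auto_alpha_args search space); the toy model and calibration data are placeholders for real ones.

```python
# Sketch: SmoothQuant with auto alpha tuned over a user-defined search space.
import torch
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import PostTrainingQuantConfig, quantization

# Toy model and calibration data purely for illustration; substitute real ones.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU(), torch.nn.Linear(16, 4))
calib_set = TensorDataset(torch.randn(32, 16), torch.zeros(32, dtype=torch.long))
calib_dataloader = DataLoader(calib_set, batch_size=8)

conf = PostTrainingQuantConfig(
    recipes={
        "smooth_quant": True,
        "smooth_quant_args": {
            "alpha": "auto",        # tune alpha instead of fixing a scalar
            "auto_alpha_args": {    # the search space exposed as a config argument
                "alpha_min": 0.0,
                "alpha_max": 1.0,
                "alpha_step": 0.1,
            },
        },
    },
)
q_model = quantization.fit(model, conf, calib_dataloader=calib_dataloader)
```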
**Productivity**
- Support quantization and benchmarking on macOS ([16d6a0](https://github.com/intel/neural-compressor/commit/16d6a06c212fbc8908d443b16a354fdfd6337629))
- Support ONNX Runtime 1.16.0 ([d81732](https://github.com/intel/neural-compressor/commit/d817328dd9040f3896bed71c1bb1c181a3cc805c), [299af9](https://github.com/intel/neural-compressor/commit/299af9a162d1f7c4f2e6d4eaceab69163121bee8), [753783](https://github.com/intel/neural-compressor/commit/753783011d2f0fc71abe331e589aff94f63aeac3))
- Support TensorFlow's new API for gnr-base ([8160c7](https://github.com/intel/neural-compressor/commit/8160c71b845d7c76f6ff2d66883c871450f57687))
**Bug Fixes**
- Fix the "GraphModule object has no attribute bias" error ([7f53d1](https://github.com/intel/neural-compressor/commit/7f53d16914926cec9cf8e02be1ed3dcffd2ca730))
- Fix ONNX model export issue ([af0aea](https://github.com/intel/neural-compressor/commit/af0aea4b221c188e739fde205e22769f7532ffc1), [eaa57f](https://github.com/intel/neural-compressor/commit/eaa57f69ec0de8e821fe6e3297836b08a3aee3f1))
- Add clip for ONNX Runtime SmoothQuant ([cbb69b](https://github.com/intel/neural-compressor/commit/cbb69bf9c244d371833db1092aec9a66b304b7a4))
- Fix SmoothQuant MinMaxObserver initialization ([b1db1c](https://github.com/intel/neural-compressor/commit/b1db1cf3d102a1a6234f82493f27ac3362c0e150))
- Fix SmoothQuant issue in get/set_module ([dffcfe](https://github.com/intel/neural-compressor/commit/dffcfe1e581a5bcfdd45c24147e934c180131e9d))
- Align sparsity with block-wise masks in progressive pruning ([fcdc29](https://github.com/intel/neural-compressor/commit/fcdc29ad8105ec3b6fc305dedc2edd6fc9c6cd87))
**Examples**
- Support peft model with SmoothQuant ([5e21b7](https://github.com/intel/neural-compressor/commit/5e21b7057b56a46f9299ee35c18ebf47082d1175))
- Enable two ONNX Runtime examples: table-transformer-detection ([550cee](https://github.com/intel/neural-compressor/commit/550cee2f1538db7bb542e6c24c3d599fc1d8ef4d)) and BEiT ([7265df](https://github.com/intel/neural-compressor/commit/7265dfee1050979ed6672b7efb9cfd4f9bd86acf))
**Validated Configurations**
- CentOS 8.4 & Ubuntu 22.04 & Windows 10 & macOS Ventura 13.5
- Python 3.8, 3.9, 3.10, 3.11
- TensorFlow 2.13, 2.14, 2.15
- ITEX 1.2.0, 2.13.0.0, 2.14.0.1
- PyTorch/IPEX 1.13.0+cpu, 2.0.1+cpu, 2.1.0
- ONNX Runtime 1.14.1, 1.15.1, 1.16.3
- MXNet 1.9.1