- Highlights
- Features
- Bug Fixes
- Examples
- Documentation
- Validated Configurations
**Highlights**
- Support quantization on Intel® Xeon® Scalable Processors (e.g., Sapphire Rapids), Intel® Data Center GPU Flex Series, and Intel® Max Series CPUs and GPUs
- Provide new unified APIs for post-training optimizations (static/dynamic quantization) and during-training optimizations (quantization-aware training, pruning/sparsity, distillation, etc.); a sketch of the new API follows this list
- Support advanced fine-grained auto mixed precision (AMP) across all supported precisions (e.g., INT8, BF16, and FP32)
- Improve conversion of PyTorch INT8 models to ONNX INT8 models (see the export step in the sketch below)
- Support zero-code quantization in Visual Studio Code and JupyterLab through Neural Coder plugins
- Support quantization for 10K+ transformer-based models, including large language models (e.g., T5, GPT, Stable Diffusion)
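A minimal sketch of the unified v2.0 flow referenced above, covering post-training static quantization, BF16 auto mixed precision, and INT8 ONNX export. The toy model and random calibration data are placeholders you would replace with real ones, and argument details such as `quant_format` and the `Torch2ONNXConfig` fields reflect our reading of the v2.0 config surface rather than a definitive reference:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

from neural_compressor import mix_precision, quantization
from neural_compressor.config import (
    MixedPrecisionConfig,
    PostTrainingQuantConfig,
    Torch2ONNXConfig,
)

# Toy FP32 model and random calibration data standing in for real ones.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten(), nn.Linear(8 * 30 * 30, 10)
)
calib_loader = DataLoader(
    TensorDataset(torch.randn(32, 3, 32, 32), torch.zeros(32, dtype=torch.long)),
    batch_size=8,
)

# Post-training static quantization through the unified config-driven API.
q_model = quantization.fit(
    model=model,
    conf=PostTrainingQuantConfig(approach="static"),
    calib_dataloader=calib_loader,
)

# Auto mixed precision: convert eligible FP32 ops to BF16.
bf16_model = mix_precision.fit(model, conf=MixedPrecisionConfig())

# Convert the quantized PyTorch model to an INT8 (QDQ-format) ONNX model.
q_model.export(
    "int8_model.onnx",
    Torch2ONNXConfig(
        quant_format="QDQ",
        example_inputs=torch.randn(1, 3, 32, 32),
        input_names=["input"],
        output_names=["output"],
    ),
)
```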
**Features**
- [Quantization] Experimental Keras model in, quantized Keras model out (commit [4fa753](https://github.com/intel/neural-compressor/commit/4fa75310de6b84fcb1333ecdc0d2ac5302e445bd)); see the Keras sketch after this list
- [Quantization] Support quantization for ITEX v1.0 on Intel CPU and Intel GPU (commit [a2fcb2](https://github.com/intel/neural-compressor/commit/a2fcb29b676f5876729c03d9727ae8298329113f))
- [Quantization] Support hardware-neutral quantized ONNX QDQ models and validate them on multiple devices (Intel CPU, NVIDIA GPU, AMD CPU, and Arm CPU) through ONNX Runtime
- [Quantization] Enhance TensorFlow QAT: remove TFMOT dependency (commit [1deb7d](https://github.com/intel/neural-compressor/commit/1deb7d2f80524714bd0c6c1192842fea9f0e340e))
- [Quantization] Distinguish frameworks, backends, and output formats for the ONNX Runtime backend (commit [2483a8](https://github.com/intel/neural-compressor/commit/2483a84b7d4aceaee5f4ece134fcb53481f58c6d))
- [Quantization] Support PyTorch/IPEX 1.13 and TensorFlow 2.11 (commit [b7a2ef](https://github.com/intel/neural-compressor/commit/b7a2ef2036f14ec53613cdebca83ff34fd8ae810))
- [AMP] Support more TensorFlow bf16 ops (commit [98d3c8](https://github.com/intel/neural-compressor/commit/98d3c83d11c7f50c86ab7c917e95e17f5d729ad1))
- [AMP] Add torch.amp bf16 support for IPEX backend (commit [2a361b](https://github.com/intel/neural-compressor/commit/2a361b848cd7d1fb39e531d897ef24419fc1de2a))
- [Strategy] Add accuracy-first tuning strategies: MSE_v2 (commit [80311f](https://github.com/intel/neural-compressor/commit/80311f60cdcf4d70ba59b250d60eda0833384ea8)) and HAWQ (commit [83018e](https://github.com/intel/neural-compressor/commit/83018ef28170f8d2659dd30bb28738857e5c0dec)) to solve the accuracy problem of specific models
- [Strategy] Refine the tuning strategy: support more data types and more op attributes (e.g., per-tensor/per-channel, dynamic/static)
- [Pruning] Add progressive pruning and the pattern_lock pruning_type (commit [f46bb1](https://github.com/intel/neural-compressor/commit/f46bb127ab03e8b487dd668bce8aace202a08477)); see the pruning sketch after this list
- [Pruning] Add per_channel sparse pattern (commit [f46bb1](https://github.com/intel/neural-compressor/commit/f46bb127ab03e8b487dd668bce8aace202a08477))
- [Distillation] Support self-distillation towards efficient and compact neural networks (commit [acdd4c](https://github.com/intel/neural-compressor/commit/acdd4ca9b0ed77b796d7a08fffad25df953a72ca))
- [Distillation] Enhance API of intermediate layers knowledge distillation (commit [3183f6](https://github.com/intel/neural-compressor/commit/3183f68026801d3074ddf4378f30d019be0ff89b))
- [Neural Coder] Detect devices and ISA to adjust the optimization (commit [691d0b](https://github.com/intel/neural-compressor/commit/691d0b870b3d08bbab8b0a799b658d9637e18f17))
- [Neural Coder] Automatically quantize with ONNX Runtime backend (commit [f711b4](https://github.com/intel/neural-compressor/commit/f711b4c92798b5205a8f8665929fb10a89da7e71))
- [Neural Coder] Add the Neural Coder Python Launcher (commit [7bb92d](https://github.com/intel/neural-compressor/commit/7bb92d0d05093b18695b192da7ce860ad80d623e)); see the launcher sketch after this list
- [Neural Coder] Add Visual Studio Plugin (commit [dd39ca](https://github.com/intel/neural-compressor/commit/dd39ca035c09604caf2425534a647e7dda9ad045))
- [Productivity] Support Pruning in GUI (commit [d24fea](https://github.com/intel/neural-compressor/commit/d24fea6698c0e238f8c4d8e64a21f7ba77497d8e))
- [Productivity] Replace yaml configuration with the config-driven API; see the tuning-config sketch after this list
- [Productivity] Export ONNX QLinear-format models to QDQ format (commit [e996a9](https://github.com/intel/neural-compressor/commit/e996a9359fbc24db4925f7bdb8b5529c3f97f283))
- [Productivity] Validate 10K+ transformer-based models, including large language models (e.g., T5, GPT, Stable Diffusion)
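For the experimental Keras-in/Keras-out path above, quantization goes through the same `fit` entry point with a Keras model. A minimal sketch, assuming the ITEX backend option (`backend="itex"`) and a user-defined iterable calibration loader; since the feature is experimental, these details may differ:

```python
import numpy as np
import tensorflow as tf

from neural_compressor import quantization
from neural_compressor.config import PostTrainingQuantConfig

# Toy Keras model; the quantized result comes back as a Keras model too.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])


class CalibLoader:
    """Minimal user-defined calibration loader: an iterable of
    (input, label) batches with a `batch_size` attribute."""

    def __init__(self, x, y, batch_size=8):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __iter__(self):
        for i in range(0, len(self.x), self.batch_size):
            yield self.x[i:i + self.batch_size], self.y[i:i + self.batch_size]


calib = CalibLoader(
    np.random.rand(32, 28, 28, 1).astype("float32"),
    np.zeros(32, dtype="int64"),
)

# `backend="itex"` (our assumption for the option name) selects the
# Intel Extension for TensorFlow path.
q_model = quantization.fit(
    model=model,
    conf=PostTrainingQuantConfig(backend="itex"),
    calib_dataloader=calib,
)
q_model.save("quantized_keras_model")
```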
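The during-training features (pruning, and likewise distillation and QAT) share one compression-manager flow in v2.0. A minimal pruning sketch, assuming `WeightPruningConfig` and the `prepare_compression` callbacks; the knob values are illustrative, and the `pruning_type` strings such as "pattern_lock" and "progressive" added in this release would be swapped in the same slot:

```python
import torch
from torch import nn

from neural_compressor.config import WeightPruningConfig
from neural_compressor.training import prepare_compression

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One pruner reaching 80% sparsity with a 4x1 structured pattern.
config = WeightPruningConfig(
    pruning_type="snip_momentum",
    pattern="4x1",
    target_sparsity=0.8,
    start_step=0,
    end_step=20,
)
compression_manager = prepare_compression(model, config)

# The callbacks drive mask updates around the normal training loop.
compression_manager.callbacks.on_train_begin()
for epoch in range(2):
    compression_manager.callbacks.on_epoch_begin(epoch)
    for step in range(10):
        compression_manager.callbacks.on_step_begin(step)
        x, y = torch.randn(16, 128), torch.randint(0, 10, (16,))
        loss = criterion(model(x), y)
        optimizer.zero_grad()
        loss.backward()
        compression_manager.callbacks.on_before_optimizer_step()
        optimizer.step()
        compression_manager.callbacks.on_after_optimizer_step()
        compression_manager.callbacks.on_step_end()
    compression_manager.callbacks.on_epoch_end()
compression_manager.callbacks.on_train_end()
```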
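The Neural Coder Python Launcher referenced above runs an unmodified FP32 script with optimizations injected automatically. A hedged sketch: the `enable` entry point, the target script name, and the feature identifier below reflect our reading of the Neural Coder README and should all be treated as assumptions:

```python
# Programmatic use: patch a plain FP32 script in place and execute it.
from neural_coder import enable

enable(
    code="run_glue.py",                        # hypothetical target script
    features=["pytorch_inc_static_quant_fx"],  # assumed feature identifier
    run=True,
)

# Launcher form (run from a shell): python -m neural_coder run_glue.py
```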
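Where tuning knobs previously lived in a yaml file, the v2.0 API expresses them as config objects. A minimal tuning-config sketch, assuming the `TuningCriterion` and `AccuracyCriterion` names from `neural_compressor.config`:

```python
from neural_compressor.config import (
    AccuracyCriterion,
    PostTrainingQuantConfig,
    TuningCriterion,
)

# Former yaml `tuning:` entries become keyword arguments on the config.
conf = PostTrainingQuantConfig(
    approach="static",
    tuning_criterion=TuningCriterion(timeout=0, max_trials=100),
    accuracy_criterion=AccuracyCriterion(tolerable_loss=0.01),  # <= 1% loss
)
```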
**Bug Fixes**
- Fix quantization failures for ONNX models larger than 2 GB (commit [8d83cc](https://github.com/intel/neural-compressor/commit/8d83cc81edfa3712993225ebc7d1e438c07c601d))
- Fix BF16 being disabled by default (commit [83825a](https://github.com/intel/neural-compressor/commit/83825afa84c479fca312859156aff0912eb12feb))
- Fix out-of-memory failure when quantizing PyTorch DLRM (commit [ff1725](https://github.com/intel/neural-compressor/commit/ff1725771587a691a9841641cfc24a7fe47ba234))
- Fix ITEX resnetv2_50 tuning accuracy issue (commit [ae1e05](https://github.com/intel/neural-compressor/commit/ae1e05d94b1b6d82fe59a55be2343bf00f88d9ba))
- Fix BF16 op errors in QAT when the torch version is below 1.11 (commit [eda8cb](https://github.com/intel/neural-compressor/commit/eda8cb77902df1e8ac9c3fedaa5508334d22533e))
- Fix the key comparison in the Bayesian strategy (commit [1e9c12](https://github.com/intel/neural-compressor/commit/1e9c12bb53d65fbec33e7ada68677100e7617745))
- Fix static quantization failure for PyTorch T5 (commit [ee3ef0](https://github.com/intel/neural-compressor/commit/ee3ef0e5c28fb324d7fbd0ad4a4d99e3d3d1b84f))
**Examples**
- Add quantization examples for Hugging Face models with the ONNX Runtime backend (commit [f4aeb5](https://github.com/intel/neural-compressor/commit/f4aeb5de7f0be514b5d3f6b0447bac4371729324))
- Add large language model quantization example: GPT-J (commit [01899d](https://github.com/intel/neural-compressor/commit/01899d6d959635612ad24273c092f086dc5c7066))
- Add distributed distillation examples: MobileNetV2 (commit [d33ebe](https://github.com/intel/neural-compressor/commit/d33ebe6d8623963e8591e1debf0d30d6b103c838)) and CNN-2 (commit [ebe9e2](https://github.com/intel/neural-compressor/commit/ebe9e2af650d7310a6b28b713e54b711e5f89380))
- Update examples to the new INC v2.0 API
- Add Stable Diffusion example
**Documentation**
- Update accuracy results for broad hardware (commit [71b056](https://github.com/intel/neural-compressor/commit/71b056b53015e05b7ab9289d31edaf4200dca628))
- Refine the API helper and documentation
**Validated Configurations**
- CentOS 8.4 & Ubuntu 20.04
- Python 3.7, 3.8, 3.9, 3.10
- TensorFlow 2.9.3, 2.10.1, 2.11.0, ITEX 1.0
- PyTorch/IPEX 1.11.0+cpu, 1.12.1+cpu, 1.13.0+cpu
- ONNX Runtime 1.11.0, 1.12.1, 1.13.1
- MXNet 1.7.0, 1.8.0, 1.9.1