Features
* Quantization
  * Support new quantization APIs for Intel TensorFlow
  * Support FakeQuant (QDQ) quantization format for ITEX
  * Improve INT8 quantization recipes for ONNX Runtime
* Mixed Precision
  * Enhance mixed precision interface to support BF16 (FP16) mixed with FP32
* Neural Architecture Search
  * Support SuperNet-based neural architecture search (DyNAS)
* Sparsity
  * Support training for block-wise structured sparsity
* Strategy
  * Support operator-type based tuning strategy
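The FakeQuant (QDQ) format listed above keeps tensors in floating point while simulating INT8 rounding error, so a float graph can be trained or evaluated as if it were quantized. A minimal self-contained sketch of the idea, assuming a symmetric per-tensor scheme (illustrative only, not the ITEX implementation):

```python
def fake_quant(values, num_bits=8):
    """Symmetric per-tensor quantize-dequantize (QDQ) to num_bits."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    # Quantize: round to the integer grid, clamp to the representable range.
    q = [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]
    # Dequantize: back to float, carrying the rounding error forward.
    return [qi * scale for qi in q]

# Values on the grid survive the round trip; in-between values pick up
# the rounding error that INT8 inference would see.
weights = [0.05, -1.27, 0.63, 1.27]
simulated = fake_quant(weights)
```

In a real QDQ graph this quantize/dequantize pair is inserted as explicit operators around weights and activations, which downstream backends can then fuse into true INT8 kernels.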
Productivity
* Support light (default) and full binary packages (default package size 0.5 MB, full package size 2 MB)
* Add experimental accuracy diagnostic feature for INT8 quantization including tensor statistics visualization and fine-grained precision setting
* Add experimental one-click BF16/INT8 low-precision enabling and inference optimization, the industry's first code-free solution
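The accuracy diagnostic mentioned above inspects per-tensor statistics to find layers that quantize poorly, e.g. where a single outlier inflates the quantization range. An illustrative sketch of the kind of summary such a diagnostic collects (function name and fields are hypothetical, not the tool's API):

```python
import math

def tensor_stats(values):
    """Per-tensor summary of the kind an INT8 accuracy diagnostic inspects:
    range and dispersion, to spot outlier-dominated tensors."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": math.sqrt(var),
    }

# One outlier (12.0) stretches the min/max range far beyond the bulk of
# the distribution, hinting this tensor may need per-channel handling.
stats = tensor_stats([0.1, 0.2, 0.15, 12.0])
```

Visualizing these statistics per operator is what lets a user decide where a fine-grained precision override (keeping a sensitive op in FP32) is worthwhile.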
Ecosystem
* Upstream 4 more quantized models (emotion_ferplus, ultraface, arcface, bidaf) to ONNX Model Zoo
* Upstream 10 quantized Transformers-based models to HuggingFace Model Hub
Examples
* Add notebooks for Quantization on Intel DevCloud, Distillation/Sparsity/Quantization for BERT-Mini SST-2, and Neural Architecture Search (DyNAS)
* Add more quantization examples from TensorFlow Model Zoo
Validated Configurations
* Python 3.8, 3.9, 3.10
* CentOS 8.3 & Ubuntu 18.04 & Windows 10
* TensorFlow 2.7, 2.8, 2.9
* Intel TensorFlow 2.7, 2.8, 2.9
* PyTorch 1.10.0+cpu, 1.11.0+cpu, 1.12.0+cpu
* IPEX 1.10.0, 1.11.0, 1.12.0
* MXNet 1.6.0, 1.7.0, 1.8.0
* ONNX Runtime 1.9.0, 1.10.0, 1.11.0