Nncf

Latest version: v2.10.0

2.10.0

Post-training Quantization:

Features:

- Introduced the subgraph defining functionality for the nncf.IgnoredScope() option.
- Introduced limited support for batch sizes greater than 1. The MobilenetV2 [PyTorch example](https://github.com/openvinotoolkit/nncf/blob/master/examples/post_training_quantization/torch/mobilenet_v2) was updated with batch support.
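
The subgraph option for the ignored scope describes a region of the model by its boundary nodes instead of listing every node. The following is a minimal plain-Python sketch of that idea, not NNCF's implementation: the toy graph, the node names, and the `nodes_between` helper are all hypothetical, and the actual argument names should be taken from the NNCF documentation.

```python
from collections import deque


def nodes_between(edges, inputs, outputs):
    """Return the nodes lying on any path from an input node to an output node."""
    succ, pred = {}, {}
    for src, dst in edges:
        succ.setdefault(src, []).append(dst)
        pred.setdefault(dst, []).append(src)

    def reachable(starts, adjacency):
        # Breadth-first traversal collecting every node reachable from `starts`.
        seen = set(starts)
        queue = deque(starts)
        while queue:
            node = queue.popleft()
            for nxt in adjacency.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    # A node belongs to the subgraph if it is reachable forward from an input
    # and backward from an output.
    return reachable(inputs, succ) & reachable(outputs, pred)


# Hypothetical toy graph with a main path and a side branch.
edges = [
    ("input", "conv1"), ("conv1", "relu1"), ("relu1", "conv2"),
    ("conv2", "softmax"), ("relu1", "pool"), ("pool", "fc"),
]
ignored = nodes_between(edges, inputs=["conv1"], outputs=["conv2"])
print(sorted(ignored))
```

In this sketch the side branch (`pool`, `fc`) stays outside the ignored region because it never leads to the declared output node.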

Fixes:

- Fixed issue with the nncf.OverflowFix parameter absence in some scenarios.
- Aligned the list of correctable layers for the FastBiasCorrection algorithm between PyTorch, OpenVINO and ONNX backends.
- Fixed issue with the nncf.QuantizationMode parameters combination.
- Fixed MobilenetV2 ([PyTorch](https://github.com/openvinotoolkit/nncf/blob/master/examples/post_training_quantization/torch/mobilenet_v2), [ONNX](https://github.com/openvinotoolkit/nncf/blob/master/examples/post_training_quantization/onnx/mobilenet_v2), [OpenVINO](https://github.com/openvinotoolkit/nncf/blob/master/examples/post_training_quantization/openvino/mobilenet_v2)) examples for the Windows platform.
- (OpenVINO) Fixed [Anomaly Classification example](https://github.com/openvinotoolkit/nncf/blob/master/examples/post_training_quantization/openvino/anomaly_stfpm_quantize_with_accuracy_control) for the Windows platform.
- (PyTorch) Fixed bias shift magnitude calculation for fused layers.
- (OpenVINO) Fixed removing the ShapeOf graph which led to an error in the nncf.quantize_with_accuracy_control() method.

Improvements:

- OverflowFix, AdvancedSmoothQuantParameters and AdvancedBiasCorrectionParameters were exposed into the nncf.* namespace.
- (OpenVINO, PyTorch) Introduced scale compression to FP16 for weights in nncf.compress_weights() method, regardless of model weights precision.
- (PyTorch) Modules that NNCF inserted were excluded from parameter tracing.
- (OpenVINO) Extended the list of correctable layers for the BiasCorrection algorithm.
- (ONNX) Aligned BiasCorrection algorithm behaviour with OpenVINO in specific cases.
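
To illustrate what storing a quantization scale in FP16 means in practice, here is a small standard-library sketch. It assumes symmetric per-tensor INT8 quantization and non-zero weights; the helper names `to_fp16` and `quantize_int8_sym` are hypothetical and do not reflect how `nncf.compress_weights()` is implemented internally.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def quantize_int8_sym(weights):
    """Symmetric per-tensor INT8 quantization with the scale stored in FP16.

    Assumes at least one non-zero weight, otherwise the scale would be zero.
    """
    scale = to_fp16(max(abs(w) for w in weights) / 127.0)
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale


weights = [0.82, -0.31, 0.05, -1.27]
q, scale = quantize_int8_sym(weights)
restored = [v * scale for v in q]
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Because the integer values are computed with the already rounded scale, the reconstruction error stays below one quantization step even though the scale itself has lost precision.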

Tutorials:

- [Post-Training Optimization of PhotoMaker Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/photo-maker/photo-maker.ipynb)
- [Post-Training Optimization of Stable Diffusion XL Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/stable-diffusion-xl/stable-diffusion-xl.ipynb)
- [Post-Training Optimization of KerasCV Stable Diffusion Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/stable-diffusion-keras-cv/stable-diffusion-keras-cv.ipynb)
- [Post-Training Optimization of Paint By Example Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/paint-by-example/paint-by-example.ipynb)
- [Post-Training Optimization of aMUSEd Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/amused-lightweight-text-to-image/amused-lightweight-text-to-image.ipynb)
- [Post-Training Optimization of InstantID Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/instant-id/instant-id.ipynb)
- [Post-Training Optimization of LLaVA Next Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llava-next-multimodal-chatbot/llava-next-multimodal-chatbot.ipynb)
- [Post-Training Optimization of AnimateAnyone Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/animate-anyone/animate-anyone.ipynb)
- [Post-Training Optimization of YOLOv8-OBB Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/yolov8-optimization/yolov8-obb.ipynb)
- [Post-Training Optimization of LLM Agent](https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/llm-agent-langchain/llm-agent-langchain.ipynb)

Compression-aware training:

Features:

- (PyTorch) The nncf.quantize method may now be used as quantization initialization for Quantization-Aware Training. Added a [Resnet18-based example](https://github.com/openvinotoolkit/nncf/blob/master/examples/quantization_aware_training/torch/resnet18) with the transition from the Post-Training Quantization to a Quantization-Aware Training algorithm.
- (PyTorch) Introduced extractors for the fused Convolution, Batch-/GroupNorm, and Linear functions.

Fixes:

- (PyTorch) Fixed apply_args_defaults function issue.
- (PyTorch) Fixed dtype handling for the compressed torch.nn.Parameter.
- (PyTorch) Fixed is_shared parameter propagation.

Improvements:

- (PyTorch) Updated command creation behaviour to reduce the number of adapters.
- (PyTorch) Added an insertion point option for models that are wrapped with replace_modules=False.

Deprecations/Removals:

- (PyTorch) Removed the binarization algorithm.
- NNCF installation via pip install nncf[<framework>] option is now deprecated.

Requirements:

- Updated PyTorch (2.2.1) and CUDA (12.1) versions.
- Updated ONNX (1.16.0) and ONNXRuntime (1.17.1) versions.

**Acknowledgements**

Thanks for contributions from the OpenVINO developer community:
Candyzorua
clinty
UsingtcNower
DaniAffCH

2.9.0

Post-training Quantization:

Features:
- (OpenVINO) Added a modified AWQ algorithm for 4-bit data-aware weights compression. This algorithm is applied only to `MatMul->Multiply->MatMul` patterns. For that, the optional `awq` parameter has been added to `nncf.compress_weights()` and can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time).
- (ONNX) Introduced support for the ONNX backend in the `nncf.quantize_with_accuracy_control()` method. Users can now perform quantization with accuracy control for `onnx.ModelProto`. By leveraging this feature, users can enhance the accuracy of quantized models while minimizing performance impact.
- (ONNX) Added an example based on the YOLOv8n-seg model for demonstrating the usage of quantization with accuracy control for the ONNX backend.
- (PT) Added SmoothQuant algorithm for PyTorch backend in `nncf.quantize()`.
- (OpenVINO) Added [an example](examples/llm_compression/openvino/tiny_llama_find_hyperparams) with the hyperparameters tuning for the TinyLLama model.
- Introduced the `nncf.AdvancedAccuracyRestorerParameters`.
- Introduced the `subset_size` option for the `nncf.compress_weights()`.
- Introduced `TargetDevice.NPU` as the replacement for `TargetDevice.VPU`.
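
For readers unfamiliar with SmoothQuant, the sketch below shows the core idea from the published method: per-channel scales move activation outliers into the weights while leaving the layer output unchanged. It is plain Python under stated assumptions (`alpha=0.5`, toy 2x2 tensors, hypothetical helper names) and is not the NNCF implementation behind `nncf.quantize()`.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(x, w, alpha=0.5):
    """Migrate activation outliers into the weights, channel by channel.

    x: activations, shape [tokens][channels]; w: weights, shape [channels][out].
    Assumes every channel has a non-zero activation and a non-zero weight.
    """
    channels = len(w)
    scales = []
    for j in range(channels):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    x_s = [[row[j] / scales[j] for j in range(channels)] for row in x]
    w_s = [[v * scales[j] for v in w[j]] for j in range(channels)]
    return x_s, w_s, scales


x = [[0.1, 40.0], [-0.2, -64.0]]  # channel 1 carries large outliers
w = [[0.5, -0.25], [0.25, 1.0]]
x_s, w_s, scales = smooth(x, w)
print(matmul(x, w))
print(matmul(x_s, w_s))
```

The two printed products agree up to floating-point rounding, while the largest activation magnitude drops from 64 to 8, which makes the activations easier to quantize.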
Fixes:
- Fixed API Enums serialization/deserialization issue.
- Fixed issue with required arguments for `revert_operations_to_floating_point_precision` method.
Improvements:
- (ONNX) Aligned statistics collection with OpenVINO and PyTorch backends.
- Extended `nncf.compress_weights()` with Convolution & Embeddings compression in order to reduce memory footprint.
Deprecations/Removals:
- (OpenVINO) Removed outdated examples with `nncf.quantize()` for BERT and YOLOv5 models.
- (OpenVINO) Removed outdated example with `nncf.quantize_with_accuracy_control()` for SSD MobileNetV1 FPN model.
- (PyTorch) Deprecated the `binarization` algorithm.
- Removed Post-training Optimization Tool as OpenVINO backend.
- Removed Dockerfiles.
- `TargetDevice.VPU` was replaced by `TargetDevice.NPU`.
Tutorials:
- [Post-Training Optimization of Stable Diffusion v2 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/236-stable-diffusion-v2/236-stable-diffusion-v2-text-to-image.ipynb)
- [Post-Training Optimization of DeciDiffusion Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/259-decidiffusion-image-generation/259-decidiffusion-image-generation.ipynb)
- [Post-Training Optimization of DepthAnything Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/280-depth-anything/280-depth-anything.ipynb)
- [Post-Training Optimization of Stable Diffusion ControlNet Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/235-controlnet-stable-diffusion/235-controlnet-stable-diffusion.ipynb)

Compression-aware training:

Fixes:
- (PyTorch) Fixed issue with `NNCFNetworkInterface.get_clean_shallow_copy` missed arguments.

**Acknowledgements**

Thanks for contributions from the OpenVINO developer community:
AishwaryaDekhane
UsingtcNower
Om-Doiphode

2.8.1

Post-training Quantization:

Bugfixes:
- (Common) Fixed issue with `nncf.compress_weights()` to avoid overflows on 32-bit Windows systems.
- (Common) Fixed performance issue with `nncf.compress_weights()` on LLama models.
- (Common) Fixed `nncf.quantize_with_accuracy_control` pipeline with `tune_hyperparams=True` enabled option.
- (OpenVINO) Fixed issue for stateful LLM models and added state restoring after the inference for it.
- (PyTorch) Fixed issue with `nncf.compress_weights()` for LLM models when `is_floating_point` is executed during tracing.

2.8.0

Post-training Quantization:

Breaking changes:
- The `nncf.quantize` signature has been changed to add `mode: Optional[nncf.QuantizationMode] = None` as its third argument, between the original `calibration_dataset` and `preset` arguments.
- (Common) `nncf.common.quantization.structs.QuantizationMode` has been renamed to `nncf.common.quantization.structs.QuantizationScheme`
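
The practical effect of inserting `mode` as the third argument can be shown with two stand-in functions. This is an illustrative sketch only: `quantize_before` and `quantize_after` are hypothetical simplifications that use strings where the real API uses enum values, and they reproduce only the argument order described above.

```python
from typing import Any, Optional


def quantize_before(model: Any, calibration_dataset: Any,
                    preset: Optional[str] = None):
    """Stand-in for the pre-2.8.0 argument order (hypothetical, simplified)."""
    return {"mode": None, "preset": preset}


def quantize_after(model: Any, calibration_dataset: Any,
                   mode: Optional[str] = None,
                   preset: Optional[str] = None):
    """Stand-in for the 2.8.0 argument order with `mode` inserted third."""
    return {"mode": mode, "preset": preset}


# A positional call written against the old order now binds the preset to `mode`.
positional = quantize_after("model", "dataset", "MIXED")
# Keyword arguments are unaffected by the reordering.
keyword = quantize_after("model", "dataset", preset="MIXED")
print(positional)
print(keyword)
```

Passing `preset` and `mode` by keyword keeps calling code independent of this kind of reordering.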
General:
- (OpenVINO) Changed default OpenVINO opset from 9 to 13.
Features:
- (OpenVINO) Added 4-bit data-aware weights compression. For that `dataset` optional parameter has been added to `nncf.compress_weights()` and can be used to minimize accuracy degradation of compressed models (note that this option increases the compression time).
- (PyTorch) Added support for PyTorch models with shared weights and custom PyTorch modules in `nncf.compress_weights()`. The weights compression algorithm for PyTorch models is now based on tracing the model graph. The `dataset` parameter is now required in `nncf.compress_weights()` for the compression of PyTorch models.
- (Common) Renamed `nncf.CompressWeightsMode.INT8` to `nncf.CompressWeightsMode.INT8_ASYM` and introduced `nncf.CompressWeightsMode.INT8_SYM`, which can be efficiently used with dynamic 8-bit quantization of activations.
The original `nncf.CompressWeightsMode.INT8` enum value is now deprecated.
- (OpenVINO) Added support for quantizing the ScaledDotProductAttention operation from OpenVINO opset 13.
- (OpenVINO) Added FP8 quantization support via `nncf.QuantizationMode.FP8_E4M3` and `nncf.QuantizationMode.FP8_E5M2` enum values, invoked via passing one of these values as an optional `mode` argument to `nncf.quantize`. Currently, OpenVINO supports inference of FP8-quantized models in reference mode with no performance benefits and can be used for accuracy projections.
- (Common) Post-training Quantization with Accuracy Control - `nncf.quantize_with_accuracy_control()` has been extended by `restore_mode` optional parameter to revert weights to int8 instead of the original precision.
This parameter helps to reduce the size of the quantized model and improves its performance.
By default, it's disabled and model weights are reverted to the original precision in `nncf.quantize_with_accuracy_control()`.
- (Common) Added an `all_layers: Optional[bool] = None` argument to `nncf.compress_weights` to indicate whether embeddings and last layers of the model should be compressed to a primary precision. This is relevant to 4-bit quantization only.
- (Common) Added a `sensitivity_metric: Optional[nncf.parameters.SensitivityMetric] = None` argument to `nncf.compress_weights` for finer control over the sensitivity metric for assigning quantization precision to layers.
Defaults to weight quantization error if a dataset is not provided for weight compression and to maximum variance of the layers' inputs multiplied by inverted 8-bit quantization noise if a dataset is provided.
By default, the backup precision is assigned for the embeddings and last layers.
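
The difference between asymmetric and symmetric 8-bit weight quantization, which underlies the `INT8_ASYM` and `INT8_SYM` modes, can be sketched in plain Python. The integer ranges chosen here ([0, 255] with a zero point, and [-127, 127] without one) and all helper names are assumptions made for illustration; NNCF's exact ranges and granularity may differ.

```python
def quantize_sym(weights):
    """Symmetric: zero maps to integer 0, only a scale is stored."""
    scale = max(abs(w) for w in weights) / 127
    return [round(w / scale) for w in weights], scale


def dequantize_sym(q, scale):
    return [v * scale for v in q]


def quantize_asym(weights):
    """Asymmetric: a scale plus a zero point covers the [min, max] range.

    Assumes the weights are not all equal, otherwise the scale would be zero.
    """
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255
    zero_point = round(-lo / scale)
    q = [max(0, min(255, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point


def dequantize_asym(q, scale, zero_point):
    return [(v - zero_point) * scale for v in q]


weights = [0.02, 0.10, 0.35, 0.90]  # skewed, all-positive distribution

q_s, scale_s = quantize_sym(weights)
err_sym = max(abs(a - b) for a, b in zip(weights, dequantize_sym(q_s, scale_s)))

q_a, scale_a, zp = quantize_asym(weights)
err_asym = max(abs(a - b) for a, b in zip(weights, dequantize_asym(q_a, scale_a, zp)))

print(err_sym, err_asym)
```

For this skewed toy input the asymmetric variant reconstructs the weights more closely, while the symmetric variant needs no zero point, which keeps the arithmetic simpler.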
Fixes:
- (OpenVINO) Models with embeddings (e.g. `gpt-2`, `stable-diffusion-v1-5`, `stable-diffusion-v2-1`, `opt-6.7b`, `falcon-7b`, `bloomz-7b1`) are now more accurately quantized.
- (PyTorch) `nncf.strip(..., do_copy=True)` now actually returns a deepcopy (stripped) of the model object.
- (PyTorch) Post-hooks can now be set up on operations that return `torch.return_type` (such as `torch.max`).
- (PyTorch) Improved dynamic graph tracing for various tensor operations from `torch` namespace.
- (PyTorch) More robust handling of models with disjoint traced graphs when applying PTQ.
Improvements:
- Reformatted the tutorials section in the top-level `README.md` for better readability.
Deprecations/Removals:
- (Common) The original `nncf.CompressWeightsMode.INT8` enum value is now deprecated.
- (PyTorch) The Git patch for integration with HuggingFace `transformers` repository is marked as deprecated and will be removed in a future release.
Developers are advised to use [optimum-intel](https://github.com/huggingface/optimum-intel) instead.
- Dockerfiles in the NNCF Git repository are deprecated and will be removed in a future release.

2.7.0

Post-training Quantization:

Features:
- (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (`compress_weights(…)` pipeline).
- (OpenVINO) Added support for [IF operation](https://docs.openvino.ai/latest/openvino_docs_ops_infrastructure_If_8.html) quantization.
- (OpenVINO) Added `dump_intermediate_model` parameter support for AccuracyAwareAlgorithm (`quantize_with_accuracy_control(…)` pipeline).
- (OpenVINO) Added support for SmoothQuant and ChannelAlignment algorithms for HyperparameterTuner algorithm (`quantize_with_tune_hyperparams(…)` pipeline).
- (PyTorch) Post-training Quantization is now supported with `quantize(…)` pipeline and the common implementation of quantization algorithms. Deprecated `create_compressed_model()` method for Post-training Quantization.
- Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for `ModelType.Transformer` scheme.
- `QuantizationPreset.Mixed` was set as the default for `ModelType.Transformer` scheme.
Fixes:
- (OpenVINO, ONNX, PyTorch) Aligned/added patterns between backends (SE block, MVN layer, multiple activations, etc.) to restore performance/metrics.
- Fixed patterns for `ModelType.Transformer` to align with the [quantization scheme](https://docs.openvino.ai/latest/openvino_docs_OV_UG_lpt.html).
Improvements:
- Improved UX with the new progress bar for pipeline, new exceptions, and .dot graph visualization updates.
- (OpenVINO) Optimized WeightsCompression algorithm (`compress_weights(…)` pipeline) execution time for LLM quantization, added ignored scope support.
- (OpenVINO) Optimized AccuracyAwareQuantization algorithm execution time with multi-threaded approach while calculating ranking score (`quantize_with_accuracy_control(…)` pipeline).
- (OpenVINO) Added [extract_ov_subgraph tool](tools/extract_ov_subgraph.py) for large IR subgraph extraction.
- (ONNX) Optimized quantization pipeline (up to 1.15x speed up).
Tutorials:
- [Post-Training Optimization of BLIP Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/233-blip-visual-language-processing)
- [Post-Training Optimization of DeepFloyd IF Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/238-deepfloyd-if)
- [Post-Training Optimization of Grammatical Error Correction Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/214-grammar-correction)
- [Post-Training Optimization of Dolly 2.0 Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/240-dolly-2-instruction-following)
- [Post-Training Optimization of Massively Multilingual Speech Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/255-mms-massively-multilingual-speech)
- [Post-Training Optimization of OneFormer Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/249-oneformer-segmentation)
- [Post-Training Optimization of InstructPix2Pix Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/231-instruct-pix2pix-image-editing)
- [Post-Training Optimization of LLaVA Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/257-llava-multimodal-chatbot)
- [Post-Training Optimization of Latent Consistency Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/263-latent-consistency-models-image-generation)
- [Post-Training Optimization of Distil-Whisper Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/267-distil-whisper-asr)
- [Post-Training Optimization of FastSAM Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/261-fast-segment-anything)
Known issues:
- (ONNX) `quantize(...)` method can generate inaccurate int8 results for models with the BatchNormalization layer that contains biases. To get the best accuracy, use the `do_constant_folding=True` option during export from PyTorch to ONNX.

Compression-aware training:

Fixes:
- (PyTorch) Fixed Hessian trace calculation to solve [2155](https://github.com/openvinotoolkit/nncf/issues/2155) issue.
Requirements:
- Updated PyTorch version (2.1.0).
- Updated numpy version (<1.27).
Deprecations/Removals:
- (PyTorch) Removed legacy external quantizer storage names.
- (PyTorch) Removed torch < 2.0 version support.

2.6.0

Post-training Quantization:

Features:
- Added `CPU_SPR` device type support.
- Added quantizers scales unification.
- Added quantization scheme for ReduceSum operation.
- Added new types (ReduceL2, ReduceSum, Maximum) to the ignored scope for `ModelType.Transformer`.
- (OpenVINO) Added SmoothQuant algorithm.
- (OpenVINO) Added ChannelAlignment algorithm.
- (OpenVINO) Added HyperparameterTuner algorithm.
- (PyTorch) Added FastBiasCorrection algorithm support.
- (OpenVINO, ONNX) Added embedding weights quantization.
- (OpenVINO, PyTorch) Added new `compress_weights` method that provides data-free [INT8 weights compression](docs/compression_algorithms/CompressWeights.md).
Fixes:
- Fixed detection of decomposed post-processing in models.
- Multiple fixes (new patterns, bugfixes, etc.) to solve [1936](https://github.com/openvinotoolkit/nncf/issues/1936) issue.
- Fixed model reshaping during quantization to keep the original model shape.
- (OpenVINO) Added support for quantization of sequential models.
- (OpenVINO) Fixed in-place statistics cast to support empty dimensions.
- (OpenVINO, ONNX) Fixed quantization of the MatMul operation with weights rank > 2.
- (OpenVINO, ONNX) Fixed BiasCorrection algorithm to enable [CLIP model quantization](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/228-clip-zero-shot-image-classification).
Improvements:
- Optimized `quantize(…)` pipeline (up to 4.3x speed up in total).
- Optimized `quantize_with_accuracy_control(…)` pipeline (up to 8x speed up for [122-quantizing-model-with-accuracy-control](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/122-quantizing-model-with-accuracy-control) notebook).
- Optimized general statistics collection (up to 1.2x speed up for ONNX backend).
- Ignored patterns separated from Fused patterns scheme (with multiple patterns addition).
Tutorials:
- [Post-Training Optimization of Segment Anything Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/237-segment-anything).
- [Post-Training Optimization of CLIP Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/228-clip-zero-shot-image-classification).
- [Post-Training Optimization of ImageBind Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/239-image-bind).
- [Post-Training Optimization of Whisper Model](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/227-whisper-subtitles-generation).
- [Post-Training Optimization with accuracy control](https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/122-quantizing-model-with-accuracy-control).

Compression-aware training:

Features:
- Added shape pruning processor for BootstrapNAS algorithm.
- Added KD loss for BootstrapNAS algorithm.
- Added `validate_scopes` parameter for NNCF configuration.
- (PyTorch) Added PyTorch 2.0 support.
- (PyTorch) Added `.strip()` option to API.
- (PyTorch) Enabled bfloat data type for quantization kernels.
- (PyTorch) Quantized models can now be `torch.jit.trace`d without calling `.strip()`.
- (PyTorch) Added support for overridden `forward` instance attribute on model objects passed into `create_compressed_model`.
- (Tensorflow) Added Tensorflow 2.12 support.
Fixes:
- (PyTorch) Fixed padding adjustment issue in the elastic kernel to work with the different active kernel sizes.
- (PyTorch) Fixed the torch graph tracing in the case where the tensors belonging to parallel edges are interleaved in the order of the tensor arguments.
- (PyTorch) Fixed recurrent nodes matching (LSTM, GRU cells) condition with the strict rule to avoid adding unnecessary nodes to the ignored scope.
- (PyTorch) Fixed `torch.jit.script` wrapper so that user-side handling of exceptions during `torch.jit.script` invocation does not cause NNCF to be permanently disabled.
- (PyTorch, Tensorflow) Adjusted quantizer propagation algorithm to check if quantizer propagation will result in output quantization.
- (PyTorch) Added redefined `__class__` method for ProxyModule that avoids causing error while calling `.super()` in forward method.
Deprecations/Removals:
- (PyTorch) Removed deprecated `NNCFNetwork.__getattr__`, `NNCFNetwork.get_nncf_wrapped_model` methods.
Requirements:
- Updated PyTorch version (2.0.1).
- Updated Tensorflow version (2.12.0).
