Nncf

Latest version: v2.15.0

Safety actively analyzes 723296 Python packages for vulnerabilities to keep your Python projects secure.

Page 3 of 5

2.5.0

Post-training Quantization:

Features:
- Official release of OpenVINO framework support.
- Ported NNCF OpenVINO backend to use the [nGraph](https://docs.openvino.ai/2021.3/openvino_docs_nGraph_DG_Introduction.html) representation of OpenVINO models.
- Changed dependecies of NNCF OpenVINO backend. It now depends on `openvino` package and not on the `openvino-dev` package.
- Added GRU/LSTM quantization support.
- Added quantizer scales unification.
- Added support for models with 3D and 5D Depthwise convolution.
- Added FP16 OpenVINO models support.
- Added `"overflow_fix"` parameter (for `quantize(...)` & `quantize_with_accuracy_control(...)` methods) support & functionality. It improves accuracy for optimized model for affected devices. More details in [Quantization section](docs/compression_algorithms/Quantization.md).
- (OpenVINO) Added support for in-place statistics collection (reduce memory footprint during optimization).
- (OpenVINO) Added Quantization with accuracy control algorithm.
- (OpenVINO) Added YOLOv8 examples for [`quantize(...)`](examples/post_training_quantization/openvino/yolov8) & [`quantize_with_accuracy_control(...)`](examples/post_training_quantization/openvino/yolov8_quantize_with_accuracy_control) methods.
- (PyTorch) Added min-max quantization algorithm as experimental.

Fixes:
- Fixed `ignored_scope` attribute behaviour for weights. Now, the weighted layers excludes from optimization scope correctly.
- (ONNX) Checking correct ONNX opset version via the `nncf.quantize(...)`. Now, models with opset < 13 are optimized correctly in per-tensor quantization.

Improvements:
- Added improvements for statistic collection process (collect weights statistics only once).
- (PyTorch, OpenVINO, ONNX) Introduced unified quantizer parameters calculation.

Known issues:
- `quantize(...)` method can generate inaccurate int8 results for models with the *DenseNet-like* architecture. Use `quantize_with_accuracy_control(...)` in such case.
- `quantize(...)` method can hang on models with *transformer* architecture when `fast_bias_correction` optional parameter is set to *False*. Don't set it to *False* or use `quantize_with_accuracy_control(...)` in such case.
- `quantize(...)` method can generate inaccurate int8 results for models with the *MobileNet-like* architecture on non-VNNI machines.

Compression-aware training:

New Features:
- Introduced automated structured pruning algorithm for JPQD with support for BERT, Wave2VecV2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.
- Added `nncf.common.utils.patcher.Patcher` - this class can be used to patch methods on live PyTorch model objects with wrappers such as `nncf.torch.dynamic_graph.context.no_nncf_trace` when doing so in the model code is not possible (e.g. if the model comes from an external library package).
- Compression controllers of the `nncf.api.compression.CompressionAlgorithmController` class now have a `.strip()` method that will return the compressed model object with as many custom NNCF additions removed as possible while preserving the functioning of the model object as a compressed model.

Fixes:
- Fixed statistics computation for pruned layers.
- (PyTorch) Fixed traced tensors to implement the YOLOv8 from Ultralytics.

Improvements:
- Extension of attributes (`transpose/permute/getitem`) for pruning node selector.
- NNCFNetwork was refactored from a wrapper-approach to a mixin-like approach.
- Added average pool 3d-like ops to pruning mask.
- Added Conv3d for overflow fix.
- `nncf.set_log_file(...)` can now be used to set location of the NNCF log file.
- (PyTorch) Added support for pruning of `torch.nn.functional.pad` operation.
- (PyTorch) Added `torch.baddbmm` as an alias for the matmul metatype for quantization purposes.
- (PyTorch) Added config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.
- (PyTorch) Fixed JIT-traceable PyTorch models with internal patching.
- (PyTorch) Added `__matmul__` magic functions to the list of patched ops (for SwinTransformer by Microsoft).

Requirements:
- Updated ONNX version (1.13)
- Updated Tensorflow version (2.11)

General changes:
- Added Windows support for NNCF.

2.4.0

Target version updates:
- Bump target framework versions to PyTorch 1.13.1, TensorFlow 2.8.x, ONNX 1.12, ONNXRuntime 1.13.1
- Increased target HuggingFace transformers version for the integration patch to 4.23.1

Features:
- Official release of the ONNX framework support.
NNCF may now be used for post-training quantization (PTQ) on ONNX models.
Added an [example script](examples/post_training_quantization/onnx/mobilenet_v2) demonstrating the ONNX post-training quantization on MobileNetV2.
- Preview release of OpenVINO framework support.
NNCF may now be used for post-training quantization on OpenVINO models. Added an example script demonstrating the OpenVINO post-training quantization on MobileNetV2.
`pip install nncf[openvino]` will install NNCF with the required OV framework dependencies.
- Common post-training quantization API across the supported framework model formats (PyTorch, TensorFlow, ONNX, OpenVINO IR) via the `nncf.quantize(...)` function.
The parameter set of the function is the same for all frameworks - actual framework-specific implementations are being dispatched based on the type of the model object argument.
- (PyTorch, TensorFlow) Improved the adaptive compression training functionality to reduce effective training time.
- (ONNX) Post-processing nodes are now automatically excluded from quantization.
- (PyTorch - Experimental) Joint Pruning, Quantization and Distillation for Transformers enabled for certain models from HuggingFace `transformers` repo.
See [description](nncf/experimental/torch/sparsity/movement/MovementSparsity.md) of the movement pruning involved in the JPQD for details.

Bugfixes:
- Fixed a division by zero if every operation is added to ignored scope
- Improved logging output, cutting down on the number of messages being output to the standard `logging.INFO` log level.
- Fixed FLOPS calculation for linear filters - this impacts existing models that were pruned with a FLOPS target.
- "chunk" and "split" ops are correctly handled during pruning.
- Linear layers may now be pruned by input and output independently.
- Matmul-like operations and subsequent arithmetic operations are now treated as a fused pattern.
- (PyTorch) Fixed a rare condition with accumulator overflow in CUDA quantization kernels, which led to CUDA runtime errors and NaN values appearing in quantized tensors and
- (PyTorch) `transformers` integration patch now allows to export to ONNX during training, and not only at the end of it.
- (PyTorch) `torch.nn.utils.weight_norm` weights are now detected correctly.
- (PyTorch) Exporting a model with sparsity or pruning no longer leads to weights in the original model object in-memory to be hard-set to 0.
- (PyTorch - Experimental) improved automatic search of blocks to skip within the NAS algorithm – overlapping blocks are correctly filtered.
- (PyTorch, TensorFlow) Various bugs and issues with compression training were fixed.
- (TensorFlow) Fixed an error with `"num_bn_adaptation_samples": 0` in config leading to a `TypeError` during quantization algo initialization.
- (ONNX) Temporary model file is no longer saved on disk.
- (ONNX) Depthwise convolutions are now quantizable in per-channel mode.
- (ONNX) Improved the working time of PTQ by optimizing the calls to ONNX shape inferencing.

Breaking changes:
- Fused patterns will be excluded from quantization via `ignored_scopes` only if the top-most node in data flow order matches against `ignored_scopes`
- NNCF config's `"ignored_scopes"` and `"target_scopes"` are now strictly checked to be matching against at least one node in the model graph instead of silently ignoring the unmatched entries.
- Calling `setup.py` directly to install NNCF is deprecated and no longer guaranteed to work.
- Importing NNCF logger as `from nncf.common.utils.logger import logger as nncf_logger` is deprecated - use `from nncf import nncf_logger` instead.
- `pruning_rate` is renamed to `pruning_level` in pruning compression controllers.
- (ONNX) Removed CompressionBuilder. Excluded examples of NNCF for ONNX with CompressionBuilder API

2.3.0

**New features**
- (ONNX) PTQ API support for ONNX.
- (ONNX) Added PTQ examples for ONNX in image classification, object detection, and semantic segmentation.
- (PyTorch) Added `BootstrapNAS` to find high-performing sub-networks from the super-network optimization.

**Bugfixes**
- (PyTorch) Returned the initial quantized model when the retraining failed to find out the best checkpoint.
- (Experimental) Fixed weight initialization for `ONNXGraph` and `MinMaxQuantization`.

2.2.0

**New features**

- Pre-production quality
- (TensorFlow) Added TensorFlow 2.5.x support.
- (TensorFlow) The `SubclassedConverter` class was added to create `NNCFGraph` for the `tf.Graph` Keras model.
- (TensorFlow) Added `TFOpLambda` layer support with `TFModelConverter`, `TFModelTransformer`, and `TFOpLambdaMetatype`.
- (TensorFlow) Patterns from `MatMul` and `Conv2D` to `BiasAdd` and `Metatypes` of TensorFlow operations with weights `TFOpWithWeightsMetatype` are added.
- (PyTorch, TensorFlow) Added prunings for `Reshape` and `Linear` as `ReshapePruningOp` and `LinearPruningOp`.
- (PyTorch) Added mixed precision quantization config with HAWQ for `Resnet50` and `Mobilenet_v2` for the latest VPU.
- (PyTorch) Splitted `NNCFBatchNorm` into `NNCFBatchNorm1d`, `NNCFBatchNorm2d`, `NNCFBatchNorm3d`.
- (PyTorch - Experimental) Added the `BNASTrainingController` and `BNASTrainingAlgorithm` for BootstrapNAS to search the model's architecture.
- (Experimental) ONNX `ModelProto` is now converted to `NNCFGraph` through `GraphConverter`.
- (Experimental) `ONNXOpMetatype` and extended patterns for fusing HW config is now available.
- (Experimental) Added `ONNXPostTrainingQuantization` and `MinMaxQuantization` supports for ONNX.

**Bugfixes**

- (PyTorch, TensorFlow) Added exception handling of BN adaptation for zero sample values.
- (PyTorch, TensorFlow) Fixed learning rate after validation step for `EarlyExitCompressionTrainingLoop`.
- (PyTorch) Fixed `FakeQuantizer` to make exact zeros.
- (PyTorch) Fixed Quantizer misplacements during ONNX export.
- (PyTorch) Restored device information during ONNX export.
- (PyTorch) Fixed the statistics collection from the pruned model.

2.1.0

New features
- (PyTorch) All PyTorch operations are now NNCF-wrapped automatically.
- (TensorFlow) Scales for concat-affecting quantizers are now unified
- (PyTorch) The pruned filters are now set to 0 in the exported ONNX file instead of removing them from the ONNX definition.
- (PyTorch, TensorFlow) Extended accuracy-aware training pipeline with the `early_exit` mode.
- (PyTorch, TensorFlow) Added support for quantization presets to be specified in NNCF config.
- (PyTorch, TensorFlow) Extended pruning statistics displayed to the user.
- (PyTorch, TensorFlow) Users may now register a `dump_checkpoints_fn` callback to control the location of checkpoint saving during accuracy-aware training.
- (PyTorch, TensorFlow) Default pruning schedule is now exponential.
- (PyTorch) SILU activation now supported.
- (PyTorch) Dynamic graph no longer traced during compressed model execution, which improves training performance of models compressed with NNCF.
- (PyTorch) Added BERT-MRPC quantization results and integration instructions to the HuggingFace Transformers integration patch.
- (PyTorch) Knowledge distillation extended with the option to specify temperature for the `softmax` mode.
- (TensorFlow) Added `mixed_min_max` option for quantizer range initialization.
- (PyTorch, TensorFlow) ReLU6-based HSwish and HSigmoid activations are now properly fused.
- (PyTorch - Experimental) Added an algorithm to search the model's architecture for basic building blocks.

Bugfixes:
- (TensorFlow) Fixed a bug where an operation with int32 inputs (following a Cast op) was attempted to be quantized.
- (PyTorch, TensorFlow) LeakyReLU now properly handled during pruning
- (PyTorch) Fixed errors with custom modules failing at the `determine_subtype` stage of metatype assignment.
- (PyTorch) Fix handling modules with `torch.nn.utils.weight_norm.WeightNorm` applied

2.0.2

Target version updates:
- Relax TensorFlow version requirements to 2.4.x

Page 3 of 5

Releases

Has known vulnerabilities

Previous Next

Nncf

Page 3 of 5

2.5.0

2.4.0

2.3.0

2.2.0

2.1.0

2.0.2

Page 3 of 5

Links

Releases