Optimum-intel

Latest version: v1.21.0

1.5.0

Quantization

* Add `OVQuantizer` enabling OpenVINO NNCF post-training static quantization (50)
* Add `OVTrainer` enabling OpenVINO NNCF quantization-aware training (67)
* Add `OVConfig`, the configuration class containing the information related to the quantization process (65)

The quantized models resulting from the `OVQuantizer` and the `OVTrainer` are exported to the OpenVINO IR and can be loaded with the corresponding `OVModelForXxx` class to perform inference with OpenVINO Runtime.
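
Below is a minimal sketch of how the post-training static quantization workflow is meant to be used; the calibration dataset (GLUE SST-2), the preprocessing function and the `save_directory` value are illustrative assumptions rather than part of these release notes, and the `datasets` library is assumed to be installed:

```python
from functools import partial

from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.intel.openvino import OVConfig, OVQuantizer

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_fn(examples, tokenizer):
    return tokenizer(examples["sentence"], padding="max_length", max_length=128, truncation=True)

quantizer = OVQuantizer.from_pretrained(model)
# Assumed calibration setup: a small subset of GLUE SST-2 used to collect activation statistics
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=100,
    dataset_split="train",
)
# Apply post-training static quantization and export the resulting model to the OpenVINO IR
quantizer.quantize(
    quantization_config=OVConfig(),
    calibration_dataset=calibration_dataset,
    save_directory="ov_quantized_model",
)
```

The exported IR can then be loaded with the corresponding `OVModelForXxx` class, for example `OVModelForSequenceClassification.from_pretrained("ov_quantized_model")`.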

OVModel

* Add `OVModelForCausalLM` enabling OpenVINO Runtime for models with a causal language modeling head (76)
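
As a hedged illustration (the checkpoint and pipeline task are chosen for the example and are not taken from these release notes), `OVModelForCausalLM` can be combined with a Transformers pipeline for text generation:

```python
from transformers import AutoTokenizer, pipeline
from optimum.intel.openvino import OVModelForCausalLM

model_id = "gpt2"  # assumed example checkpoint
# Export the model to the OpenVINO IR on the fly and run it with OpenVINO Runtime
model = OVModelForCausalLM.from_pretrained(model_id, from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("He never went out without a book under his arm"))
```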

1.4.0

[OVModel](https://github.com/huggingface/optimum-intel/blob/main/optimum/intel/openvino/modeling_base.py#L57) classes were integrated with the [🤗 Hub](https://hf.co/models), making it easy to export models to the OpenVINO IR, save and load the resulting models, and perform inference.

* Add OVModel classes enabling OpenVINO inference (21)

Below is an example that downloads a DistilBERT model from the Hub, exports it to the OpenVINO IR and saves it:

```python
from optimum.intel.openvino import OVModelForSequenceClassification

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = OVModelForSequenceClassification.from_pretrained(model_id, from_transformers=True)
save_directory = "ov_distilbert_sst2"  # placeholder: any local directory works
model.save_pretrained(save_directory)
```

The currently supported model topologies are the following:

* `OVModelForSequenceClassification`
* `OVModelForTokenClassification`
* `OVModelForQuestionAnswering`
* `OVModelForFeatureExtraction`
* `OVModelForMaskedLM`
* `OVModelForImageClassification`
* `OVModelForSeq2SeqLM`

Pipelines

Support for Transformers [pipelines](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#pipelines) was added, providing an easy way to use OVModels for inference.

```diff
-from transformers import AutoModelForSeq2SeqLM
+from optimum.intel.openvino import OVModelForSeq2SeqLM
 from transformers import AutoTokenizer, pipeline

 model_id = "Helsinki-NLP/opus-mt-en-fr"
-model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
+model = OVModelForSeq2SeqLM.from_pretrained(model_id, from_transformers=True)
 tokenizer = AutoTokenizer.from_pretrained(model_id)
 pipe = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
 text = "He never went out without a book under his arm, and he often came back with two."
 outputs = pipe(text)
```


By default, OVModels support dynamic shapes, enabling inputs of any shape (without any constraint on the batch size or sequence length). To decrease latency, static shapes can be enabled by specifying the desired input shapes.

* Add OVModel static shapes (41)

```python
# Fix the input shapes to a batch size of 1 and a sequence length of 20
model.reshape(1, 20)
```

FP16 precision can also be enabled.

* Add OVModel fp16 support (45)

```python
# Convert the model weights to fp16 precision
model.half()
```

1.3.1

* Adapt the INC configuration and quantized model loading for transformers release 4.22 (27)
* Fix the loss computation when distillation is activated while the weight corresponding to the distillation loss is set to 0 (26)

1.3.0

Knowledge distillation

Knowledge distillation was introduced (8). To perform distillation, an `IncDistiller` must be instantiated with the appropriate configuration.

One-shot optimization

The possibility to combine compression techniques such as pruning, knowledge distillation and quantization-aware training in one shot during training was introduced (7). One-shot optimization is enabled by default, but can be disabled by setting the `one_shot_optimization` parameter to `False` when instantiating the `IncOptimizer`.
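
As a rough sketch of how these pieces fit together (the `IncDistillationConfig` class name, the configuration source, the checkpoints and the `fit` entry point are assumptions based on the INC integration of this era, not statements from these release notes):

```python
from transformers import AutoModelForSequenceClassification
from optimum.intel.neural_compressor import (
    IncDistillationConfig,  # assumed name of the distillation configuration class
    IncDistiller,
    IncOptimizer,
)

# Student and teacher models (checkpoints chosen only for illustration)
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
teacher_model = AutoModelForSequenceClassification.from_pretrained("textattack/bert-base-uncased-SST-2")

# Assumed: the distillation settings are described by a configuration loaded from a file
distillation_config = IncDistillationConfig.from_pretrained("path/to/config")
distiller = IncDistiller(distillation_config, teacher_model=teacher_model)

# One-shot optimization is enabled by default; a pruner and/or quantizer could be passed alongside
optimizer = IncOptimizer(model, distiller=distiller, one_shot_optimization=True)
optimized_model = optimizer.fit()
```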

Seq2Seq models support

Both quantization and pruning can now be applied to Seq2Seq models (14).

1.2.3

* Add the `save_pretrained` method to the `IncOptimizer` to easily save the resulting quantized and/or pruned model, along with its corresponding configuration (needed to load a quantized model) (4)
* Remove the outdated `fit` method as well as the `model` attribute of `IncQuantizer` and `IncPruner` (4)

1.2.2

With this release, we enable Intel [Neural Compressor](https://github.com/intel/neural-compressor) (INC) automatic accuracy-driven tuning strategies for model quantization, so that users can easily generate quantized models for the different quantization approaches (post-training static, post-training dynamic and quantization-aware training). This support covers the overall process, from applying quantization to loading the resulting quantized model, the latter being enabled by the introduction of the `IncQuantizedModel` class.
Magnitude pruning is also enabled for a variety of tasks with the introduction of the `IncTrainer`, which handles the pruning process.
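
As a hedged illustration of loading such a quantized model (the task-specific class name, the import path and the checkpoint are assumptions built on top of the `IncQuantizedModel` base class mentioned above):

```python
from optimum.intel.neural_compressor.quantization import IncQuantizedModelForSequenceClassification  # assumed subclass

# Load a model previously quantized through the INC integration, together with its quantization configuration
model = IncQuantizedModelForSequenceClassification.from_pretrained(
    "path/to/quantized_model"  # placeholder: local directory or Hub repository containing the quantized model
)
```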
