Optimum


1.3.0

Knowledge distillation

Knowledge distillation was introduced (8). To perform distillation, an `IncDistiller` must be instantiated with the appropriate configuration.
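
Only `IncDistiller` is named above; as a rough sketch, with the import path, configuration handling and keyword arguments treated as assumptions rather than the documented API, setting it up could look along these lines:

```python
from transformers import AutoModelForSequenceClassification
from optimum.intel.neural_compressor import IncDistiller  # assumed import path

# Teacher model whose predictions guide the student during training
# (checkpoint chosen purely for illustration).
teacher_model = AutoModelForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)

# "distillation.yml" stands for a Neural Compressor configuration describing the
# distillation behaviour (loss weights, temperature, ...); the file name and the
# keyword arguments below are assumptions.
distiller = IncDistiller(config_path_or_obj="distillation.yml", teacher_model=teacher_model)
```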

One-shot optimization

Support for combining compression techniques such as pruning, knowledge distillation and quantization-aware training in one shot during training was introduced (7). One-shot optimization is enabled by default, but can be disabled by setting the `one_shot_optimization` parameter to `False` when instantiating the `IncOptimizer`.
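
As a hedged sketch of how this parameter fits in (here `quantizer`, `pruner` and `distiller` stand for previously configured compression components, and every keyword name except `one_shot_optimization` is an assumption):

```python
from optimum.intel.neural_compressor import IncOptimizer  # assumed import path

# `model` is the model to compress; the compression components are assumed to have
# been configured beforehand (see the distillation sketch above).
optimizer = IncOptimizer(
    model,
    quantizer=quantizer,
    pruner=pruner,
    distiller=distiller,
    one_shot_optimization=True,  # default; set to False to disable one-shot optimization
)
```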

Seq2Seq models support

Both quantization and pruning can now be applied to Seq2Seq models (14).

1.2.3

* Add the `save_pretrained` method to the `ORTOptimizer` to easily save the resulting quantized and/or pruned model, along with its corresponding configuration (needed to load a quantized model) (4); a short sketch follows this list
* Remove the outdated `fit` method as well as the `model` attribute of `IncQuantizer` and `IncPruner` (4)
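
A minimal, hedged sketch of the new method (how the optimizer is constructed and run is elided here, and the path is illustrative):

```python
# `optimizer` stands for an already-configured optimizer instance whose
# quantization and / or pruning step has been run.
optimizer.save_pretrained("path_to_save_the_compressed_model")
# The directory now contains the compressed model together with the configuration
# needed to later load the quantized model.
```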

1.2.2

With this release, we enable Intel [Neural Compressor](https://github.com/intel/neural-compressor) (INC) automatic accuracy-driven tuning strategies for model quantization, so that users can easily generate quantized models for different quantization approaches (including static, dynamic and quantization-aware training). This support covers the overall process, from applying quantization to loading the resulting quantized model, the latter being enabled by the introduction of the `IncQuantizedModel` class.
Magnitude pruning is also enabled for a variety of tasks with the introduction of the `IncTrainer` class, which handles the pruning process.
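
As a hedged sketch of the loading step (only the `IncQuantizedModel` class name comes from the notes above; the import path and the exact loading call are assumptions for illustration):

```python
from optimum.intel.neural_compressor import IncQuantizedModel  # assumed import path

# Load a model previously quantized with INC, together with the quantization
# configuration saved alongside it (the path / repository name is illustrative).
quantized_model = IncQuantizedModel.from_pretrained("path_or_repo_of_the_quantized_model")
```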

1.2.1

Add support for Python 3.7 (https://github.com/huggingface/optimum/pull/176)

1.2.0

ORTModel

[`ORTModelForXXX`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort) classes such as [`ORTModelForSequenceClassification`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort#optimum.onnxruntime.ORTModelForSequenceClassification) were integrated with the [Hugging Face Hub](https://hf.co/models), making it easy to export models to the ONNX format, load ONNX models, and save the resulting model or push it to the 🤗 Hub using the `save_pretrained` and `push_to_hub` methods respectively. An already optimized and/or quantized ONNX model can also be loaded with the `ORTModelForXXX` classes using the `from_pretrained` method.

Below is an example that downloads a DistilBERT model from the Hub, exports it to the ONNX format and saves it:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load model from the Hub and export it to the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```
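
Following the same API, the exported model can later be reloaded from that local path with `from_pretrained` alone, without the `from_transformers` argument; the same applies to an already optimized and / or quantized ONNX model:

```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Reload the previously exported ONNX model from the local directory used above
model = ORTModelForSequenceClassification.from_pretrained("a_local_path_for_convert_onnx_model")
```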


Pipelines

Built-in support for [transformers pipelines](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#pipelines) was added. This makes it possible to use the same API as in Transformers, with the power of accelerated runtimes such as [ONNX Runtime](https://onnxruntime.ai/).

The currently supported tasks, along with the default model for each, are the following:

* Text Classification ([DistilBERT](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model fine-tuned on SST-2)
* Question Answering ([DistilBERT](https://huggingface.co/distilbert-base-cased-distilled-squad) model fine-tuned on SQuAD v1.1)
* Token Classification ([BERT](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english) large fine-tuned on CoNLL2003)
* Feature Extraction ([DistilBERT](https://huggingface.co/distilbert-base-cased))
* Zero Shot Classification ([BART](https://huggingface.co/facebook/bart-large-mnli) model fine-tuned on MNLI)
* Text Generation ([DistilGPT2](https://huggingface.co/distilgpt2))

Below is an example that downloads a RoBERTa model from the Hub, exports it to the ONNX format and loads it with the `transformers` pipeline for `question-answering`.

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load a vanilla transformers model and convert it to ONNX
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Test the model with the transformers pipeline, using handle_impossible_answer for squad_v2
optimum_qa = pipeline(
    "question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True
)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```


Improvements
* Add the loss when performing the evaluation step with an instance of `ORTTrainer`; previously the loss was not computed when inference was performed with ONNX Runtime ([152](https://github.com/huggingface/optimum/pull/152))

1.1.2

This patch release fixes a bug where processes could be initialized multiple times in distributed mode, leading to an error.
