- Upgrade optimum-habana diffusers dependency from 0.26.3 to 0.29.2 1150 dsocek
Stable Diffusion 3
- SD3 1153 dsocek
- Refactor SD3 1199 dsocek
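A minimal inference sketch for the new SD3 support, assuming `GaudiStableDiffusion3Pipeline` mirrors the diffusers `StableDiffusion3Pipeline` API and takes the same `use_habana`/`use_hpu_graphs`/`gaudi_config` kwargs as the existing Gaudi diffusers pipelines:

```python
# Sketch only: assumes GaudiStableDiffusion3Pipeline follows the same
# from_pretrained/__call__ conventions as the other Gaudi diffusers pipelines.
import torch
from optimum.habana.diffusers import GaudiStableDiffusion3Pipeline

pipeline = GaudiStableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)
image = pipeline(
    prompt="An astronaut riding a green horse",
    num_inference_steps=28,
    guidance_scale=7.0,
).images[0]
image.save("sd3_output.png")
```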
Training with Sentence Transformers
- Enable Sentence Transformer Trainer with Gaudi 1111 ZhengHongming888
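A minimal training sketch, under the assumption that the Gaudi port keeps the upstream `SentenceTransformerTrainer` API; the class names and the `gaudi_config_name` value below are assumptions:

```python
# Sketch only: class names assumed to mirror sentence-transformers' trainer API.
from datasets import Dataset
from sentence_transformers import SentenceTransformer, losses
from optimum.habana.sentence_transformers import (
    SentenceTransformerGaudiTrainer,
    SentenceTransformerGaudiTrainingArguments,
)

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
train_dataset = Dataset.from_dict({
    "sentence1": ["A plane is taking off.", "A man is playing a flute."],
    "sentence2": ["An air plane is taking off.", "A man plays a flute."],
    "score": [1.0, 0.9],
})

args = SentenceTransformerGaudiTrainingArguments(
    output_dir="./st-gaudi-out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/distilbert-base-uncased",  # assumed: any suitable Gaudi config
    num_train_epochs=1,
)
trainer = SentenceTransformerGaudiTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.CosineSimilarityLoss(model),
)
trainer.train()
```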
Model optimizations
- Fix starcoder2 accuracy issue and optimize performance with fused rope 1095 mandy-li
- Enable FusedRoPE using float32 for gpt-neox model 1104 yeonsily
- Initial Mamba enablement 1122 libinta
- Add fused QKV support along with config 1102 bhargaveede
- Enhance Qwen2 with fast softmax, bf16 RoPE and cache optimization 1087 Zhiwei35
- Enable FP8 inference for Llava-Next and add FusedSDPA 1120 tthakkal
- Support bucket_internal for MPT 1137 pk1d3v
- Enable Flash Attention (Fused SDPA) for Starcoder 1114 abhilash1910
- gpt_bigcode: added FusedSDPA kernel 1138 mgonchar
- Enable torch.compile for Granite20B 1185 dvarshney-habana
- Refine use_cache handling for the MPT model 1158 Jing1Ling
- Support reuse_cache in GPT-J 1094 atakaha
- Use fast softmax only on prefill 1159 jaygala223
- Starcoder2: KV cache and flash attention (FusedSDPA) enablement 1149 abhatkal
- GPT-BigCode: fused SDPA 1260 yeonsily
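Most of these optimizations surface as generation-time switches in the text-generation example. A hedged sketch of what enabling them can look like; the exact kwarg names follow the example's flags and should be treated as assumptions:

```python
# Sketch only: kwarg names (use_flash_attention, reuse_cache, lazy_mode) are
# assumptions based on the flags exposed by examples/text-generation.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch transformers with the Gaudi-optimized code paths

tokenizer = AutoTokenizer.from_pretrained("bigcode/starcoder2-7b")
model = AutoModelForCausalLM.from_pretrained(
    "bigcode/starcoder2-7b", torch_dtype=torch.bfloat16
).to("hpu")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to("hpu")
out = model.generate(
    **inputs,
    max_new_tokens=64,
    use_flash_attention=True,  # assumed kwarg: Fused SDPA / flash attention path
    reuse_cache=True,          # assumed kwarg: preallocate and reuse the KV cache
    lazy_mode=True,            # assumed kwarg: run in HPU lazy mode
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```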
SAM, FastViT, VideoMAE, OpenCLIP, DETR, Table Transformer, DeciLM
- Add an example of Segment Anything Model [Inference] 814 cfgfung
- Add an example of FastViT model (Inference) 826 cfgfung
- VideoMAE model enablement and examples 922 pi314ever
- OpenCLIP sample for visual question answering 977 vidyasiv
- Enabled DETR (Object Detection) model 1046 cfgfung
- Table Transformer enablement 978 pi314ever
- DeciLM support 1133 sywangyi
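Each of these ships with a dedicated example script in the repository. As a flavor, here is a sketch of SAM inference on HPU using the standard transformers API; the checkpoint choice and the plain `.to("hpu")` usage are illustrative, not the exact settings of the maintained example:

```python
# Sketch only: standard transformers SAM usage, run on the HPU device.
import requests
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from PIL import Image
from transformers import SamModel, SamProcessor

model = SamModel.from_pretrained("facebook/sam-vit-huge").to("hpu")
processor = SamProcessor.from_pretrained("facebook/sam-vit-huge")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")
inputs = processor(image, input_points=[[[450, 600]]], return_tensors="pt").to("hpu")

with torch.no_grad():
    outputs = model(**inputs)

# Resize the predicted masks back to the original image resolution.
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu(),
)
print(masks[0].shape)
```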
Stable Diffusion inpainting, unconditional image generation
- Add Stable Diffusion inpainting support 869 yuanwu2017
- Enable Unconditional Image Generation on Gaudi 2 [Diffuser/Tasks] 859 cfgfung
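A hedged inpainting sketch, assuming the new pipeline is exposed as `GaudiStableDiffusionInpaintPipeline` with the usual Gaudi pipeline kwargs:

```python
# Sketch only: the pipeline class name and kwargs are assumed to follow the
# existing Gaudi diffusers pipelines.
import torch
from diffusers.utils import load_image
from optimum.habana.diffusers import GaudiStableDiffusionInpaintPipeline

pipe = GaudiStableDiffusionInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    torch_dtype=torch.bfloat16,
    use_habana=True,
    use_hpu_graphs=True,
    gaudi_config="Habana/stable-diffusion",
)

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
image = pipe(
    prompt="Face of a yellow cat, high resolution",
    image=load_image(img_url),
    mask_image=load_image(mask_url),  # white pixels are repainted
).images[0]
image.save("inpainted.png")
```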
Text feature extraction example
- Feature extraction enabling 994 pi314ever
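Conceptually, the new example embeds text by running an encoder and pooling its hidden states. A minimal sketch on HPU; the model choice and mean pooling here are illustrative, not the example's exact settings:

```python
# Sketch only: plain transformers feature extraction, run on the HPU device.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2").to("hpu")

inputs = tokenizer("Gaudi runs feature extraction.", return_tensors="pt").to("hpu")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state

# Mean-pool the token embeddings, ignoring padding.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embedding.shape)
```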
Tensor parallelism
- Tensor-parallel distributed strategy without using DeepSpeed 1121 kalyanjk
- Disable torch.compile for all_reduce when parallel_strategy is set to "tp" 1174 kalyanjk
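The strategy is selected by setting `parallel_strategy` to `"tp"` (see the items above). For intuition, here is a self-contained sketch of the underlying idea: a linear layer's weight is sharded column-wise across ranks and the partial outputs are recombined with all-gather. This is a conceptual illustration, not optimum-habana's implementation:

```python
# Conceptual sketch of tensor parallelism: each rank holds one shard of the
# weight, computes a partial output, and the shards are concatenated.
# Assumes torch.distributed is already initialized with one process per
# Gaudi card (e.g., using the hccl backend).
import torch
import torch.distributed as dist

def tensor_parallel_linear(x: torch.Tensor, full_weight: torch.Tensor) -> torch.Tensor:
    world_size = dist.get_world_size()
    rank = dist.get_rank()
    # Shard the output dimension across ranks: each rank computes one slice.
    shard = full_weight.chunk(world_size, dim=0)[rank]
    partial = x @ shard.t()
    gathered = [torch.empty_like(partial) for _ in range(world_size)]
    dist.all_gather(gathered, partial)
    return torch.cat(gathered, dim=-1)
```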
Kubernetes cluster example
- Add a Helm chart, Dockerfile, and instructions for running examples on a Kubernetes cluster 1099 dmsuehir
- Fix PyTorch version in the Kubernetes docker-compose to match image 1246 dmsuehir
FP8 training
- TE FP8 integration 1096 SanjuCSudhakaran
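A hedged sketch of turning on Transformer Engine FP8 through the trainer; the `fp8` switch on `GaudiTrainingArguments` is an assumption based on this release's integration, the rest follows the usual optimum-habana trainer setup:

```python
# Sketch only: the fp8 flag name is an assumption; gaudi_config_name should be
# a Gaudi config suited to the model being fine-tuned.
from optimum.habana import GaudiTrainer, GaudiTrainingArguments

args = GaudiTrainingArguments(
    output_dir="./fp8-out",
    use_habana=True,
    use_lazy_mode=True,
    gaudi_config_name="Habana/llama",  # assumed config name
    fp8=True,  # assumed flag: wrap eligible layers with Transformer Engine FP8
)
# trainer = GaudiTrainer(model=model, args=args, train_dataset=train_dataset)
# trainer.train()
```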
Other
- Updates run_lora_clm.py with enhanced dataset support 955 dmsuehir
- Fix prefix tuning finetune issue and update test 975 sywangyi
- Fix throughput calculation in image-to-text example 1070 regisss
- SDXL training: fixed CI, changed gated dataset, fixes for non-square datasets 1038 imangohari1
- Updating batch_size of Albert-XXL in README 1063 vineethanandh
- Fix the error when running run_pipeline.py in the text-generation example 1055 yuanwu2017
- Add a test for llama finetuning with FP8 precision 1106 SanjuCSudhakaran
- Beam-search fix 1113 ssarkar2
- Add chat format support dataset in SFT 1066 libinta
- Fix nan loss of gemma and crash if dataset_concatenation is not set 1088 sywangyi
- torch.compile: keep input mutation in the graph to avoid unnecessary memcpy 1069 sushildubey171
- Update LangChain text-generation pipeline to work with the latest release (0.2.5) 1084 rbrugaro
- Add the MC example 891 yuanwu2017
- Fix recompiles if limit_hpu_graph is False 1129 ssarkar2
- Update examples batchsize in README 1123 shepark
- Fix OOM error in SDXL Fine-Tuning validation stage 1134 dsocek
- Added an example code to demonstrate how to use deterministic image generation 878 cfgfung
- SD image variation/InstructPix2Pix/StableDiffusionXLImg2ImgPipeline pipeline 988 sywangyi
- Add CI tests for TRL reward modeling and PPO, fix backward failure in PPO caused by RMS fusion 1020 sywangyi
- Add Llama-Adapter support 983 sywangyi
- torch.flip issue is fixed in SynapseAI 1.16, so remove the workaround 1092 sywangyi
- Fix test CausalLanguageModelingLORAExampleTester KeyError 1139 dmsuehir
- fix(ci): new runs-on 1136 XciD
- Add trust_remote_code for loading datasets in the audio classification example 1074 regisss
- Generation example: print number of warmup iterations 1145 mgonchar
- CI updates: text-gen to receive ranks/bs, updated bs/metric for baselines 1140 imangohari1
- Support for custom files for run_lora_clm.py 1039 vidyasiv
- Change the device_id for FSDP plugin 1086 ckvermaAI
- Set KV Cache update as static method 1160 ulivne
- Fix CPU tensor issue 1157 mkumargarg
- Add missing `__init__.py` to mistral and mixtral test packages 1188 rkumar2patel
- Add example of multitask_prompt/poly tuning 915 sywangyi
- Fix data-type mismatch for mlperf_inference accuracy test 1146 kalyanjk
- Fix spawn MP context, limit cpu and download data 1131 polisettyvarma
- T5 multi card 1222 yafshar
- Add trust_remote_code for t5 poly-tuning test 1220 yafshar
- Resolve "empty tensor optional" error with hpu_graphs + kv cache for StarCoder 1181 vidyasiv
- Fix ViT, add wav2vec comment 1223 ssarkar2
- Fix RoBERTa tests that were running on CPU 1229 ssarkar2
- Fix bert/roberta contrastive search tests 1226 skavulya
- Remove the default env variable to trust remote code by default 1225 yafshar
- Improve style check workflow 1230 regisss
- Added scheduler selection for SDXL fine-tuning 867 kplau1128
- Clarify help message for ignore_eos to avoid misunderstanding sywangyi
- Support loading Hugging Face checkpoints 1165 ulivne
- Change triggering event for code style check 1238 regisss
- gptj: fix missing token_idx 1234 envsp
- fix(nltk): pin the version to a working one 1247 imangohari1
- Updating to avoid hardcoding tests in CI framework 1221 vidyasiv
- Fix FSDP graph error due to Transformers 4.43 update 1251 jiminha
- Fix SD README commands 1250 imangohari1
- Fix spelling errors 1252 changwangss
- Set HLS_MODULE_ID only if it wasn't set previously 1254 astachowiczhabana
- Fix overflow of steps in SDXL for default diffusers scheduler dsocek
- fix(test_diffusers): automated the checking for tests without upstream HF 1232 imangohari1
- fix(nltk): revert 1247, update the version, add the punkt_tab download 1258 imangohari1
- Set input_embeds before it gets used 1261 tthakkal
- Update README and more changes, rebase to main 1259 shepark
Known limitations
- For Llama, some large batch sizes lead to out-of-memory errors although they previously worked