## ORTModel
[`ORTModelForXXX`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort) classes such as [`ORTModelForSequenceClassification`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort#optimum.onnxruntime.ORTModelForSequenceClassification) are now integrated with the [Hugging Face Hub](https://hf.co/models). They make it easy to export models to the ONNX format, load ONNX models, and save the resulting model or push it to the 🤗 Hub with the `save_pretrained` and `push_to_hub` methods, respectively. An already optimized and / or quantized ONNX model can also be loaded through the `from_pretrained` method of these same [`ORTModelForXXX`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort) classes.
Below is an example that downloads a DistilBERT model from the Hub, exports it to the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForSequenceClassification

# Load the model from the Hub and export it to the ONNX format
model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_transformers=True,
)

# Save the exported model
model.save_pretrained("a_local_path_for_convert_onnx_model")
```
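The exported model can then be reloaded from that local directory (or from a Hub repository that already contains an ONNX file) with the same `from_pretrained` method. Below is a minimal sketch, not taken from the release notes, that reloads the model saved above and runs inference with it; the tokenizer is fetched from the original checkpoint since only the model was saved:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification

# Reload the ONNX model exported and saved in the previous example
model = ORTModelForSequenceClassification.from_pretrained("a_local_path_for_convert_onnx_model")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

# Run inference through ONNX Runtime
inputs = tokenizer("I love the new ORTModel classes!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)
```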
## Pipelines
Built-in support for [transformers pipelines](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines#pipelines) was added. This makes it possible to use the same API as Transformers while benefiting from accelerated runtimes such as [ONNX Runtime](https://onnxruntime.ai/).
The currently supported tasks, along with the default model for each, are the following:
* Text Classification ([DistilBERT](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) model fine-tuned on SST-2)
* Question Answering ([DistilBERT](https://huggingface.co/distilbert-base-cased-distilled-squad) model fine-tuned on SQuAD v1.1)
* Token Classification ([BERT](https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english) large model fine-tuned on CoNLL-2003)
* Feature Extraction ([DistilBERT](https://huggingface.co/distilbert-base-cased))
* Zero Shot Classification ([BART](https://huggingface.co/facebook/bart-large-mnli) model fine-tuned on MNLI)
* Text Generation ([DistilGPT2](https://huggingface.co/distilgpt2))
Below is an example that downloads a RoBERTa model from the Hub, exports it to the ONNX format and loads it with a `transformers` pipeline for `question-answering`:
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForQuestionAnswering

# Load a vanilla transformers model and convert it to ONNX
model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")

# Test the model with a transformers pipeline, with handle_impossible_answer for SQuAD v2
optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
prediction = optimum_qa(
    question="What's my name?", context="My name is Philipp and I live in Nuremberg."
)

print(prediction)
# {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
```
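The other supported tasks listed above follow the same pattern. As an illustrative sketch (not taken from the release notes), zero-shot classification with the default [BART](https://huggingface.co/facebook/bart-large-mnli) checkpoint, assuming the MNLI model exports cleanly with `from_transformers=True`:

```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSequenceClassification

# Export the MNLI checkpoint to ONNX
model = ORTModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli", from_transformers=True)
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

optimum_classifier = pipeline("zero-shot-classification", model=model, tokenizer=tokenizer)
prediction = optimum_classifier(
    "ONNX Runtime makes my models faster.",
    candidate_labels=["performance", "cooking", "sports"],
)
print(prediction)
```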
## Improvements
* Add the loss when performing the evaluation step using an instance of `ORTTrainer`; previously, the loss was not returned when inference was performed with ONNX Runtime ([#152](https://github.com/huggingface/optimum/pull/152))
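For context, here is a hypothetical sketch of where the added loss surfaces, assuming `ORTTrainer` accepts the same arguments as `transformers.Trainer` and that its evaluation step runs inference with ONNX Runtime; the checkpoint and toy dataset below are placeholders, not from the release notes:

```python
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments
from optimum.onnxruntime import ORTTrainer

model_id = "distilbert-base-uncased"  # placeholder checkpoint
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Tiny toy evaluation set, only here to keep the sketch self-contained
eval_dataset = Dataset.from_dict(
    {"text": ["great movie", "terrible movie"], "labels": [1, 0]}
).map(lambda ex: tokenizer(ex["text"], truncation=True), remove_columns=["text"])

trainer = ORTTrainer(  # assumed to mirror the transformers.Trainer constructor
    model=model,
    args=TrainingArguments(output_dir="ort_trainer_out"),
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
)

# With this release, the metrics returned by the evaluation step also include
# the loss (e.g. an `eval_loss` entry) when inference runs with ONNX Runtime
metrics = trainer.evaluate()
print(metrics)
```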