Torch FX
The `optimum.fx.optimization` module (#232) provides a set of `torch.fx` graph transformations, along with classes and functions to write your own transformations and compose them.
- The `Transformation` and `ReversibleTransformation` classes represent non-reversible and reversible transformations, respectively; you can write your own transformations by inheriting from these classes
- The `compose` utility function enables transformation composition (see the sketch after this list)
- Two reversible transformations were added:
  - `MergeLinears`: merges linear layers that have the same input
  - `ChangeTrueDivToMulByInverse`: changes a division by a static value into a multiplication by its inverse
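Below is a minimal sketch of how these transformations can be composed and applied to a traced model, assuming a BERT checkpoint traced with the `transformers.utils.fx` symbolic tracer:

```python
from transformers import BertModel
from transformers.utils.fx import symbolic_trace
from optimum.fx.optimization import ChangeTrueDivToMulByInverse, MergeLinears, compose

# Trace the model into a torch.fx.GraphModule
model = BertModel.from_pretrained("bert-base-uncased")
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask", "token_type_ids"])

# Compose the two reversible transformations and apply them to the traced graph
transformation = compose(ChangeTrueDivToMulByInverse(), MergeLinears())
transformed_model = transformation(traced)
```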
ORTModelForSeq2SeqLM
[`ORTModelForSeq2SeqLM`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort#optimum.onnxruntime.ORTModelForSeq2SeqLM) (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.
* When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (consisting of the decoder with the language modeling head), and the same decoder with pre-computed key/values as additional inputs.
* This specific export comes from the fact that, during the first pass, the decoder has no pre-computed key/value hidden states, while during the rest of the generation past key/values are used to speed up sequential decoding.
Below is an example that downloads a T5 model from the Hugging Face Hub, exports it to the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the model from the Hub and export it to the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
output_dir = "t5_small_onnx/"  # any local directory
model.save_pretrained(output_dir)
```
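Once exported, the model can be used for generation like a regular transformers model. Here is a minimal sketch (the prompt and the checkpoint used for the tokenizer are only illustrative):

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Run ONNX Runtime inference through the usual generate() API
inputs = tokenizer("translate English to French: What a beautiful day!", return_tensors="pt")
generated_ids = model.generate(**inputs)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```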
ORTModelForImageClassification
[`ORTModelForImageClassification`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort#optimum.onnxruntime.ORTModelForImageClassification) (#226) allows ONNX Runtime inference for models with an image classification head.
Below is an example that downloads a ViT model from the Hugging Face Hub, exports it to the ONNX format and saves it:
```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load the model from the Hub and export it to the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
output_dir = "vit_onnx/"  # any local directory
model.save_pretrained(output_dir)
```
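The exported model can then be used for inference. Below is a sketch of a forward pass; the example image URL (a standard COCO validation picture) is only illustrative:

```python
import requests
from PIL import Image
from transformers import AutoFeatureExtractor
from optimum.onnxruntime import ORTModelForImageClassification

# Illustrative example image
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

# Run ONNX Runtime inference and retrieve the predicted class
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)
predicted_idx = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_idx])
```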
ORTOptimizer
Adds support for converting model weights from fp32 to fp16 through a new optimization parameter (`fp16`) in `OptimizationConfig` (#273).
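As a minimal sketch, enabling the new parameter only requires setting `fp16=True` when building the optimization configuration (the resulting config is then passed to an `ORTOptimizer` as usual; the `optimization_level` value is only illustrative):

```python
from optimum.onnxruntime.configuration import OptimizationConfig

# Request conversion of the model weights from fp32 to fp16,
# on top of the regular graph optimizations
optimization_config = OptimizationConfig(optimization_level=1, fp16=True)
```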
Pipelines
Additional pipeline tasks are now supported; here is the list of supported tasks along with the default model for each:
* Image Classification ([ViT](https://huggingface.co/google/vit-base-patch16-224))
* Text-to-Text Generation ([T5 small](https://huggingface.co/t5-small))
* Summarization ([T5 base](https://huggingface.co/t5-base))
* Translation ([T5 base](https://huggingface.co/t5-base))
Below is an example that downloads a T5 small model from the Hub and loads it with the transformers `pipeline` for translation:
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```
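Similarly, here is a sketch of the new image classification pipeline with the default ViT model (the example image URL is only illustrative):

```python
from transformers import AutoFeatureExtractor, pipeline
from optimum.onnxruntime import ORTModelForImageClassification

model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")

onnx_classifier = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
pred = onnx_classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
```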
Breaking change
The [`ORTModelForXXX`](https://huggingface.co/docs/optimum/main/en/onnxruntime/modeling_ort) execution provider default value is now set to `CPUExecutionProvider` (#203). Previously, when no execution provider was specified, it was set to `CUDAExecutionProvider` if a GPU was detected, or to `CPUExecutionProvider` otherwise.