Extended BetterTransformer support
Various improvements in the PyTorch BetterTransformer integration.
* [BT] add `BetterTransformer` support for ProphetNet by hirotasoshu in https://github.com/huggingface/optimum/pull/923
* Improve bettertransformer benchmark script by fxmarty in https://github.com/huggingface/optimum/pull/939
* Fix sdpa with batch size = 1, better benchmark by fxmarty in https://github.com/huggingface/optimum/pull/915
* Fix slow tests & sdpa dropout by fxmarty in https://github.com/huggingface/optimum/pull/974
* Remove `getattr` overhead in sdpa by fxmarty in https://github.com/huggingface/optimum/pull/934
* [`BT`] Improve docs by younesbelkada in https://github.com/huggingface/optimum/pull/944
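As a reminder of how the integration is used, here is a minimal sketch of converting a model with `BetterTransformer.transform` (assumes `optimum` and `transformers` are installed; the checkpoint name is illustrative):

```python
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

model = AutoModel.from_pretrained("bert-base-uncased")
# Swaps supported encoder layers for BetterTransformer fast-path kernels
model = BetterTransformer.transform(model)
```

The transformed model is then used exactly like the original `transformers` model for inference.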
ONNX merged seq2seq models
Instead of using two separate `decoder_model.onnx` and `decoder_with_past_model.onnx` files, encoder-decoder models can now use a single decoder: `decoder_model_merged.onnx`. This avoids duplicating weights between the without-past and with-past ONNX models.
By default, `decoder_model_merged.onnx` is used in the ORTModel integration when available. This can be disabled with the `--no-post-process` option in the ONNX export CLI, and with `use_merged=False` in the `ORTModel.from_pretrained` method.
Example:
optimum-cli export onnx --model t5-small t5_onnx
will give:
└── t5_onnx
├── config.json
├── decoder_model_merged.onnx
├── decoder_model.onnx
├── decoder_with_past_model.onnx
├── encoder_model.onnx
├── generation_config.json
├── special_tokens_map.json
├── spiece.model
├── tokenizer_config.json
└── tokenizer.json
And `decoder_model_merged.onnx` alone is enough for inference. We strongly recommend inspecting the subgraphs with Netron to understand the inputs and outputs, in case the exported model is to be used with an engine other than ONNX Runtime through the Optimum integration.
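For inference through Optimum, the exported folder can be loaded with `ORTModelForSeq2SeqLM`. A minimal sketch, assuming the `t5_onnx` folder from the export above exists locally:

```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# By default, decoder_model_merged.onnx is used when available
model = ORTModelForSeq2SeqLM.from_pretrained("t5_onnx")

# Opt out of the merged decoder and fall back to the separate
# decoder_model.onnx / decoder_with_past_model.onnx files
model = ORTModelForSeq2SeqLM.from_pretrained("t5_onnx", use_merged=False)
```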
* Fix encoder-decoder ONNX merge by fxmarty in https://github.com/huggingface/optimum/pull/924
* Support the merge of decoder without/with past for encoder-decoder models in the ONNX export by fxmarty in https://github.com/huggingface/optimum/pull/926
* Support merged seq2seq models in ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/930
New models in the ONNX export
* Add llama onnx export & onnxruntime support by nenkoru in https://github.com/huggingface/optimum/pull/975
Major bugfix
* Remove constant output in encoder-decoder ONNX models decoder with past by fxmarty in https://github.com/huggingface/optimum/pull/920
* Hash tensor data during deduplication by VikParuchuri in https://github.com/huggingface/optimum/pull/932
Potentially breaking changes
The TasksManager replaces legacy task names with the canonical ones used on the Hub and in [transformers metadata](https://huggingface.co/datasets/huggingface/transformers-metadata/blob/main/pipeline_tags.json):
- `sequence-classification` becomes `text-classification`,
- `causal-lm` becomes `text-generation`,
- `seq2seq-lm` becomes `text2text-generation`,
- `speech2seq-lm` and `audio-ctc` become `automatic-speech-recognition`,
- `default` becomes `feature-extraction`,
- `masked-lm` becomes `fill-mask`,
- `vision2seq-lm` becomes `image-to-text`
This should not break anything unless you rely on private methods and attributes of `TasksManager`.
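If user code still passes legacy task names, a small translation table makes the migration mechanical. This dict is an illustrative convenience built from the list above, not part of the Optimum API:

```python
# Legacy task names -> canonical Hub names (illustrative, not an Optimum API)
LEGACY_TO_CANONICAL = {
    "sequence-classification": "text-classification",
    "causal-lm": "text-generation",
    "seq2seq-lm": "text2text-generation",
    "speech2seq-lm": "automatic-speech-recognition",
    "audio-ctc": "automatic-speech-recognition",
    "default": "feature-extraction",
    "masked-lm": "fill-mask",
    "vision2seq-lm": "image-to-text",
}

def canonical_task(task: str) -> str:
    """Return the canonical name for a possibly legacy task name."""
    return LEGACY_TO_CANONICAL.get(task, task)
```

Names already in canonical form pass through unchanged, so the helper is safe to apply unconditionally.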
* Allow to use a custom class in TasksManager & use canonical tasks names by fxmarty in https://github.com/huggingface/optimum/pull/967
What's Changed
* Update ort trainer to transformers 4.27.2 by JingyaHuang in https://github.com/huggingface/optimum/pull/917
* Compute Loss inside the training step. by AdamLouly in https://github.com/huggingface/optimum/pull/686
* Fix ORTModel MRO for whisper by fxmarty in https://github.com/huggingface/optimum/pull/919
* add ORTStableDiffusionPipeline reference in documentation by echarlaix in https://github.com/huggingface/optimum/pull/890
* Fix decoder ONNX model loading from the Hub by fxmarty in https://github.com/huggingface/optimum/pull/929
* `optimum-cli onnxruntime quantize / optimize` output argument is now required by michaelbenayoun in https://github.com/huggingface/optimum/pull/927
* Register mechanism for the Optimum CLI by michaelbenayoun in https://github.com/huggingface/optimum/pull/928
* Ensure backward compatibility of ORTModel by fxmarty in https://github.com/huggingface/optimum/pull/933
* Update the README by michaelbenayoun in https://github.com/huggingface/optimum/pull/925
* Update README by echarlaix in https://github.com/huggingface/optimum/pull/941
* Update readme by echarlaix in https://github.com/huggingface/optimum/pull/942
* Remove GC from README by michaelbenayoun in https://github.com/huggingface/optimum/pull/943
* Add user and token for CI by michaelbenayoun in https://github.com/huggingface/optimum/pull/945
* Update README by echarlaix in https://github.com/huggingface/optimum/pull/946
* `optimum-cli` print the help of subcommands by michaelbenayoun in https://github.com/huggingface/optimum/pull/940
* Remove from_transformers references from the documentation by fxmarty in https://github.com/huggingface/optimum/pull/935
* Turn command import into optional by JingyaHuang in https://github.com/huggingface/optimum/pull/936
* Auto-set use_merged to False if use_cache is passed as False by fxmarty in https://github.com/huggingface/optimum/pull/954
* Raise error with use_cache=False, use_io_binding=True by fxmarty in https://github.com/huggingface/optimum/pull/955
* Add an ORT training notebook by JingyaHuang in https://github.com/huggingface/optimum/pull/959
* Fix issue with doc build sometimes failing silently in GH workflows by regisss in https://github.com/huggingface/optimum/pull/960
* Fix typos by regisss in https://github.com/huggingface/optimum/pull/963
* Disable tests upon transformers 4.28 release by fxmarty in https://github.com/huggingface/optimum/pull/976
New Contributors
* hirotasoshu made their first contribution in https://github.com/huggingface/optimum/pull/923
* VikParuchuri made their first contribution in https://github.com/huggingface/optimum/pull/932
**Full Changelog**: https://github.com/huggingface/optimum/compare/v1.7.3...v1.8.2