Announcements
Starting from the next release (ONNX Runtime 1.16.0), at the operating system level we will drop support for:
- iOS 11 and earlier; iOS 12 will be the minimum supported version.
- CentOS 7, Ubuntu 18.04, and any Linux distro with a glibc version older than 2.28.
At the compiler level we will drop support for:
- GCC version <= 9
- Visual Studio 2019
We will also remove the onnxruntime_DISABLE_ABSEIL build option, because the upcoming protobuf upgrade requires Abseil.
General
- [Added support for ONNX Optional type in C API](https://github.com/microsoft/onnxruntime/pull/15314)
- [Added collectives to support multi-GPU inferencing](https://github.com/microsoft/onnxruntime/pull/14399)
- Updated the macOS build machines to macOS 12, which ships with Xcode 14.2; Xcode 12.4 is no longer used
- Added Python 3.11 support (deprecated 3.7; now supporting 3.8-3.11) in the onnxruntime (CPU), onnxruntime-gpu, onnxruntime-directml, and onnxruntime-training packages
- Updated to CUDA 11.8. ONNX Runtime source code is still compatible with CUDA 11.4 and 12.x.
- Dropped support for Windows 8.1 and earlier
- Removed the eager mode code and the onnxruntime_ENABLE_EAGER_MODE CMake option
- Upgraded mimalloc from 2.0.3 to 2.1.1
- Upgraded protobuf from 3.18.3 to 21.12
- New dependency: CUTLASS, used only in the CUDA/TensorRT packages
- Upgraded DNNL from 2.7.1 to 3.0
Build System
- On POSIX systems, building the code as the "root" user is now disallowed by default. If needed, append "--allow_running_as_root" to your build command to bypass the check (see the example after this list).
- Added support for building the source natively on Windows ARM64 with Visual Studio 2022.
- Added a Gradle wrapper and updated Gradle from 6.8.3 to 8.0.1. (Gradle is the build tool for the ORT Java package.)
- When cross-compiling, the build scripts now try to download a prebuilt protoc from GitHub instead of building the binary from source, because protobuf now has many dependencies and setting up a build environment for it is not easy.
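For example, a containerized Linux build that must run as root can opt out of the new check like this (build.sh is the usual build entry point; the other flags are illustrative):

```bash
# Illustrative: bypass the new root-user check, e.g., inside a Docker build
./build.sh --config Release --parallel --allow_running_as_root
```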
Performance
- [Improved string marshalling and reduced GC pressure](https://github.com/microsoft/onnxruntime/pull/15545)
- [Added a build option to allow using a lock-free queue in threadpool for improved CPU utilization](https://github.com/microsoft/onnxruntime/pull/14834)
- [Fixed a CPU memory leak due to external weights](https://github.com/microsoft/onnxruntime/pull/15040)
- Added a fused decoder multi-head attention kernel to improve GPT and other decoder models (e.g., T5, Whisper)
- Added a packing mode to improve encoder models with inputs that have a large padding ratio
- Improved the generation algorithms (BeamSearch, TopSampling, GreedySearch)
- Improved performance for Stable Diffusion, ViT, GPT, and Whisper models
Execution Providers
Two new execution providers: JS EP and QNN EP.
TensorRT EP
- Official support for TensorRT 8.6
- Explicit shape profile overrides
- Support for TensorRT plugins via ORT custom op
- Improved support for TensorRT options (heuristics, sparsity, optimization level, auxiliary streams, tactic source selection, etc.); a combined configuration sketch follows this list
- Support for TensorRT timing cache
- Improved test coverage, specifically for opset 16-17 models and package pipeline unit tests
- Other miscellaneous bug fixes and improvements
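Several of these options can be combined through the Python API's TensorRT provider options. A minimal sketch, assuming a model with a dynamic input named "input_ids" (the model path, input name, and shape values are placeholders):

```python
import onnxruntime as ort

# Placeholder model with a dynamic input named "input_ids".
trt_options = {
    "trt_timing_cache_enable": True,      # reuse layer-timing info across engine builds
    "trt_engine_cache_enable": True,      # cache built engines on disk
    "trt_builder_optimization_level": 3,  # TensorRT 8.6 builder optimization level
    "trt_auxiliary_streams": 2,           # cap the number of auxiliary CUDA streams
    # Explicit shape-profile overrides:
    "trt_profile_min_shapes": "input_ids:1x1",
    "trt_profile_opt_shapes": "input_ids:4x256",
    "trt_profile_max_shapes": "input_ids:8x512",
}
session = ort.InferenceSession(
    "model.onnx",
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)
```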
OpenVINO EP
- Support for OpenVINO 2023.0
- Dynamic shape support for iGPU
- Changes to the OpenVINO backend to improve first-inference latency
- Deprecation of HDDL-VADM and Myriad VPU support
- Miscellaneous bug fixes
QNN EP
- [Initial Public preview release](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.QNN)
DirectML EP
- Updated to [DirectML 1.12](https://github.com/microsoft/DirectML/blob/master/Releases.md#directml-112)
- Opset 16-17 support
Azure EP
- Added support for the OpenAI Whisper model
- Available as a NuGet package in addition to Python
Mobile
New packages
- Swift Package Manager for onnxruntime
- NuGet package for onnxruntime-extensions (supports Android/iOS for MAUI/Xamarin)
- React Native package for onnxruntime can optionally include onnxruntime-extensions
Pre/Post processing
- Added support for built-in pre- and post-processing for NLP scenarios: classification, question answering, text prediction
- Added support for built-in pre- and post-processing for speech recognition (Whisper)
- Added support for built-in post-processing for object detection (YOLO), including non-max suppression and drawing bounding boxes (see the Python sketch after this list)
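The built-in pre/post processing steps are implemented as custom operators in onnxruntime-extensions, so a session running such a model needs the extensions library registered. A minimal Python sketch (the model filename is a placeholder for a model produced by the pre/post-processing tooling):

```python
import onnxruntime as ort
from onnxruntime_extensions import get_library_path

# Register the onnxruntime-extensions custom ops (tokenizers, audio decoding,
# non-max suppression, etc.) so the embedded pre/post-processing nodes resolve.
so = ort.SessionOptions()
so.register_custom_ops_library(get_library_path())

# Placeholder: a model with pre/post processing baked in.
session = ort.InferenceSession(
    "whisper_with_pre_post.onnx", so, providers=["CPUExecutionProvider"]
)
```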
- Additional CoreML and NNAPI kernels to support customer scenarios
- NNAPI: BatchNormalization, LRN
- CoreML: Div, Flatten, LeakyRelu, LRN, Mul, Pad, Pow, Sub
Web
- [preview] WebGPU support
- Support for building the source code with "MinGW make" on Windows
ORT Training
On-device training:
- An official package for On-Device Training is now available. On-device training extends ORT inference solutions to enable training on edge devices.
- APIs and language bindings supported for C, C++, Python, C#, and Java.
- Packages available for Desktop and Android.
- For custom builds, refer to the [build instructions](https://onnxruntime.ai/docs/build/training.html#build-for-on-device-training). A minimal Python training-loop sketch follows this list.
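A rough illustration of the new Python API, assuming the training artifacts (checkpoint, training/eval/optimizer models) were generated offline; the file names and `data_loader` are placeholders:

```python
from onnxruntime.training.api import CheckpointState, Module, Optimizer

# Placeholders: artifacts generated offline by the training-artifact tooling.
state = CheckpointState.load_checkpoint("checkpoint")
module = Module("training_model.onnx", state, "eval_model.onnx")
optimizer = Optimizer("optimizer_model.onnx", module)

module.train()
for inputs, labels in data_loader:   # hypothetical iterable of numpy batches
    loss = module(inputs, labels)    # one training step: forward + backward
    optimizer.step()
    module.lazy_reset_grad()         # reset gradients before the next step
```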
Others
- Added [graph optimizations](https://github.com/microsoft/onnxruntime/blob/rel-1.15.0/docs/ORTModule_Training_Guidelines.md#ortmodule_enable_compute_optimizer) that leverage sparsity in the label data to improve performance. With these optimizations, we see performance gains ranging from 4% to 15% for popular HF models over baseline ORT (a usage sketch follows this list).
- Vision transformer models like ViT, BEiT, and SwinV2 see up to a 44% speedup with ORT Training + DeepSpeed over PyTorch eager mode on AzureML.
- Added optimizations for SOTA models like Dolly and Whisper. ORT Training + DeepSpeed now gives a ~17% speedup for Whisper and a ~4% speedup for Dolly over PyTorch eager mode; Dolly optimizations on the main branch show a ~40% speedup over eager mode.
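These gains apply to PyTorch models wrapped with ORTModule. A minimal sketch, where `MyHFModel` is a placeholder for any Hugging Face PyTorch model and the environment variable is the one described in the linked guidelines:

```python
import os
from onnxruntime.training.ortmodule import ORTModule

# Opt in to the compute optimizer (see the linked guidelines for details).
os.environ["ORTMODULE_ENABLE_COMPUTE_OPTIMIZER"] = "1"

model = MyHFModel()       # placeholder: any Hugging Face PyTorch model
model = ORTModule(model)  # subsequent training steps run through ORT
```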
Known Issues
- The onnxruntime-training 1.15.0 packages published to pypi.org were built in Debug mode instead of Release mode. You can get the correct packages from https://download.onnxruntime.ai/. We will fix the issue in the next patch release.
- The XNNPack EP does not work on x86 CPUs without AVX-512 instructions, because we used the wrong alignment when allocating buffers for XNNPack to use.
- The CUDA EP source code has a build error when the CUDA version is < 11.6. See #16000.
- The onnxruntime-training builds are missing the training header files.
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
[snnn](https://github.com/snnn), [fs-eire](https://github.com/fs-eire), [edgchen1](https://github.com/edgchen1), [wejoncy](https://github.com/wejoncy), [mszhanyi](https://github.com/mszhanyi), [PeixuanZuo](https://github.com/PeixuanZuo), [pengwa](https://github.com/pengwa), [jchen351](https://github.com/jchen351), [cloudhan](https://github.com/cloudhan), [tianleiwu](https://github.com/tianleiwu), [PatriceVignola](https://github.com/PatriceVignola), [wangyems](https://github.com/wangyems), [adrianlizarraga](https://github.com/adrianlizarraga), [chenfucn](https://github.com/chenfucn), [HectorSVC](https://github.com/HectorSVC), [baijumeswani](https://github.com/baijumeswani), [justinchuby](https://github.com/justinchuby), [skottmckay](https://github.com/skottmckay), [yuslepukhin](https://github.com/yuslepukhin), [RandyShuai](https://github.com/RandyShuai), [RandySheriffH](https://github.com/RandySheriffH), [natke](https://github.com/natke), [YUNQIUGUO](https://github.com/YUNQIUGUO), [smk2007](https://github.com/smk2007), [jslhcl](https://github.com/jslhcl), [chilo-ms](https://github.com/chilo-ms), [yufenglee](https://github.com/yufenglee), [RyanUnderhill](https://github.com/RyanUnderhill), [hariharans29](https://github.com/hariharans29), [zhanghuanrong](https://github.com/zhanghuanrong), [askhade](https://github.com/askhade), [wschin](https://github.com/wschin), [jywu-msft](https://github.com/jywu-msft), [mindest](https://github.com/mindest), [zhijxu-MS](https://github.com/zhijxu-MS), [dependabot[bot]](https://github.com/dependabot[bot]), [xadupre](https://github.com/xadupre), [liqunfu](https://github.com/liqunfu), [nums11](https://github.com/nums11), [gramalingam](https://github.com/gramalingam), [Craigacp](https://github.com/Craigacp), [fdwr](https://github.com/fdwr), [shalvamist](https://github.com/shalvamist), [jstoecker](https://github.com/jstoecker), [yihonglyu](https://github.com/yihonglyu), [sumitsays](https://github.com/sumitsays), [stevenlix](https://github.com/stevenlix), [iK1D](https://github.com/iK1D), [pranavsharma](https://github.com/pranavsharma), [georgen117](https://github.com/georgen117), [sfatimar](https://github.com/sfatimar), [MaajidKhan](https://github.com/MaajidKhan), [satyajandhyala](https://github.com/satyajandhyala), [faxu](https://github.com/faxu), [jcwchen](https://github.com/jcwchen), [hanbitmyths](https://github.com/hanbitmyths), [jeffbloo](https://github.com/jeffbloo), [souptc](https://github.com/souptc), [ytaous](https://github.com/ytaous) [kunal-vaishnavi](https://github.com/kunal-vaishnavi)