Announcements
* GCC version < 7 is no longer supported
* `CMAKE_SYSTEM_PROCESSOR` needs to be set when cross-compiling on Linux because pytorch cpuinfo was introduced as a dependency for ARM big.LITTLE support. Set it to the `uname -m` output of your target device.
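As an illustration of the cross-compiling note above, the following minimal sketch turns the `uname -m` output captured on the target device into a standard CMake `-D` definition. The helper name `cmake_processor_define` is hypothetical and not part of the ONNX Runtime build scripts; how the definition reaches CMake (command line, toolchain file, or a build wrapper) depends on your setup.

```python
def cmake_processor_define(target_uname_m: str) -> str:
    """Build a CMake definition from the target device's `uname -m` output.

    The value must come from the *target* device, not the build host,
    because the host architecture differs when cross-compiling.
    """
    machine = target_uname_m.strip()  # drop the trailing newline from `uname -m`
    if not machine:
        raise ValueError("expected the non-empty output of `uname -m` on the target")
    return f"-DCMAKE_SYSTEM_PROCESSOR={machine}"


# Example: the target device reported "aarch64".
print(cmake_processor_define("aarch64\n"))
```

The resulting string can be appended to the CMake invocation used for the cross-compiled build.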
General
* ONNX 1.10 support
* opset 15
* ONNX IR 8 (SparseTensor type, model local functionprotos, Optional type not yet fully supported this release)
* Improved documentation of [C/C++ APIs](https://onnxruntime.ai/docs/api/c/)
* IBM Power support
* WinML - DLL dependency fix supports learning models on Windows 8.1
* Support for sub-building [onnxruntime-extensions](https://github.com/microsoft/onnxruntime-extensions) and statically linking into onnxruntime binary for custom builds
* Add `--use_extensions` option to run models with custom operators implemented in onnxruntime-extensions
APIs
* Registration of a custom allocator for sharing between multiple sessions. (See RegisterAllocator and UnregisterAllocator APIs in onnxruntime_c_api.h)
* SessionOptionsAppendExecutionProvider_TensorRT API is deprecated; use SessionOptionsAppendExecutionProvider_TensorRT_V2
* New APIs: SessionOptionsAppendExecutionProvider_TensorRT_V2, CreateTensorRTProviderOptions, UpdateTensorRTProviderOptions, GetTensorRTProviderOptionsAsString, ReleaseTensorRTProviderOptions, EnableOrtCustomOps, RegisterAllocator, UnregisterAllocator, IsSparseTensor, CreateSparseTensorAsOrtValue, FillSparseTensorCoo, FillSparseTensorCsr, FillSparseTensorBlockSparse, CreateSparseTensorWithValuesAsOrtValue, UseCooIndices, UseCsrIndices, UseBlockSparseIndices, GetSparseTensorFormat, GetSparseTensorValuesTypeAndShape, GetSparseTensorValues, GetSparseTensorIndicesTypeShape, GetSparseTensorIndices
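Several of the new sparse tensor APIs above are named after storage formats (COO, CSR, block-sparse). As a reminder of what the first two formats hold, here is a minimal pure-Python sketch. It does not call ONNX Runtime, and the helper names `dense_to_coo` and `dense_to_csr` are hypothetical; the exact index layout each C API expects is described in onnxruntime_c_api.h.

```python
def dense_to_coo(matrix):
    """COO: one (row, col) coordinate per non-zero value."""
    values, indices = [], []
    for r, row in enumerate(matrix):
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                indices.append((r, c))
    return values, indices


def dense_to_csr(matrix):
    """CSR: column index per non-zero value, plus per-row offsets into the values."""
    values, col_indices, row_offsets = [], [], [0]
    for row in matrix:
        for c, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(c)
        # Offset where the next row starts in `values`.
        row_offsets.append(len(values))
    return values, col_indices, row_offsets


dense = [
    [0, 5, 0],
    [0, 0, 0],
    [7, 0, 9],
]
print(dense_to_coo(dense))
print(dense_to_csr(dense))
```

Both formats store the same three non-zero values; they differ only in how the positions are encoded.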
Performance and quantization
* Performance improvement on ARM
* Added S8S8 (signed int8, signed int8) matmul kernel. This avoids extending uint8 to int16 for better performance on ARM64 without the dot-product instruction
* Expanded GEMM udot kernel to 8x8 accumulator
* Added sgemm and qgemm optimized kernels for ARM64EC
* Operator improvements
* Improved performance for quantized operators: DynamicQuantizeLSTM, QLinearAvgPool
* Added new quantized operator QGemm for quantizing Gemm directly
* Fused HardSigmoid and Conv
* Quantization tool - subgraph support
* Transformers tool improvements
* Fused Attention for BART encoder and Megatron GPT-2
* Integrated mixed precision ONNX conversion and parity test for GPT-2
* Updated graph fusion for embed layer normalization for BERT
* Improved symbolic shape inference for operators: Attention, EmbedLayerNormalization, Einsum and Reciprocal
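To make the S8S8 (signed int8, signed int8) matmul item above more concrete, the sketch below shows only the arithmetic involved: both operands are quantized to signed 8-bit values with a zero point of 0, products are accumulated in a wider integer, and the result is rescaled. It is not the ARM64 kernel itself; the helper names and the scale values are assumptions chosen for illustration.

```python
def quantize_symmetric(values, scale):
    """Map floats to signed int8 range [-128, 127] with zero point 0."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def s8s8_matmul(a_q, b_q):
    """Multiply two int8 matrices, accumulating in a wide (int32-like) integer."""
    inner, cols = len(b_q), len(b_q[0])
    return [
        [sum(row[k] * b_q[k][j] for k in range(inner)) for j in range(cols)]
        for row in a_q
    ]


# Illustrative scales (assumed); real models derive them from calibration data.
a_scale, b_scale = 0.5, 0.25
a = [[1.0, -2.0], [0.5, 3.0]]
b = [[0.25, 1.0], [-0.5, 0.75]]

a_q = [quantize_symmetric(row, a_scale) for row in a]
b_q = [quantize_symmetric(row, b_scale) for row in b]
acc = s8s8_matmul(a_q, b_q)

# Dequantize: with zero points of 0, a single multiply by both scales suffices.
result = [[v * a_scale * b_scale for v in row] for row in acc]
print(result)
```

Because both operands share the same signedness, no zero-point correction terms appear in the accumulation step of this sketch.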
Packages
* Official ORT GPU packages (except Python) now include both CUDA and TensorRT Execution Providers.
* Python packages will be updated next release. Please note that EPs should be explicitly registered to ensure the correct provider is used.
* GPU packages are built with CUDA 11.4 and should be compatible with 11.x on systems with the minimum required driver version. See: [CUDA minor version compatibility](https://docs.nvidia.com/deploy/cuda-compatibility/#minor-version-compatibility)
* PyPI
* ORT + DirectML Python packages now available: [onnxruntime-directml](https://pypi.org/project/onnxruntime-directml/)
* GPU package can be used on both CPU-only and GPU machines
* NuGet
* C#: Added support for using netstandard2.0 as a target framework
* Windows symbol (PDB) files are no longer included in the NuGet package, reducing the size of the binary NuGet package by 85%. To download them, please see the artifacts below on GitHub.
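The packaging notes above recommend registering execution providers explicitly. In Python this can be done through the `providers` argument of `onnxruntime.InferenceSession`. The sketch below is one possible approach, not an official recipe: `pick_providers` and `create_session` are hypothetical helpers, the preference order is an assumption, and `onnxruntime` is imported lazily so the selection logic can be read and tested on its own.

```python
def pick_providers(available, preferred):
    """Keep preferred providers that are available, in order, ending with CPU."""
    chosen = [p for p in preferred if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")  # CPU as the final fallback
    return chosen


def create_session(model_path,
                   preferred=("TensorrtExecutionProvider", "CUDAExecutionProvider")):
    """Create a session with an explicit provider list (requires onnxruntime)."""
    import onnxruntime as ort  # imported here so the helper above has no dependency

    providers = pick_providers(ort.get_available_providers(), preferred)
    return ort.InferenceSession(model_path, providers=providers)


# Selection logic on a machine where only CUDA and CPU are available:
print(pick_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
))
```

Passing the list explicitly makes the provider order visible in code instead of relying on package defaults.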
Execution Providers
* CUDA EP
* Framework improvements that boost CUDA performance of subgraph heavy models (#8642, #8702)
* Support for sequence ops for improved performance for models using sequence type
* Kernel perf improvements for Pad and Upsample (up to 4.5x faster)
* TensorRT EP
* Added support for TensorRT 8.0 (x64 Windows/Linux, ARM Jetson), which includes new TensorRT explicit-quantization features (ONNX Q/DQ support)
* General fixes and quality improvements
* OpenVINO EP
* Added support for OpenVINO 2021.4
* DirectML EP
* Bug fix for Identity with non-float inputs affecting DynamicQuantizeLinear ONNX backend test
ORT Web
* WebAssembly
* SIMD (Single Instruction, Multiple Data) support
* Option to load WebAssembly from worker thread to avoid blocking main UI thread
* wasm file path override
* WebGL
* Simpler workflow for WebGL kernel implementation
* Improved performance with Conv kernel enhancement
ORT Mobile
* Added more [example mobile apps](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile)
* CoreML and NNAPI EP enhancements
* Reduced peak memory usage when initializing session with ORT format model as bytes
* Enhanced partitioning to improve performance when using NNAPI and CoreML
* Reduce number of NNAPI/CoreML partitions required
* Add ability to force usage of CPU for post-processing in SSD models
* Improves performance by avoiding expensive device copy to/from NPU for cheap post-processing section of the model
* Changed to using xcframework in the iOS package
* Supports usage of arm64 iPhone simulator on Mac with Apple silicon
ORT Training
* Expanding input formats supported to include dictionaries and lists.
* Enable user defined autograd functions
* Support for fallback to PyTorch for execution
* Added support for deterministic compute to enable reproducibility with ORTModule
* Add DebugOptions and LogLevels to ORTModule API to improve debuggability
* Improvements and additions to kernels/gradients: Concat, Split, MatMul, ReluGrad, PadOp, Tile, BatchNormInternal
* Support for ROCm 4.3.1 on AMD GPU
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
[edgchen1](https://github.com/edgchen1), [gwang-msft](https://github.com/gwang-msft), [tianleiwu](https://github.com/tianleiwu), [fs-eire](https://github.com/fs-eire), [hariharans29](https://github.com/hariharans29), [skottmckay](https://github.com/skottmckay), [baijumeswani](https://github.com/baijumeswani), [RyanUnderhill](https://github.com/RyanUnderhill), [iK1D](https://github.com/iK1D), [souptc](https://github.com/souptc), [nkreeger](https://github.com/nkreeger), [liqunfu](https://github.com/liqunfu), [pengwa](https://github.com/pengwa), [SherlockNoMad](https://github.com/SherlockNoMad), [wangyems](https://github.com/wangyems), [chilo-ms](https://github.com/chilo-ms), [thiagocrepaldi](https://github.com/thiagocrepaldi), [KeDengMS](https://github.com/KeDengMS), [suffiank](https://github.com/suffiank), [oliviajain](https://github.com/oliviajain), [chenfucn](https://github.com/chenfucn), [satyajandhyala](https://github.com/satyajandhyala), [yuslepukhin](https://github.com/yuslepukhin), [pranavsharma](https://github.com/pranavsharma), [tracysh](https://github.com/tracysh), [yufenglee](https://github.com/yufenglee), [hanbitmyths](https://github.com/hanbitmyths), [ytaous](https://github.com/ytaous), [YUNQIUGUO](https://github.com/YUNQIUGUO), [zhanghuanrong](https://github.com/zhanghuanrong), [stevenlix](https://github.com/stevenlix), [jywu-msft](https://github.com/jywu-msft), [chandru-r](https://github.com/chandru-r), [duli2012](https://github.com/duli2012), [smk2007](https://github.com/smk2007), [wschin](https://github.com/wschin), [MaajidKhan](https://github.com/MaajidKhan), [tiagoshibata](https://github.com/tiagoshibata), [xadupre](https://github.com/xadupre), [RandySheriffH](https://github.com/RandySheriffH), [ashbhandare](https://github.com/ashbhandare), [georgen117](https://github.com/georgen117), [Tixxx](https://github.com/Tixxx), [harshithapv](https://github.com/harshithapv), [Craigacp](https://github.com/Craigacp), [BowenBao](https://github.com/BowenBao), 
[askhade](https://github.com/askhade), [zhangxiang1993](https://github.com/zhangxiang1993), [gramalingam](https://github.com/gramalingam), [weixingzhang](https://github.com/weixingzhang), [natke](https://github.com/natke), [tlh20](https://github.com/tlh20), [codemzs](https://github.com/codemzs), [ryanlai2](https://github.com/ryanlai2), [raviskolli](https://github.com/raviskolli), [pranav-prakash](https://github.com/pranav-prakash), [faxu](https://github.com/faxu), [adtsai](https://github.com/adtsai), [fdwr](https://github.com/fdwr), [wenbingl](https://github.com/wenbingl), [jcwchen](https://github.com/jcwchen), [neginraoof](https://github.com/neginraoof), [cschreib-ibex](https://github.com/cschreib-ibex)