* GCC version < 7 is no longer supported
* CMAKE_SYSTEM_PROCESSOR needs to be set when cross-compiling on Linux because pytorch cpuinfo was introduced as a dependency for ARM big.LITTLE support. Set it to the `uname -m` output of your target device.
* ONNX 1.10 support
* opset 15
* ONNX IR 8 (SparseTensor type, model-local FunctionProtos, Optional type not yet fully supported in this release)
* Improved documentation of [C/C++ APIs](
* IBM Power support
* WinML - DLL dependency fix supports learning models on Windows 8.1
* Support for sub-building [onnxruntime-extensions]( and statically linking into onnxruntime binary for custom builds
* Add `--_use_extensions` option to run models with custom operators implemented in onnxruntime-extensions
* Registration of a custom allocator for sharing between multiple sessions. (See RegisterAllocator and UnregisterAllocator APIs in onnxruntime_c_api.h)
* SessionOptionsAppendExecutionProvider_TensorRT API is deprecated; use SessionOptionsAppendExecutionProvider_TensorRT_V2
* New APIs: SessionOptionsAppendExecutionProvider_TensorRT_V2, CreateTensorRTProviderOptions, UpdateTensorRTProviderOptions, GetTensorRTProviderOptionsAsString, ReleaseTensorRTProviderOptions, EnableOrtCustomOps, RegisterAllocator, UnregisterAllocator, IsSparseTensor, CreateSparseTensorAsOrtValue, FillSparseTensorCoo, FillSparseTensorCsr, FillSparseTensorBlockSparse, CreateSparseTensorWithValuesAsOrtValue, UseCooIndices, UseCsrIndices, UseBlockSparseIndices, GetSparseTensorFormat, GetSparseTensorValuesTypeAndShape, GetSparseTensorValues, GetSparseTensorIndicesTypeShape, GetSparseTensorIndices
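
The cross-compiling note above (CMAKE_SYSTEM_PROCESSOR) can be sketched as follows. This is only an illustration: `cmake_processor_define` is a hypothetical helper, not part of ONNX Runtime, and it assumes you have already captured the `uname -m` output from the target device. How the resulting argument reaches CMake depends on your own build setup.

```python
def cmake_processor_define(target_uname_m: str) -> str:
    """Build the CMake cache definition for the target CPU.

    `target_uname_m` is the output of `uname -m` run on the *target*
    device (not the build host), for example "aarch64" or "armv7l".
    """
    value = target_uname_m.strip()
    if not value:
        raise ValueError("empty `uname -m` output for the target device")
    return f"-DCMAKE_SYSTEM_PROCESSOR={value}"


if __name__ == "__main__":
    # Assumed example: the target device reported "aarch64" for `uname -m`.
    print(cmake_processor_define("aarch64\n"))
```

Stripping whitespace matters because command output captured from a shell usually ends with a newline.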
Performance and quantization
* Performance improvement on ARM
* Added S8S8 (signed int8, signed int8) matmul kernel. This avoids extending uint8 to int16 for better performance on ARM64 without dot-product instructions
* Expanded GEMM udot kernel to 8x8 accumulator
* Added sgemm and qgemm optimized kernels for ARM64EC
* Operator improvements
* Improved performance for quantized operators: DynamicQuantizeLSTM, QLinearAvgPool
* Added new quantized operator QGemm for quantizing Gemm directly
* Fused HardSigmoid and Conv
* Quantization tool - subgraph support
* Transformers tool improvements
* Fused Attention for BART encoder and Megatron GPT-2
* Integrated mixed precision ONNX conversion and parity test for GPT-2
* Updated graph fusion for embed layer normalization for BERT
* Improved symbolic shape inference for operators: Attention, EmbedLayerNormalization, Einsum and Reciprocal
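
To make the S8S8 naming above concrete, here is a minimal reference sketch of what a signed-int8 by signed-int8 matmul computes: both operands are 8-bit signed values and the products are accumulated in a wider integer. It is not the ONNX Runtime kernel, which is hand-optimized for ARM64; the function names are hypothetical and the code only illustrates the arithmetic.

```python
def to_int8(x: int) -> int:
    """Check that a value fits the signed 8-bit range and return it."""
    if not -128 <= x <= 127:
        raise ValueError(f"{x} is outside the int8 range")
    return x


def matmul_s8s8(a, b):
    """Reference matmul: int8 x int8 inputs, wide integer accumulation."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    out = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0  # stands in for a 32-bit accumulator
            for k in range(inner):
                acc += to_int8(a[i][k]) * to_int8(b[k][j])
            out[i][j] = acc
    return out


if __name__ == "__main__":
    print(matmul_s8s8([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```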
* Official ORT GPU packages (except Python) now include both CUDA and TensorRT Execution Providers.
* Python packages will be updated in the next release. Please note that EPs should be explicitly registered to ensure the correct provider is used.
* GPU packages are built with CUDA 11.4 and should be compatible with 11.x on systems with the minimum required driver version. See: [CUDA minor version compatibility](
* PyPI
* ORT + DirectML Python packages now available: [onnxruntime-directml](
* GPU package can be used on both CPU-only and GPU machines
* NuGet
* C#: Added support for using netstandard2.0 as a target framework
* Windows symbol (PDB) files are no longer included in the NuGet package, reducing the size of the binary NuGet package by 85%. To download them, please see the artifacts below on GitHub.
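
The note above about explicitly registering EPs can be sketched in Python as follows. This is an illustration under assumptions: `pick_providers` and `create_session` are hypothetical helpers, and the model path is a placeholder. The `providers` argument of `InferenceSession` and `get_available_providers()` are part of the onnxruntime Python API.

```python
def pick_providers(available, preferred):
    """Return the preferred providers that are actually available, in order."""
    chosen = [p for p in preferred if p in available]
    if not chosen:
        raise RuntimeError(f"none of {preferred} is available; found {available}")
    return chosen


def create_session(model_path, preferred):
    """Create a session with an explicit, ordered provider list."""
    # Imported lazily so pick_providers stays usable without onnxruntime installed.
    import onnxruntime as ort

    providers = pick_providers(ort.get_available_providers(), preferred)
    # Passing `providers` registers the EPs explicitly, in priority order.
    return ort.InferenceSession(model_path, providers=providers)


if __name__ == "__main__":
    preferred = [
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ]
    # Assumed example: a machine where only CUDA and CPU are available.
    print(pick_providers(["CUDAExecutionProvider", "CPUExecutionProvider"], preferred))
```

Keeping `CPUExecutionProvider` last in the preferred list lets the same code run on CPU-only machines.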
Execution Providers
* Framework improvements that boost CUDA performance of subgraph-heavy models (8642, 8702)
* Support for sequence ops for improved performance for models using sequence type
* Kernel perf improvements for Pad and Upsample (up to 4.5x faster)
* TensorRT EP
* Added support for TensorRT 8.0 (x64 Windows/Linux, ARM Jetson), which includes new TensorRT explicit-quantization features (ONNX Q/DQ support)
* General fixes and quality improvements
* Added support for OpenVINO 2021.4
* DirectML EP
* Bug fix for Identity with non-float inputs affecting DynamicQuantizeLinear ONNX backend test
* WebAssembly
* SIMD (Single Instruction, Multiple Data) support
* Option to load WebAssembly from worker thread to avoid blocking main UI thread
* wasm file path override
* WebGL
* Simpler workflow for WebGL kernel implementation
* Improved performance with Conv kernel enhancement
ORT Mobile
* Added more [example mobile apps](
* CoreML and NNAPI EP enhancements
* Reduced peak memory usage when initializing session with ORT format model as bytes
* Enhanced partitioning to improve performance when using NNAPI and CoreML
* Reduce number of NNAPI/CoreML partitions required
* Add ability to force usage of CPU for post-processing in SSD models
* Improves performance by avoiding expensive device copy to/from NPU for cheap post-processing section of the model
* Changed to using xcframework in the iOS package
* Supports usage of arm64 iPhone simulator on Mac with Apple silicon
ORT Training
* Expanded the supported input formats to include dictionaries and lists
* Enable user defined autograd functions
* Support for fallback to PyTorch for execution
* Added support for deterministic compute to enable reproducibility with ORTModule
* Add DebugOptions and LogLevels to ORTModule API to improve debuggability
* Improvements and additions to kernels/gradients: Concat, Split, MatMul, ReluGrad, PadOp, Tile, BatchNormInternal
* Support for ROCm 4.3.1 on AMD GPU
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
[edgchen1](, [gwang-msft](, [tianleiwu](, [fs-eire](, [hariharans29](, [skottmckay](, [baijumeswani](, [RyanUnderhill](, [iK1D](, [souptc](, [nkreeger](, [liqunfu](, [pengwa](, [SherlockNoMad](, [wangyems](, [chilo-ms](, [thiagocrepaldi](, [KeDengMS](, [suffiank](, [oliviajain](, [chenfucn](, [satyajandhyala](, [yuslepukhin](, [pranavsharma](, [tracysh](, [yufenglee](, [hanbitmyths](, [ytaous](, [YUNQIUGUO](, [zhanghuanrong](, [stevenlix](, [jywu-msft](, [chandru-r](, [duli2012](, [smk2007](, [wschin](, [MaajidKhan](, [tiagoshibata](, [xadupre](, [RandySheriffH](, [ashbhandare](, [georgen117](, [Tixxx](, [harshithapv](, [Craigacp](, [BowenBao](, [askhade](, [zhangxiang1993](, [gramalingam](, [weixingzhang](, [natke](, [tlh20](, [codemzs](, [ryanlai2](, [raviskolli](, [pranav-prakash](, [faxu](, [adtsai](, [fdwr](, [wenbingl](, [jcwchen](, [neginraoof](, [cschreib-ibex](