Announcements
* Building ORT from source now requires CMake version >=3.24 (previously >=3.18).
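Before building from source, it can help to confirm which CMake is on `PATH`. The following is a minimal sketch, assuming the usual `cmake --version` output whose first line reads `cmake version X.Y[.Z]`. The helper names are hypothetical and not part of the ORT build scripts.

```python
import re
import shutil
import subprocess

# Minimum CMake version from the announcement above.
MIN_CMAKE = (3, 24)


def parse_cmake_version(output: str) -> tuple:
    """Extract (major, minor, patch) from text such as 'cmake version 3.24.1'."""
    match = re.search(r"cmake version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError("could not find a CMake version in: " + repr(output))
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def cmake_is_new_enough(output: str, minimum: tuple = MIN_CMAKE) -> bool:
    # Tuple comparison orders versions component by component.
    return parse_cmake_version(output)[: len(minimum)] >= minimum


def check_installed_cmake() -> bool:
    """Run the cmake found on PATH and report whether it meets the minimum."""
    exe = shutil.which("cmake")
    if exe is None:
        return False
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, check=True)
    return cmake_is_new_enough(result.stdout)


if __name__ == "__main__":
    print("cmake OK" if check_installed_cmake() else "cmake missing or older than 3.24")
```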
General
* [ONNX 1.13](https://github.com/onnx/onnx/releases/tag/v1.13.0) support (opset 18)
* Threading
  * ORT Threadpool is now NUMA-aware ([details](https://onnxruntime.ai/docs/performance/tune-performance.html#numa-support-and-performance-tuning))
  * New API to set thread affinity ([details](https://onnxruntime.ai/docs/performance/tune-performance.html#set-intra-op-thread-affinity))
* New custom operator APIs
  * Enables a custom operator to wrap an entire model that is meant to be run with an external API or runtime.
  * [Details](https://onnxruntime.ai/docs/reference/operators/add-custom-op.html#define-and-register-a-custom-operator) and [example](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/test/testdata/custom_op_openvino_wrapper_library)
* Multi-stream Execution Provider refactoring
  * Improves GPU utilization by putting parallel inference requests on different GPU streams. Updated for the CUDA, TensorRT, and ROCm execution providers
  * Improves memory efficiency by enabling GPU memory reuse across different streams
  * Enables Execution Provider developers to customize their stream implementation through the "Stream" interface in the ExecutionProvider API
* *[Preview]* [Rust API](https://github.com/microsoft/onnxruntime/tree/main/rust) for ORT; not part of the release branch, but available to build from main.
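The thread affinity API is driven by a session configuration string. The sketch below builds such a string from per-thread processor lists, assuming the format described in the linked tuning guide: `;` separates threads, `,` separates logical processors within a thread, `a-b` is an inclusive range, and processor IDs start at 1. The helper `build_affinity_string` is hypothetical; check the guide for the authoritative format before relying on it.

```python
from typing import List, Sequence, Tuple, Union

# One processor spec: a single logical processor ID, or an inclusive (first, last) range.
ProcSpec = Union[int, Tuple[int, int]]


def build_affinity_string(per_thread: Sequence[Sequence[ProcSpec]]) -> str:
    """Hypothetical helper: format per-thread processor lists as e.g. '1,2;3-4'.

    Assumed format: ';' separates threads, ',' separates processors within a
    thread, 'a-b' is an inclusive range, and processor IDs start at 1.
    """
    entries: List[str] = []
    for processors in per_thread:
        if not processors:
            raise ValueError("each thread needs at least one processor")
        parts: List[str] = []
        for spec in processors:
            if isinstance(spec, tuple):
                first, last = spec
                if first < 1 or last < first:
                    raise ValueError(f"invalid processor range {spec}")
                parts.append(f"{first}-{last}")
            else:
                if spec < 1:
                    raise ValueError(f"invalid processor id {spec}")
                parts.append(str(spec))
        entries.append(",".join(parts))
    return ";".join(entries)


# Usage (requires the onnxruntime package; not executed here):
# import onnxruntime as ort
# so = ort.SessionOptions()
# so.intra_op_num_threads = 3  # assumed: the calling thread plus 2 pool threads
# so.add_session_config_entry(
#     "session.intra_op_thread_affinities", build_affinity_string([[1], [2]]))
```

Keeping the formatting in a small pure function makes it easy to validate processor lists (for example against NUMA node layouts) before handing the string to the session.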
Performance
* Support for quantization with Intel AMX on Sapphire Rapids processors
* CUDA EP performance improvements:
  * Improved performance of transformer models and decoding methods: beam search, greedy search, and top-p sampling
  * Stable Diffusion model optimizations
  * Changed the default value of `cudnn_conv_use_max_workspace` to 1
* Performance improvements to GRU and Slice operators
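The release notes do not describe ORT's CUDA kernels. As a rough reference for what greedy search and top-p (nucleus) sampling compute at a single decoding step, here is a minimal pure-Python sketch over one vector of logits. It is not ORT's implementation, and all function names are illustrative.

```python
import math
import random
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    # Subtract the max logit for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(logits: List[float]) -> int:
    # Greedy search: always pick the highest-scoring token.
    return max(range(len(logits)), key=lambda i: logits[i])


def top_p_filter(probs: List[float], top_p: float) -> List[int]:
    """Return the smallest set of token ids, taken in descending probability
    order, whose cumulative probability reaches top_p (the "nucleus")."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def sample_top_p(logits: List[float], top_p: float,
                 rng: Optional[random.Random] = None) -> int:
    rng = rng or random.Random()
    probs = softmax(logits)
    kept = top_p_filter(probs, top_p)
    # random.choices does not require weights to sum to 1, so the kept
    # probabilities are effectively renormalized over the nucleus.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

A very small `top_p` collapses the nucleus to the single most likely token, which makes sampling behave like greedy search.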
Execution Providers
* TensorRT EP
  * Adds support for TensorRT 8.5 GA versions
  * Bug fixes
* OpenVINO EP
  * Adds support for OpenVINO 2022.3
* DirectML EP
  * Updated to DML [1.10.1](https://www.nuget.org/packages/Microsoft.AI.DirectML)
  * Additional operators: [NonZero](https://github.com/microsoft/onnxruntime/pull/13768), [Shape](https://github.com/microsoft/onnxruntime/pull/13442), [Size](https://github.com/microsoft/onnxruntime/pull/13442), [Attention](https://github.com/microsoft/onnxruntime/pull/13371), [EmbedLayerNorm](https://github.com/microsoft/onnxruntime/pull/13868), [SkipLayerNorm](https://github.com/microsoft/onnxruntime/pull/13849), [BiasGelu](https://github.com/microsoft/onnxruntime/pull/13795)
  * Additional data types for [Abs](https://github.com/microsoft/onnxruntime/pull/13470), [Sign](https://github.com/microsoft/onnxruntime/pull/13470), [Where](https://github.com/microsoft/onnxruntime/pull/13443)
  * Enabled `SetOptimizedFilePath` [export/reload](https://github.com/microsoft/onnxruntime/pull/13913)
  * Bug fixes/extensions: [allow squeeze-13 axes](https://github.com/microsoft/onnxruntime/pull/13635), [EinSum with MatMul NHCW](https://github.com/microsoft/onnxruntime/pull/13440)
* [ROCm EP](https://onnxruntime.ai/docs/execution-providers/ROCm-ExecutionProvider.html): ROCm 5.4 support; now GA ready
* *[Preview]* [Azure EP](https://onnxruntime.ai/docs/execution-providers/Azure-ExecutionProvider.html): supports AzureML-hosted models using Triton for hybrid on-device and on-cloud inferencing
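Which of these execution providers is usable depends on the installed package and hardware. A common pattern is to pass an ordered provider list to the session and let ORT fall back to the CPU. The following sketch, assuming the registered provider names shown in `PREFERRED` and a hypothetical `choose_providers` helper, filters a preference list against what is actually available.

```python
from typing import Iterable, List

# Registered ONNX Runtime provider names, in an example priority order.
# Which ones are present depends on the installed package (for example
# onnxruntime-gpu vs. onnxruntime-directml) and on the hardware.
PREFERRED = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "ROCMExecutionProvider",
    "DmlExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
]


def choose_providers(available: Iterable[str],
                     preferred: List[str] = PREFERRED) -> List[str]:
    """Keep the preferred providers that are available, in priority order,
    and always end with the CPU provider as a fallback."""
    available_set = set(available)
    chosen = [p for p in preferred if p in available_set]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# Usage (requires the onnxruntime package; "model.onnx" is a placeholder path):
# import onnxruntime as ort
# session = ort.InferenceSession(
#     "model.onnx", providers=choose_providers(ort.get_available_providers()))
```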
Mobile
* Pre/post processing
  * Support for updating MobileNet and super-resolution models to move the pre- and post-processing into the model, including the use of custom ops for conversion to/from JPG/PNG
    * The [onnxruntime-extensions python package](https://pypi.org/project/onnxruntime-extensions/) includes the model update script that adds pre/post processing to the model
    * See this [example](https://github.com/microsoft/onnxruntime-extensions/blob/main/tutorials/superresolution_e2e.py) of model update usage
  * *[Coming soon]* onnxruntime-extensions packages for Android and iOS with DecodeImage and EncodeImage custom ops
  * Updated the onnxruntime inference examples to demonstrate end-to-end usage with the onnxruntime-extensions package
    * [SuperResolution model](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/mobile/examples/super_resolution)
* XNNPACK
  * Added support for additional commonly used operators
  * Added iOS build support
    * XNNPACK EP is now included in the onnxruntime-c iOS package
  * Added support for using the ORT allocator in XNNPACK kernels to minimize memory usage
Web
* [onnxruntime-extensions](https://github.com/microsoft/onnxruntime-extensions) included in default ort-web build (NLP centric)
* XNNPACK Gemm
* Improved exception handling
* New [utility functions](https://onnxruntime.ai/docs/api/js/index.html) (experimental) to help with exchanging data between images and tensors.
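The JavaScript utilities themselves are not reproduced here. As a language-neutral sketch of the layout conversion such image/tensor helpers typically perform, the Python below turns interleaved RGBA bytes (the layout of a browser `ImageData` buffer) into a planar float tensor of shape `[1, 3, height, width]` and back. Dropping alpha and scaling to [0, 1] are assumptions for illustration; a given model may expect a different channel order, layout, or normalization.

```python
from typing import List, Sequence


def rgba_to_nchw(pixels: Sequence[int], width: int, height: int,
                 divisor: float = 255.0) -> List[float]:
    """Convert interleaved RGBA bytes (row-major, 4 values per pixel) into a
    flat float list laid out as [1, 3, height, width] (planar R, G, B).
    The alpha channel is dropped and values are divided by `divisor`."""
    if len(pixels) != width * height * 4:
        raise ValueError("expected width * height * 4 values")
    plane = width * height
    out = [0.0] * (3 * plane)
    for idx in range(plane):
        base = idx * 4
        for c in range(3):
            out[c * plane + idx] = pixels[base + c] / divisor
    return out


def nchw_to_rgba(data: Sequence[float], width: int, height: int,
                 scale: float = 255.0) -> List[int]:
    """Inverse direction: planar float RGB back to interleaved RGBA bytes,
    rounding, clamping to [0, 255], and setting alpha to fully opaque."""
    plane = width * height
    if len(data) != 3 * plane:
        raise ValueError("expected 3 * width * height values")
    out: List[int] = []
    for idx in range(plane):
        for c in range(3):
            value = round(data[c * plane + idx] * scale)
            out.append(min(255, max(0, value)))
        out.append(255)
    return out
```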
Training
* Performance optimizations and bug fixes for Hugging Face models (e.g., XLNet and BLOOM)
* Stable Diffusion optimizations for training, including support for Resize and InstanceNorm gradients and the addition of ORT-enabled examples to the [diffusers library](https://github.com/huggingface/diffusers/tree/main/examples/research_projects/onnxruntime)
* FP16 optimizer exposed in torch-ort ([details](https://github.com/microsoft/onnxruntime/blob/main/docs/ORTModule_Training_Guidelines.md#4-use-fp16_optimizer-to-complement-deepspeedapex))
* Bug fixes for Hugging Face models
Known Issues
* The [Microsoft.ML.OnnxRuntime.DirectML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.DirectML) package name includes a `-dev-*` suffix. This package is functionally equivalent to the release branch build, and a patch is in progress.
---
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
[snnn](https://github.com/snnn), [skottmckay](https://github.com/skottmckay), [edgchen1](https://github.com/edgchen1), [hariharans29](https://github.com/hariharans29), [tianleiwu](https://github.com/tianleiwu), [yufenglee](https://github.com/yufenglee), [guoyu-wang](https://github.com/guoyu-wang), [yuslepukhin](https://github.com/yuslepukhin), [fs-eire](https://github.com/fs-eire), [pranavsharma](https://github.com/pranavsharma), [iK1D](https://github.com/iK1D), [baijumeswani](https://github.com/baijumeswani), [tracysh](https://github.com/tracysh), [thiagocrepaldi](https://github.com/thiagocrepaldi), [askhade](https://github.com/askhade), [RyanUnderhill](https://github.com/RyanUnderhill), [wangyems](https://github.com/wangyems), [fdwr](https://github.com/fdwr), [RandySheriffH](https://github.com/RandySheriffH), [jywu-msft](https://github.com/jywu-msft), [zhanghuanrong](https://github.com/zhanghuanrong), [smk2007](https://github.com/smk2007), [pengwa](https://github.com/pengwa), [liqunfu](https://github.com/liqunfu), [shahasad](https://github.com/shahasad), [mszhanyi](https://github.com/mszhanyi), [SherlockNoMad](https://github.com/SherlockNoMad), [xadupre](https://github.com/xadupre), [jignparm](https://github.com/jignparm), [HectorSVC](https://github.com/HectorSVC), [ytaous](https://github.com/ytaous), [weixingzhang](https://github.com/weixingzhang), [stevenlix](https://github.com/stevenlix), [tiagoshibata](https://github.com/tiagoshibata), [faxu](https://github.com/faxu), [wschin](https://github.com/wschin), [souptc](https://github.com/souptc), [ashbhandare](https://github.com/ashbhandare), [RandyShuai](https://github.com/RandyShuai), [chilo-ms](https://github.com/chilo-ms), [PeixuanZuo](https://github.com/PeixuanZuo), [cloudhan](https://github.com/cloudhan), [dependabot[bot]](https://github.com/dependabot[bot]), [jeffbloo](https://github.com/jeffbloo), [chenfucn](https://github.com/chenfucn), [linkerzhang](https://github.com/linkerzhang), 
[duli2012](https://github.com/duli2012), [codemzs](https://github.com/codemzs), [oliviajain](https://github.com/oliviajain), [natke](https://github.com/natke), [YUNQIUGUO](https://github.com/YUNQIUGUO), [Craigacp](https://github.com/Craigacp), [sumitsays](https://github.com/sumitsays), [orilevari](https://github.com/orilevari), [BowenBao](https://github.com/BowenBao), [yangchen-MS](https://github.com/yangchen-MS), [hanbitmyths](https://github.com/hanbitmyths), [satyajandhyala](https://github.com/satyajandhyala), [MaajidKhan](https://github.com/MaajidKhan), [smkarlap](https://github.com/smkarlap), [sfatimar](https://github.com/sfatimar), [jchen351](https://github.com/jchen351), [georgen117](https://github.com/georgen117), [wejoncy](https://github.com/wejoncy), [PatriceVignola](https://github.com/PatriceVignola), [adrianlizarraga](https://github.com/adrianlizarraga), [justinchuby](https://github.com/justinchuby), [zhangxiang1993](https://github.com/zhangxiang1993), [gineshidalgo99](https://github.com/gineshidalgo99), [tlh20](https://github.com/tlh20), [xzhu1900](https://github.com/xzhu1900), [jeffdaily](https://github.com/jeffdaily), [suryasidd](https://github.com/suryasidd), [yihonglyu](https://github.com/yihonglyu), [liuziyue](https://github.com/liuziyue), [chentaMS](https://github.com/chentaMS), [jcwchen](https://github.com/jcwchen), [ybrnathan](https://github.com/ybrnathan), [ajindal1](https://github.com/ajindal1), [zhijxu-MS](https://github.com/zhijxu-MS), [gramalingam](https://github.com/gramalingam), [WilBrady](https://github.com/WilBrady), [garymm](https://github.com/garymm), [kkaranasos](https://github.com/kkaranasos), [ashari4](https://github.com/ashari4), [martinb35](https://github.com/martinb35), [AdamLouly](https://github.com/AdamLouly), [zhangyaobit](https://github.com/zhangyaobit), [vvchernov](https://github.com/vvchernov), [jingyanwangms](https://github.com/jingyanwangms), [wenbingl](https://github.com/wenbingl), 
[daquexian](https://github.com/daquexian), [sreekanth-yalachigere](https://github.com/sreekanth-yalachigere), [NonStatic2014](https://github.com/NonStatic2014), [mayavijx](https://github.com/mayavijx), [mindest](https://github.com/mindest), [jstoecker](https://github.com/jstoecker), [manashgoswami](https://github.com/manashgoswami), [Andrews548](https://github.com/Andrews548), [baowenlei](https://github.com/baowenlei), [kunal-vaishnavi](https://github.com/kunal-vaishnavi)