Key Updates
General
* Reduced Operator Kernel build allows ORT binaries to be built with only required operators in the model(s) - [learn more](https://github.com/microsoft/onnxruntime/blob/master/docs/Reduced_Operator_Kernel_build.md)
* **[Preview]** ORT for Mobile Platforms - minimizes build size for mobile and embedded devices - [learn more](https://github.com/microsoft/onnxruntime/blob/master/docs/ONNX_Runtime_for_Mobile_Platforms.md)
* Transformer model inferencing performance optimizations
  * Performance improvements for DistilBERT
  * Benchmark tool supports more pretrained models
* Improvements in the quantization tool
  * Support for quantization-aware training models
  * Calibration tool now supports general preprocessing and calibration on inputs
  * Simplified quantization APIs
  * Support for models larger than 2 GB
* New operators for static quantization: QLinearMul, QLinearAdd, QLinearSigmoid, and QLinearLeakyRelu
* Prepacking of constant matrix B for float GEMM (MatMul, Attention)
* Limited Python 3.8 support added to the official Python packages, in addition to 3.5-3.7. Python 3.8 is not yet supported for Windows GPU and Linux ARM builds.
* Telemetry enabled in Java and NodeJS packages for Windows builds. Note: data is not directly sent to Microsoft or ORT teams by ONNX Runtime; enabling telemetry means trace events are collected by the Windows operating system and may be sent to the cloud based on the user's privacy settings - [learn more](https://github.com/microsoft/onnxruntime/blob/master/docs/Privacy.md).
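The quantization items above revolve around one affine mapping between float and 8-bit values. The sketch below shows it for asymmetric uint8 quantization, assuming simple min/max calibration over sample inputs. The ONNX QuantizeLinear operator computes saturate(round(x / scale) + zero_point) with round-half-to-even, which Python's built-in `round` also uses. The helper names (`calibrate_minmax`, `compute_qparams`, `quantize`, `dequantize`) and the sample data are hypothetical and are not part of the ORT quantization tool's API.

```python
from typing import Iterable, List, Tuple

QMIN, QMAX = 0, 255  # uint8 range


def calibrate_minmax(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Collect the observed float range over calibration inputs (simple min/max)."""
    values = [v for batch in batches for v in batch]
    return min(values), max(values)


def compute_qparams(xmin: float, xmax: float) -> Tuple[float, int]:
    """Derive (scale, zero_point) so that [xmin, xmax] maps onto [QMIN, QMAX]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    xmin, xmax = min(xmin, 0.0), max(xmax, 0.0)
    scale = (xmax - xmin) / (QMAX - QMIN)
    if scale == 0.0:
        scale = 1.0  # all-zero range: any positive scale works
    zero_point = round(QMIN - xmin / scale)
    return scale, max(QMIN, min(QMAX, zero_point))


def quantize(x: float, scale: float, zero_point: int) -> int:
    # Round half to even (Python's round), then saturate to the uint8 range.
    return max(QMIN, min(QMAX, round(x / scale) + zero_point))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    return (q - zero_point) * scale


if __name__ == "__main__":
    calibration_data = [[-0.8, 0.1, 0.5], [-1.0, 0.9, 0.3]]  # made-up sample inputs
    scale, zp = compute_qparams(*calibrate_minmax(calibration_data))
    for x in (-1.0, 0.0, 0.42, 1.0):
        q = quantize(x, scale, zp)
        print(f"{x:+.2f} -> {q:3d} -> {dequantize(q, scale, zp):+.4f}")
```

Keeping the zero point an integer is what lets 0.0 (for example, padding values) round-trip exactly, while other values round-trip to within half a quantization step.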
API
* Python API support for RegisterCustomOpsLibrary
* IO Binding API for C/C++/C# language bindings. This allows use of pre-allocated buffers on target devices, and also allows specifying a target device for outputs whose shapes are unknown.
* Sharing of allocators between multiple sessions. This allows much better memory utilization by not creating a separate arena for each session in the same process. See the [C API documentation](https://github.com/microsoft/onnxruntime/blob/rel-1.5.1/docs/C_API.md) for details.
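Why a shared allocator helps can be illustrated with a toy model. The sketch assumes an arena that reserves memory in fixed-size chunks and keeps freed memory for reuse instead of returning it to the system, and that sessions run one after another. `Arena` and `run_session` are hypothetical stand-ins, not ONNX Runtime APIs, and the sizes are made up.

```python
class Arena:
    """Toy arena: grows in fixed-size chunks and never releases reserved memory."""

    def __init__(self, chunk_mb: int) -> None:
        self.chunk_mb = chunk_mb
        self.reserved_mb = 0  # memory held by the arena
        self.in_use_mb = 0    # memory currently handed out to callers

    def alloc(self, mb: int) -> None:
        self.in_use_mb += mb
        while self.reserved_mb < self.in_use_mb:
            self.reserved_mb += self.chunk_mb  # extend by one more chunk

    def free(self, mb: int) -> None:
        self.in_use_mb -= mb  # returned to the arena, not to the OS


def run_session(arena: Arena, peak_mb: int) -> None:
    """Model one inference run as allocating its peak working set, then freeing it."""
    arena.alloc(peak_mb)
    arena.free(peak_mb)


peaks_mb = [300, 500, 400]  # hypothetical peak working sets of three sessions

# One arena per session: each arena keeps its own high-water mark.
separate = [Arena(chunk_mb=256) for _ in peaks_mb]
for arena, peak in zip(separate, peaks_mb):
    run_session(arena, peak)
separate_total_mb = sum(a.reserved_mb for a in separate)

# One arena shared by all sessions: freed chunks are reused by the next run.
shared = Arena(chunk_mb=256)
for peak in peaks_mb:
    run_session(shared, peak)
shared_total_mb = shared.reserved_mb

print(separate_total_mb, shared_total_mb)  # 1536 512
```

With concurrent runs, the shared arena in this model would instead need roughly the sum of the overlapping working sets, so the savings depend on how the sessions are scheduled.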
Windows ML
* NuGet package now supports UWP applications targeting Windows Store deployment (CPU only)
* NuGet package now supports .NET and .NET Framework applications
* Rust developers can now deploy Windows ML; a sample and documentation are available [here](https://github.com/microsoft/Windows-Machine-Learning/tree/master/Samples/RustSqueezenet)
* New APIs for additional performance control:
  * IntraopNumThreads: Sets the number of threads in the thread pool used for intra-operator execution of CPU operators, through LearningModelSessionOptions.
  * SetNamedDimensionOverrides: Overrides named input dimensions with concrete values through LearningModelSessionOptions, to achieve better runtime performance.
* Support for additional ONNX format image type denotations: Gray8, normalized [0..1], and normalized [-1..1]
* Reduced Windows ML package size by moving debug symbols into a separate distribution package.
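The normalized image denotations above describe how 8-bit pixel values are scaled before they reach the model. A minimal sketch of that arithmetic, assuming a plain linear mapping (value / 255 for [0..1], value / 127.5 - 1 for [-1..1]); the function and the range labels are hypothetical and are not Windows ML APIs.

```python
from typing import List


def normalize_pixels(pixels: List[int], pixel_range: str) -> List[float]:
    """Map 8-bit pixel values (e.g. from a Gray8 image) to the requested range.

    Assumes a linear mapping; the range labels are illustrative only.
    """
    for p in pixels:
        if not 0 <= p <= 255:
            raise ValueError(f"not an 8-bit pixel value: {p}")
    if pixel_range == "0..255":
        return [float(p) for p in pixels]  # unnormalized
    if pixel_range == "0..1":
        return [p / 255.0 for p in pixels]
    if pixel_range == "-1..1":
        return [p / 127.5 - 1.0 for p in pixels]
    raise ValueError(f"unknown pixel range: {pixel_range}")


if __name__ == "__main__":
    gray8_row = [0, 64, 128, 255]  # one row of a single-channel 8-bit image
    print(normalize_pixels(gray8_row, "0..1"))
    print(normalize_pixels(gray8_row, "-1..1"))
```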
Execution Providers
* CUDA updates
  * CUDA 10.2 / cuDNN 8.0 in official package
  * CUDA 11 support added and available to build from source
  * CUDA Conv kernel now supports asymmetric padding, fully supporting models such as YoloV3 for improved GPU performance
* TensorRT EP updates
  * Support for TensorRT 7.1
  * Added TensorRT engine caching feature, turned on by setting the environment variable ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
  * TensorRT builds now build the Execution Provider as a separate DLL. If enabled in the build, the provider is available as a shared library. This was previously enabled for the DNNL EP (ORT 1.3); other Execution Providers will follow in the future.
* OpenVINO EP updates
  * Support for OpenVINO 2020.4
  * Added runtime options for VPU hardware to select a specific hardware device and to enable fast compilation of models
  * Enabled C# binding support for OpenVINO EP
* DirectML EP updates
  * API available for Python ([build from source](https://github.com/microsoft/onnxruntime/blob/v1.5.1/BUILD.md#directml)) and C# ([Microsoft.ML.OnnxRuntime.DirectML](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime.directml))
  * 7 new operators for ONNX 1.7 (opset 12): Celu, GreaterOrEqual, LessOrEqual, ArgMin/Max with select_last_index, GatherND with batch_dim, RoiAlign
  * New integer data types added to existing operators: Clip int, Max int, Min int, MaxPool int8, ReduceMin int8, ReduceMax int8, Pow int exponent
  * Support for 1D to 8D tensors added to these operators: ElementWise*, Activation*, Reduce*, ArgMin/ArgMax, Gather*, Scatter*, OneHot
  * 64-bit index support on GPUs that support it: Gather, Scatter, OneHot, ArgMax/ArgMin, Cast
* Android NNAPI EP updates
  * Support for dynamic input shapes
  * Int32/float32/uint8 data types
  * 50% more supported operators (36 total)
  * Support for uint8 static quantization
  * Smaller binary size
  * Lower memory consumption
  * CPU fallback for Android API level 26 and below
* MIGraphX EP updates
  * Added ONNX operators: GatherElements, NonZero, Equal, and Where
  * Support for Boolean data type
  * Improved support for existing operators:
    * Asymmetric padding of AveragePool
    * Multi-dimensional support for Convolution, Pooling, LRN, and BatchNormalization
    * Ceil mode support for AveragePool and MaxPool
  * More general approach to check whether constant folding is possible
  * Improved graph partitioning logic
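Asymmetric padding and ceil mode both change how the output length of a convolution or pooling window is computed. The sketch below evaluates, for one spatial axis, the explicit-padding output-size formula documented for ONNX AveragePool and MaxPool (floor for convolution and default pooling, ceil when ceil_mode is set), and derives SAME_UPPER-style padding, where any odd extra pad goes at the end, to show where asymmetric pads come from. The function names are hypothetical, and the padding helper assumes dilation 1.

```python
from typing import Tuple


def output_size(in_size: int, kernel: int, stride: int = 1, pad_begin: int = 0,
                pad_end: int = 0, dilation: int = 1, ceil_mode: bool = False) -> int:
    """Output length along one spatial axis (ceil_mode applies only to pooling)."""
    effective_kernel = (kernel - 1) * dilation + 1
    span = in_size + pad_begin + pad_end - effective_kernel
    if span < 0:
        raise ValueError("window is larger than the padded input")
    steps = -(-span // stride) if ceil_mode else span // stride  # integer ceil / floor
    return steps + 1


def same_upper_pads(in_size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """Pads giving ceil(in_size / stride) outputs, odd extra pad at the end (dilation 1)."""
    out = -(-in_size // stride)
    total = max((out - 1) * stride + kernel - in_size, 0)
    return total // 2, total - total // 2


if __name__ == "__main__":
    begin, end = same_upper_pads(416, kernel=3, stride=2)
    print(begin, end)  # 0 1 (asymmetric)
    print(output_size(416, 3, stride=2, pad_begin=begin, pad_end=end))  # 208
    print(output_size(5, 2, stride=2))                  # 2 (floor)
    print(output_size(5, 2, stride=2, ceil_mode=True))  # 3 (ceil)
```

A kernel that only accepts symmetric pads must either round the padding (changing the output shape) or fall back, which is why native asymmetric padding matters for models whose exported graphs carry pads such as (0, 1).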
Training (RC3 release)
* New and improved API to simplify integration with PyTorch trainer code - [see instructions here](https://github.com/microsoft/onnxruntime-training-examples/tree/master/getting-started)
* Updated CUDA 11 / cuDNN 8.0 support to accelerate training on NVIDIA A100 GPUs
Dependency updates
macOS binaries now require OpenMP to be installed. See [this issue comment](https://github.com/microsoft/onnxruntime/issues/5344#issuecomment-701921165) for details.
Contributions
Contributors to ONNX Runtime include members across teams at Microsoft, along with our community members:
[gwang-msft](https://github.com/gwang-msft), [snnn](https://github.com/snnn), [skottmckay](https://github.com/skottmckay), [hariharans29](https://github.com/hariharans29), [thiagocrepaldi](https://github.com/thiagocrepaldi), [tianleiwu](https://github.com/tianleiwu), [wangyems](https://github.com/wangyems), [RandySheriffH](https://github.com/RandySheriffH), [yufenglee](https://github.com/yufenglee), [SherlockNoMad](https://github.com/SherlockNoMad), [smk2007](https://github.com/smk2007), [jywu-msft](https://github.com/jywu-msft), [liqunfu](https://github.com/liqunfu), [edgchen1](https://github.com/edgchen1), [yuslepukhin](https://github.com/yuslepukhin), [tiagoshibata](https://github.com/tiagoshibata), [fdwr](https://github.com/fdwr), [ashbhandare](https://github.com/ashbhandare), [iK1D](https://github.com/iK1D), [wschin](https://github.com/wschin), [BowenBao](https://github.com/BowenBao), [zhanghuanrong](https://github.com/zhanghuanrong), [RyanUnderhill](https://github.com/RyanUnderhill), [ryanlai2](https://github.com/ryanlai2), [askhade](https://github.com/askhade), [pranavsharma](https://github.com/pranavsharma), [martinb35](https://github.com/martinb35), [suffiank](https://github.com/suffiank), [ytaous](https://github.com/ytaous), [KeDengMS](https://github.com/KeDengMS), [rayankrish](https://github.com/rayankrish), [natke](https://github.com/natke), [YUNQIUGUO](https://github.com/YUNQIUGUO), [range4life](https://github.com/range4life), [smkarlap](https://github.com/smkarlap), [zhangxiang1993](https://github.com/zhangxiang1993), [xzhu1900](https://github.com/xzhu1900), [codemzs](https://github.com/codemzs), [weixingzhang](https://github.com/weixingzhang), [stevenlix](https://github.com/stevenlix), [tracysh](https://github.com/tracysh), [mosdav](https://github.com/mosdav), [jingyanwangms](https://github.com/jingyanwangms), [tlh20](https://github.com/tlh20), [souptc](https://github.com/souptc), [orilevari](https://github.com/orilevari), 
[kit1980](https://github.com/kit1980), [yangchen-MS](https://github.com/yangchen-MS), [faxu](https://github.com/faxu), [fs-eire](https://github.com/fs-eire), [wenbingl](https://github.com/wenbingl), [chilo-ms](https://github.com/chilo-ms), [xkszltl](https://github.com/xkszltl), [Andrews548](https://github.com/Andrews548), [yuzawa-san](https://github.com/yuzawa-san), [MaximKalininMS](https://github.com/MaximKalininMS), [jgbradley1](https://github.com/jgbradley1), [nickfeeney](https://github.com/nickfeeney), [zhijxu-MS](https://github.com/zhijxu-MS), [Tixxx](https://github.com/Tixxx), [suryasidd](https://github.com/suryasidd), [Craigacp](https://github.com/Craigacp), [duli2012](https://github.com/duli2012), [jeffbloo](https://github.com/jeffbloo)