Onednn

Latest version: v2025.1.0

Page 8 of 27

2.7rc

This is a release candidate for oneDNN v2.7. Please provide feedback and submit defect reports via [GitHub issues](https://github.com/oneapi-src/oneDNN/issues/new/choose).

Performance Optimizations
* Intel Architecture Processors
  * Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids).
  * Introduced performance optimizations for [bf16 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on Intel Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel AMX instructions in computations on fp32 data.
* Intel Graphics Products
  * Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
  * Introduced performance optimizations for [tf32 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on future Xe Architecture graphics (code name Ponte Vecchio). The tf32 math mode allows oneDNN to use tf32 arithmetic in computations on fp32 data.
  * Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
* AArch64-based Processors
  * Improved convolution and binary primitive performance for processors with SVE 512 support.
  * Improved eltwise and shuffle primitives performance for processors with SVE 256 and SVE 128 support.
  * Improved PReLU, batch normalization, and pooling primitives performance via Compute Library for the Arm Architecture (ACL).
  * Improved performance of inner product, matmul, convolution, and batch norm primitives with post-ops via ACL.
* PowerPC64-based Processors
  * Introduced performance optimizations for int8 and bfloat16 GEMM.
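
The bf16 and tf32 math mode items above describe running fp32 computations with reduced-precision arithmetic. As a rough illustration of the precision involved (not of oneDNN internals), the sketch below rounds an fp32 value to the 7 explicit mantissa bits of bf16 or the 10 of tf32. The helper names and the round-to-nearest-even policy are assumptions made for this example; actual hardware conversions may behave differently.

```python
import struct


def _f32_to_bits(x: float) -> int:
    """Reinterpret a value stored as IEEE-754 fp32 as a 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def _bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit integer as an IEEE-754 fp32 value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def round_mantissa(x: float, kept_bits: int) -> float:
    """Round fp32 `x` to `kept_bits` explicit mantissa bits.

    Assumption: round-to-nearest-even, chosen only for illustration.
    """
    drop = 23 - kept_bits  # fp32 has 23 explicit mantissa bits
    bits = _f32_to_bits(x)
    if (bits & 0x7F800000) == 0x7F800000:
        return x  # pass inf and NaN through unchanged
    lsb = (bits >> drop) & 1  # lowest bit that survives rounding
    bits += (1 << (drop - 1)) - 1 + lsb  # nearest-even rounding bias
    bits &= ~((1 << drop) - 1)  # clear the dropped bits
    return _bits_to_f32(bits)


def to_bf16(x: float) -> float:
    """Hypothetical helper: fp32 value as seen at bf16 precision."""
    return round_mantissa(x, 7)


def to_tf32(x: float) -> float:
    """Hypothetical helper: fp32 value as seen at tf32 precision."""
    return round_mantissa(x, 10)


x = 1.0 + 2.0 ** -9
print(to_bf16(x))  # 1.0 (the 2**-9 term is below bf16 resolution near 1)
print(to_tf32(x))  # 1.001953125 (still representable at tf32 precision)
```

The example shows why these modes are opt-in: a value that survives at tf32 precision can collapse to its neighbor at bf16 precision, so the application decides how much accuracy to trade for speed.
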
Functionality
* Introduced runtime output scales support in all primitives.
* Introduced scales support in concat primitive.
* Extended [floating point math mode API](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) with tf32 data type option.
* Extended eltwise primitive with support for `hardsigmoid` algorithm.
* Extended layer normalization primitive with support for mixed source and destination data types.
* Extended depthwise post-op with support for arbitrary padding size. The implementation is available only on Intel processors.
* Added limited fp64 data type support in convolution primitive. Optimized implementation is available for future Xe Architecture graphics (code name Ponte Vecchio).
* Extended int8 convolution and deconvolution implementations on GPUs with arbitrary destination data type support.
* Extended batch normalization primitive with the `dnnl_fuse_norm_add_relu` flag, which fuses sum and ReLU operations into the primitive. The implementation is available for Intel GPUs.
* Extended GPU deconvolution primitive implementation with support for output scales and zero points.
* Introduced threadpool threading support for AArch64-based processors.
* Introduced Unified Shared Memory (USM) support for SYCL backend on NVIDIA GPUs.
* Introduced initial support for AMD GPUs via MIOpen library. Supported primitives include Local Response Normalization (LRN), softmax, and eltwise.
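
For readers unfamiliar with the `hardsigmoid` algorithm mentioned in the list above, here is a minimal sketch assuming the usual clamped-linear form `max(0, min(1, alpha * x + beta))`. The function name, the parameter names `alpha` and `beta`, and their default values are assumptions chosen for illustration; they are not taken from the release notes and are not library defaults.

```python
def hardsigmoid(x: float, alpha: float = 1.0 / 6.0, beta: float = 0.5) -> float:
    """Piecewise-linear approximation of the sigmoid function.

    Assumed form: clamp(alpha * x + beta, 0, 1). The defaults are a
    common choice and are illustrative only.
    """
    return max(0.0, min(1.0, alpha * x + beta))


for value in (-10.0, 0.0, 10.0):
    # Saturates at 0 for large negative inputs and at 1 for large positive ones.
    print(value, hardsigmoid(value))
```

The appeal of this form is that it needs only a multiply, an add, and two comparisons, which makes it cheap compared with an exponential-based sigmoid.
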
Usability
* Introduced annotations for JIT kernels to allow profilers like Linux perf to correctly label JIT code.
* Extended verbose logs converter with RNN primitive support.
* Added verbose output for `dnnl_*gemm*` calls.
* Removed Level Zero headers from the list of build time dependencies.
* Adjusted NVIDIA GPU implementation to comply with oneDNN numerical behavior. Implicit down-conversions to fp16 and tf32 are now managed via the [math mode API](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html).

Validation
* Added benchdnn driver for validation of internal BRGEMM implementation.
* Improved benchdnn reference implementation performance with threadpool threading model.
* Extended benchdnn performance benchmarking capabilities on GPU with device-side performance measurement mode (`mode=po`).

Deprecated Functionality
* Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.
* Static output scales are deprecated and will be removed in the next release.
* Convolution Winograd algorithm implementation for int8 data type is deprecated and will be removed in the next release.

Breaking Changes
* Changed formula for AUGRU RNN cell to align with TensorFlow. See [proposal](https://github.com/oneapi-src/oneDNN/blob/rfcs/rfcs/20211025-augru/augru-v2.md) for details.
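
As a hedged illustration of what an attention-update step in an AUGRU cell can look like, the scalar sketch below assumes the form used by the TensorFlow reference implementation of DIEN, where the attention score scales down the update gate before the state is blended. The function and argument names are hypothetical, and the linked proposal remains the authoritative definition of the formula oneDNN adopted.

```python
def augru_state_update(h_prev: float, update_gate: float,
                       candidate: float, attention: float) -> float:
    """One scalar AUGRU state update (illustrative assumption).

    Assumed form:
        u_att = (1 - attention) * update_gate
        h     = u_att * h_prev + (1 - u_att) * candidate
    """
    u_att = (1.0 - attention) * update_gate
    return u_att * h_prev + (1.0 - u_att) * candidate


# Under this form, a higher attention score moves the state toward the candidate.
print(augru_state_update(h_prev=1.0, update_gate=0.5, candidate=0.0, attention=0.0))
print(augru_state_update(h_prev=1.0, update_gate=0.5, candidate=0.0, attention=1.0))
```

Code that relied on the earlier formula would get different numerical results after upgrading, which is why the change is listed as breaking.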

Thanks to the Contributors
This release contains contributions from the project core team as well as Aidan Belton @AidanBeltonS, @akshatasangelkar, Alex Bojan @lb991, Crefeda Rodrigues @cfRod, Damian Szwichtenberg @dszwicht, Diana Bite @diaena, Divakar Mariyanna @bmdivakar, Emilio Cota @cota, Gordon Fossum @austinpagan, Hugh Delaney @hdelan, Jacek Czaja @jczaja, @jakpiase, Jonathan Deakin @jondea, Kentaro Kawakami @kawakami-k, Kotha Sowmya @Sowmyakotha1999, Louie Tsai @louie-tsai, Mark Ryan @markdryan, MITSUNARI Shigeo @herumi, Mona Minakshi @monaminakshi, @NaNAGISaSA, Nathan John Sircombe @nSircombe, Peter Caday @petercad, @pgorlani, Sreekanth Yalachigere @sreekanth-yalachigere, Tadej Ciglarič @t4c1, and Thiago Macieira @thiagomacieira. We would also like to thank everyone who asked questions and reported issues.

2.6.3

This is a patch release containing the following changes to v2.6.2:
* Fixed potential integer overflow in BRGEMM-based convolution implementation (deb5595a0f96b54f9106cb846e6fc4e0af49aadf)
* Fixed a defect with incorrect caching of BRGEMM-based matmul primitive implementations with trivial dimensions (305bed526492f2400a1a7fdfcb54b0ee41adc67e)
* Extended benchdnn performance benchmarking capabilities on GPU with device-side performance measurement mode (ba8632592018070a46e4d349bbe3628756022c15)
* Fixed segfault in pooling primitive on CPUs (689d874bbf0a3e1bdc75e99ad2453e6aac9cfe84)

graph-v0.7
This is the Beta Update release for oneDNN Graph API based on [oneDNN v2.7 release](https://github.com/oneapi-src/oneDNN/releases/tag/v2.7).

Functionality
* Added operations `Select`, `LogicalAnd`, `LogicalOr`, `LogicalXor`, `LogicalNot`, `Greater`, `GreaterEqual`, `Equal`, `NotEqual`, `Less`, and `LessEqual`.
* Added `boolean` data type to support logical operations.
* Added support for passing compilation context to the compile API. This feature allows passing additional information, like tensor shape context, for the backend to generate better kernel code.
* Introduced convolution block fusion via oneDNN Graph Compiler.
* **Experimental**: Introduced dynamic shapes support for multi-layer perceptron (MLP) block via oneDNN Graph Compiler.

Known Issues and Limitations
* The weight’s opaque layout can be queried only from a compiled partition, which requires input tensor shapes to be known at compilation time.
* MHA and MLP fusion are not activated on machines without Intel AVX-512 support.

Thanks to the Contributors
This release contains contributions from the project core teams as well as Jiong Gong, Chunyuan Wu, Sanchit Jain, Yiqiang Li, Yunfei Mao, Kiefer Kuah and others.

graph-v0.6
This is the Beta release for oneDNN Graph based on [oneDNN v2.7 release](https://github.com/oneapi-src/oneDNN/releases/tag/v2.7).

Functionality
* Introduced FP32, BF16, FP16, and INT8 inference support on GPU.
* Introduced FP32 and BF16 training support on GPU.
* Introduced support for floating point math mode at graph construction phase. The mode allows the implementation to use low precision datatype for computations when possible.
* Added `graph::finalize()` function to indicate that the user has finished adding operations into the graph and the graph is ready for partitioning.
* Added operations `AbsBackprop`, `Mish`, `MishBackprop`, and `LeakyReLU`.
* Updated API and operation definitions to comply with [oneDNN Graph Specification 1.0-beta](https://spec.oneapi.io/onednn-graph/v1.0-beta/index.html).

Usability
* Integrated Graph component headers, source and build system into oneDNN:
  * Headers moved to `include/oneapi/dnnl`.
  * Source moved to `src/graph`.
  * Graph functionality is included in the single shared object or dynamic library produced by the build system.
* Aligned API with oneDNN:
  * Shared common `dnnl::engine` and `dnnl::stream`. The original `dnnl::graph::engine` and `dnnl::graph::stream` APIs were removed.
  * Added a new `make_engine_with_allocator()` API to create `dnnl::engine` with `dnnl::graph::allocator`.
  * A few common basic types are now shared between oneDNN and oneDNN Graph, including `dnnl_status_t`, `dnnl_data_type_t`, and `dnnl_dims_t`.
* Introduced `ONEDNN_BUILD_GRAPH` build option to manage Graph component build.

Validation
* Introduced `ONEDNN_GRAPH_DUMP` environment variable that serializes the library graph and subgraphs into JSON files.
* Added the initial version of benchdnn graph driver which can be used to benchmark the performance with a dumped graph JSON file.

Breaking changes
* Removed operations `HardTanh`, `Index`, `Pow`, etc. Please check the operation kind list for details.

Known Issues and Limitations
* Graph Compiler component is not included with this release. It will be reinstated in oneDNN Graph Beta Update release.
* The weight’s opaque layout can be queried only from a compiled partition, which requires input tensor shapes to be known at compilation time.
* Build option `ONEDNN_BUILD_GRAPH` is not compatible with some of the build options supported by the build system including `ONEDNN_GPU_RUNTIME=OCL`, `ONEDNN_ENABLE_WORKLOAD=INFERENCE`, `ONEDNN_ENABLE_PRIMITIVE`, and others.

Thanks to the Contributors
This release contains contributions from the project core teams as well as Jiong Gong, Chunyuan Wu, Sanchit Jain, Yiqiang Li, Yunfei Mao, Kiefer Kuah and others.

2.6.2

This is a patch release containing the following changes to v2.6.1:
* Removed unused variables (2500b0f6c1931f4b0b22b5fc92fcc87c6b875a3f, b4e00322c93984082b987408af8a2e341c7fd6c2)
* Fixed correctness issue in fp32 convolution implementation for cases with large spatial size (207af06637ccf36fb08c5fd93b55d52a578cfa5a)
* Fixed correctness issue in bfloat16 matmul implementation for processors with Intel AMX support (404b762f27350d5ad59225d966310b481951451e)
* Fixed correctness issue in int8 reorder implementation with zero points (b340cba1cadc8fc6424945b5b2a09960bd8d47ec)
* Improved int8 matmul and inner product primitives performance with small matrices for processors with Intel AMX support (73b75723921e9881b88b027a8f1b2d42251f6403, 58b386a21cfc9dbb7c331626e9e4752751cdf415)
* Improved int8 convolution performance for processors with Intel DL Boost support (f35a62f9b3c1db5ce8a2704e530e050b2f4b1807)
* Aligned AUGRU formula with TensorFlow definition (e47c6c570d97545b56f3afef77ce9fbd63ea320b, 4ba0a577947733690cdd0f9ecf269121148a28e1, b311e24ac3b669d6200b595201107601b6ce1f58)
* Suppressed 'unvectorized loop' warning for Intel C/C++ Compiler (3932d0493586963df3cefb3c8f35cb6503cd444e)

graph-v0.5.2
This is a patch release containing the following changes to [graph-v0.5.1](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5.1):

* Deprecated quantized ReLU fusion patterns (85405a94)

2.6.1

This is a patch release containing the following changes to v2.6:
* Extended depthwise convolution post-op with support for arbitrary filter size, stride, and padding (79b019b102c5d68843d52473f7d26a80597d84d2)
* Improved GEMM performance with threadpool threading on systems with Intel AVX2 instruction set (2be0060dbf0291687bb8243068121d6cdda30ec2)
* Fixed runtime error in GPU reduction primitive for specific tensor sizes (efbf9b5e8c12666314f3484ce279cee0a1a91a44)
* Improved convolution performance on GPUs with Xe-HPG IP (f8de0c93e9ff53a7d0a41b97aabc85e828020881, c1fb8acd0f74f63db021d41dedcd54546aab5289)
* Updated ITT API to 3.22.5 (9b186765dded79066e0cd9c17eb70b680b76fb8e)
* Fixed correctness issues in reorder implementation for non-x64 systems (9961b8698b603842c79b492d82a05ba8dccb15da, 102063159c37b63c80fe6310e4d0481370a8ff02, 8b960dfaf43c417ed86b7da25451c12151c1a87b, ef1d9fa441f2e4e5c06a34042934cc272171a2e1, 8edd85907f42b72f9ace5dbc2bfcf43a63ce3d1b, 39edcf61e162d7f3a7449e05bfedccd1301fe34e, 3e0a0d9dbff6dd1c5e5d94f3c29727d489af7917, 1dff6251dd262c3bf1c5ec36a24ad9c2c46f2624, 8661958a4f4fce5c3f1dd65f30b03d9742579179)
* Fixed handling of `inf` and `-inf` values in eltwise log algorithm (732cbdd2651bc8ea4c7ae125c29e542fecd79b8e, 3fd0f2e44c84869181aa2506e8924c37e9267b64)
* Improved depthwise convolution performance on GPUs with Xe-HPG IP (7a6fe1d964d423a22d9e3525f7851a7d221460ad)
* Addressed failures in `test_isa_hints` gtest on GPUs (78c1c68305f81cb087f3e4dc2cebb07cace1ef4d)
* Fixed issues with bfloat16 GEMM producing NaNs in certain cases on GPUs with Xe-HPC IP (5d659707f0cd9bc432e5f74d6e9d8b3bbc4776ad)
* Changed default layout to blocked for depthwise convolutions to avoid spurious reorders (78f231b03f4a1126991f4e725b75c090925fd870)
* Addressed issue with incorrect values in padded areas for convolution with post-ops on GPUs (2e4ad3ab7182cbc666af3a5c32d59bbd7cf710b7)
* Fixed build issues with `-Werror=odr` option (27668dd728a3a3460315e44275490daab317fa8d)
* Addressed issues detected by Clang UBSan in BRGEMM kernel (2bbaa3092b27dc0bf08dc2c534e3ee761d6fb6e0, 9b3826f762de28b2c35aa8f9249b916973b7b140, b59b02716367e64e35264093828da1c0b3edc646)

graph-v0.5.1
This is a patch release containing the following changes to [graph-v0.5](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.5):

* Fixed the layout propagation of Reshape and Transpose operators in oneDNN backend (3b681d4, 09863f9)
* Enabled scalar Divide + MatMul fusion in oneDNN backend (d4c7dc6)
* Enabled Convolution + LeakyReLU fusion in oneDNN backend (b0f4dbb, c8fb4c13, e15979e)
* Improved the documentation of fusion patterns (b9a52384)
* Fixed operands swapping for binary operators (a07bfdac, d2567d7)
* Worked around a false positive build issue in GCC11 for compiler backend (17a40d0)

graph-v0.4.3
This is a patch release containing the following changes to [graph-v0.4.2](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.2):

* Upgraded to oneDNN [v2.5.4](https://github.com/oneapi-src/oneDNN/releases/tag/v2.5.4) patch release (3418ec1)
* Fixed compiler backend to build with downstream projects when LLVM is used (c73dd858)
* Fixed the layout propagation of Reshape and Transpose operators in oneDNN backend (cbdb736f)

graph-v0.5
This is the Alpha release for oneDNN Graph API based on [oneDNN v2.6](https://github.com/oneapi-src/oneDNN/releases/tag/v2.6) release.

Functionality

* Introduced FP32 and BF16 training support on CPU.

* Introduced multi-layer perceptron (MLP) fusion supported by oneDNN Graph compiler with optimized code generation (experimental).

* Updated API to comply with [oneDNN Graph API specification v1.0-alpha](https://spec.oneapi.io/onednn-graph/latest/index.html).

Known Issues and Limitations

* The weight’s opaque layout can be queried only from a compiled partition, which requires input tensor shapes to be known at compilation time.

* MHA and MLP fusion are not activated on machines without AVX-512 support, as oneDNN Graph compiler generates AVX-512 and newer instructions.

Thanks to the Contributors

This release contains contributions from the project core teams as well as Jiong Gong, Chunyuan Wu, Sanchit Jain, Yiqiang Li, Yunfei Mao, Kiefer Kuah and others.

2.6

Performance Optimizations
* Intel Architecture Processors
  * Improved performance for future Intel Xeon® Scalable processors (code name Sapphire Rapids). The functionality requires Linux kernel 5.16 or later.
  * Improved performance of matmul primitive for processors with Intel AVX-512 support.
* Intel Graphics Products
  * Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
  * Improved performance for future Intel Arc graphics (code name Alchemist and DG2).
* AArch64-based Processors
  * Improved binary primitive performance with Arm Compute Library (ACL).
  * Improved shuffle primitive performance for processors with SVE 512 support.

Functionality
* Introduced bfloat16 destination support for int8 convolution, matmul and inner product primitives for processors with Intel AVX-512 support and for future Intel Xeon® Scalable processors (code name Sapphire Rapids).
* Extended RNN primitive with support for [AUGRU cell](https://oneapi-src.github.io/oneDNN/dev_guide_rnn.html#augru).
* Added support for non-zero negative slope in ReLU post-op for batch normalization primitive.
* Introduced support for mixed source and destination data types in softmax primitive.
* Introduced [persistent cache API](https://oneapi-src.github.io/oneDNN/dev_guide_persistent_cache.html). This functionality allows serializing and reusing JIT kernels.
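
The "non-zero negative slope in ReLU post-op for batch normalization" item above can be pictured with the following plain-Python sketch. It normalizes each value with given statistics and then applies a ReLU whose negative branch is scaled instead of zeroed. The function name, argument names, and the `eps` default are assumptions for illustration and do not reflect the oneDNN API.

```python
import math
from typing import List


def batchnorm_relu(values: List[float], mean: float, variance: float,
                   eps: float = 1e-5, negative_slope: float = 0.0) -> List[float]:
    """Inference-style batch normalization followed by a ReLU.

    With negative_slope == 0 this is a plain ReLU; a non-zero slope keeps
    a scaled copy of negative activations (leaky ReLU behavior).
    """
    inv_std = 1.0 / math.sqrt(variance + eps)
    result = []
    for x in values:
        y = (x - mean) * inv_std  # normalize
        result.append(y if y > 0.0 else negative_slope * y)  # activation
    return result


print(batchnorm_relu([-1.0, 1.0], mean=0.0, variance=1.0, eps=0.0,
                     negative_slope=0.1))  # [-0.1, 1.0]
```

Fusing the activation into the normalization step avoids a second pass over the data, which is the usual motivation for post-ops of this kind.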

Usability
* Added build time options to manage the set of supported instruction set architectures on Intel Graphics Products. See [`ONEDNN_ENABLE_PRIMITIVE_GPU_ISA`](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-primitive-gpu-isa) for more details. This feature further reduces the binary footprint.
* Extended build time options [`ONEDNN_ENABLE_PRIMITIVE`](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-primitive) and [`ONEDNN_ENABLE_WORKLOAD`](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-workload) to GPU implementations. This feature further reduces the binary footprint.
* Reduced stack consumption in GEMM implementation.
* Added command line help to benchdnn.

Deprecated Functionality
* Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Breaking Changes
* Removed performance optimizations for Intel Xeon Phi processors. oneDNN will continue to be functional on these processors using Intel AVX2 codepath.

Thanks to the Contributors
This release contains contributions from the project core team as well as Arthur Mitrano @aaraujom, Aslan @aslanxie, Attila T. Áfra @atafra, Damian Szwichtenberg @dszwicht, Diana Bite @diaena, Joel Dippold @jedippold, Jonathan Deakin @jondea, Jonathan Louis Kaplan @JLouisKaplan-Arm, Kentaro Kawakami @kawakami-k, Luke Ireland @LukeIreland1, Mesut Meterelliyoz @mmeterel, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Tengfei Han @Tengfei09, and Thiago Macieira @thiagomacieira. We would also like to thank everyone who asked questions and reported issues.

2.6rc

This is a release candidate for oneDNN v2.6. Please provide feedback and submit defect reports via [GitHub issues](https://github.com/oneapi-src/oneDNN/issues/new/choose).

Performance Optimizations
* Intel Architecture Processors
  * Improved performance for future Intel Xeon® Scalable processors (code name Sapphire Rapids). The functionality requires Linux kernel 5.16 or later.
  * Improved performance of matmul primitive for processors with Intel AVX-512 support.
* Intel Graphics Products
  * Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
  * Improved performance for future Intel Arc graphics (code name Alchemist and DG2).
* AArch64-based Processors
  * Improved binary primitive performance with Arm Compute Library (ACL).
  * Improved shuffle primitive performance for processors with SVE 512 support.

Functionality
* Extended RNN primitive with support for AUGRU cell.
* Introduced support for mixed source and destination data types in softmax primitive.
* Introduced [persistent cache API](https://oneapi-src.github.io/oneDNN/dev_guide_persistent_cache.html). This functionality allows serializing and reusing JIT kernels.

Usability

* Added build time options to manage the set of supported instruction set architectures on Intel Graphics Products. See [`ONEDNN_ENABLE_PRIMITIVE_GPU_ISA`](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-primitive-gpu-isa) for more details. This feature further reduces the binary footprint.
* Extended build time options [`ONEDNN_ENABLE_PRIMITIVE`](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-primitive) and [`ONEDNN_ENABLE_WORKLOAD`](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html#onednn-enable-workload) to GPU implementations. This feature further reduces the binary footprint.
* Reduced stack consumption in GEMM implementation.
* Added command line help to benchdnn.

Deprecated Functionality
* Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in future releases.

Breaking Changes
* Removed performance optimizations for Intel Xeon Phi processors. oneDNN will continue to be functional on these processors using Intel AVX2 codepath.

Thanks to the Contributors
This release contains contributions from the project core team as well as Arthur Mitrano @aaraujom, Aslan @aslanxie, Attila T. Áfra @atafra, Damian Szwichtenberg @dszwicht, Diana Bite @diaena, Joel Dippold @jedippold, Jonathan Deakin @jondea, Jonathan Louis Kaplan @JLouisKaplan-Arm, Kentaro Kawakami @kawakami-k, Luke Ireland @LukeIreland1, Mesut Meterelliyoz @mmeterel, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Tengfei Han @Tengfei09, and Thiago Macieira @thiagomacieira. We would also like to thank everyone who asked questions and reported issues.

graph-v0.4.2
This is a patch release containing the following changes to [graph-v0.4.1](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1):
* Fixed compiled partition cache by checking CPU threading number (68f262a, 343246e)
* Enabled binary add and multiply patterns (71a0cfef)
* Fixed the MHA (multi-head attention) patterns in compiler backend and benchdnn graph (45bbcb3, caaf841)
* Fixed the build issues for semi-compiler backend (62dd2ca, 738276a, 347f1a9, 2123326)
