## Performance Optimizations
* Intel Architecture Processors
  * Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids).
  * Introduced performance optimizations for [bf16 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on Intel Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel AMX instructions in computations on fp32 data.
* Intel Graphics Products
  * Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
  * Introduced performance optimizations for [tf32 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on future Xe Architecture graphics (code name Ponte Vecchio). The tf32 math mode allows oneDNN to use tf32 arithmetic in computations on fp32 data.
  * Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M).
* AArch64-based Processors
  * Improved convolution and binary primitive performance for processors with SVE 512 support.
  * Improved shuffle and eltwise primitive performance for processors with SVE 256 and SVE 128 support.
  * Improved PReLU, batch normalization, and pooling primitive performance via Compute Library for the Arm Architecture (ACL).
  * Improved performance of inner product, matmul, convolution, and batch normalization primitives with post-ops via ACL.
* PowerPC64-based Processors
  * Introduced performance optimizations for int8 and bfloat16 GEMM.
## Functionality
* Introduced runtime output scales support in all primitives.
* Introduced scales support in concat primitive.
* Extended [floating point math mode API](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) with tf32 data type option.
* Extended eltwise primitive with support for `hardsigmoid` algorithm.
* Extended layer normalization primitive with support for mixed source and destination data types.
* Extended depthwise post-op with support for arbitrary padding size. The implementation is available only on Intel processors.
* Added limited fp64 data type support in convolution primitive. Optimized implementation is available for future Xe Architecture graphics (code name Ponte Vecchio).
* Extended int8 convolution and deconvolution implementations on GPUs with arbitrary destination data type support.
* Extended batch normalization primitive with `dnnl_fuse_norm_add_relu` flag that allows fusing sum and ReLU operations. The implementation is available for Intel GPUs.
* Extended GPU deconvolution primitive implementation with support for output scales and zero points.
* Introduced threadpool threading support for AArch64-based processors.
* Introduced Unified Shared Memory (USM) support for SYCL backend on NVIDIA GPUs.
* Introduced initial support for AMD GPUs via MIOpen library. Supported primitives include Local Response Normalization (LRN), softmax, and eltwise.
## Usability
* Added `matmul_perf` example that benchmarks matmul primitive for all supported data types.
* Introduced annotations for JIT kernels to allow profilers like Linux perf to correctly label JIT code.
* Extended verbose logs converter with RNN primitive support.
* Added verbose output for `dnnl_*gemm*` calls.
* Removed Level Zero headers from the list of build time dependencies.
* Adjusted NVIDIA GPU implementation to comply with oneDNN numerical behavior. Implicit downconversion to fp16 and tf32 is now managed via the [math mode API](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html).
## Validation
* Added benchdnn driver for validation of internal BRGEMM implementation.
* Improved benchdnn reference implementation performance with threadpool threading model.
* Extended benchdnn performance benchmarking capabilities on GPU with device-side performance measurement mode (`mode=po`).
## Deprecated Functionality
* Support for SYCL 1.2.1 (aka the SYCL 2017 standard) is deprecated and will be removed in future releases.
* Static output scales are deprecated and will be removed in the next release.
* Convolution Winograd algorithm implementation for int8 data type is deprecated and will be removed in the next release.
## Breaking Changes
* Changed the formula for the AUGRU RNN cell to align with TensorFlow. See the [proposal](https://github.com/oneapi-src/oneDNN/blob/rfcs/rfcs/20211025-augru/augru-v2.md) for details.
## Thanks to the Contributors
This release contains contributions from the project core team as well as Aidan Belton AidanBeltonS, akshatasangelkar, Alex Bojan lb991, Crefeda Rodrigues cfRod, Damian Szwichtenberg dszwicht, Diana Bite diaena, Divakar Mariyanna bmdivakar, Emilio Cota cota, Gordon Fossum austinpagan, Hugh Delaney hdelan, Jacek Czaja jczaja, jakpiase, Jonathan Deakin jondea, Kentaro Kawakami kawakami-k, Kotha Sowmya Sowmyakotha1999, Louie Tsai louie-tsai, Mark Ryan markdryan, MITSUNARI Shigeo herumi, Mona Minakshi monaminakshi, NaNAGISaSA, Nathan John Sircombe nSircombe, Peter Caday petercad, pgorlani, Sreekanth Yalachigere sreekanth-yalachigere, Tadej Ciglarič t4c1, and Thiago Macieira thiagomacieira. We would also like to thank everyone who asked questions and reported issues.