Onednn

Latest version: v2025.1.0

Safety actively analyzes 723650 Python packages for vulnerabilities to keep your Python projects secure.

Page 7 of 27

2.7.5

This is a patch release containing the following changes to v2.7.4:
* Fixed a correctness issue in fp32 batched matmul with transposed source tensor on processors with Intel AVX-512 instruction set support (1a9b80d7ad6856437c3e0b504bb53dca772eb0fe)
* Improved batched matmul performance on processors with Intel AMX instructions support (8c20f62dbcd4d622c8a279b7c81dacb629f1de41, acb8e12b0f3e70d2e80543e31a91362c8852bbaf)
* Fixed a correctness issue in int8 convolution primitive with zero points on processors with Intel AVX2 and Intel DL Boost support (0abbf225ce906987bc3728252b5842fb0239daab, d3a9f02e50334a0ebe102dd8bdb7887deeb12ec5)
* Improved convolution performance with small number of input channels on processors with Intel AVX-512 instruction set support (fc7fced9988124027220fb53dfb16022c9be35c0)

2.7.4

This is a patch release containing the following changes to v2.7.3:
* Fixed potential `NaN` issue in convolution weight gradient on Intel CPUs (6d80bb48714f8f8d030f055435f5bfde3a382f15, 4c34f89653259b2e15e277ff0663d6705f093e1b, 017950a16168640764d17558e41010d0ae038377, 796a600c3de2993b5d5819995ad13eb70d097496)
* Improved bfloat16 convolution weight gradient performance for processors with Intel AMX support (21bdc21f37ff835b9ce54d4b713d7bfd65060e30, 82cb7d37f861a471215b242e8df0330523cdf223, b2e948f931367c81a6887d4e0e544a9f50dcd673, 0a33f70c1b283d18631d299d3c907743d215e80d, ff05d0e8c2db056b0857bcbed22c5097f76529da)
* Fixed out of bounds writes in bfloat16 inner product weight gradient for processors with Intel AMX support (caead724fc6d309c7706760a520908e28b8f8b0b)
* Fixed illegal instruction in matmul for processors with Intel AMX support (be942a240e775dfda47bfff5622106851df218e5, 28ddb5bc91b01e266575047a676569c4af35a5eb, d264ba494a9f6b15d3eb21ec26e4606dd8d458c8)
* Fixed segfault in convolution with depthwise post-op for processors with Intel SSE4.1 support (f7081009737b836f23ef8adce70994815acfa842)
* Worked around segfaults for builds with Intel C/C++ Compiler 2021 for macOS (1382605c20bcdac9aa17c62cc38924138bc57db1)
* Fixed segfault in bfloat16 convolution with strides for processors with Intel AMX support (c3b1dcd2605cae5609d7175fcf5223da16e03fb9)
* Fixed correctness issue in int8 convolution with zero points for processors with Intel AMX support (5e76d8b07a431051b7d6a612c4933e36621fbc39)
* Fixed assertion fail in int8 convolution for processors with Intel AMX support (05629a5ccfae9250e6495ffc7d51152025fcfee1)
* Fixed incorrect results in vanilla GRU for Intel CPUs (2089770c4818be8933c5e9d1dd3cbaeba1457667)
* Improved bfloat16 convolution performance for cases with large number of channels and spatial dimensions (c67f46b0df29c3a7c6cbd0a9f1ebbc9adf4457e8, c9cb51d6bfb68aee8377e7781a5c4512f6aa4bea, 4e2c5730426422fc362c02a963b66072c083acaf, 474527f47acb1aeff2bf52efd64e09ac95d8ef5b, 87e8ea9d8e0499b19149c69748ef8503ad2fb75b)
* Fixed an issue with incorrect header files location when using oneDNN as subproject (be6abca883303e0cb4d2edac28da929a21d5d2a2)

2.7.3

This is a patch release containing the following changes to v2.7.2:
* Fixed segfault in int8 convolution with binary post-ops on Intel CPUs (c8d40c0719f9d9cffa1c5eb04f3f40fa1f9546b8)
* Applied workaround for tanh post-op on some Xe architecture based GPUs (3eb3267dc3bcfe64a081731ac9d08c84bc6827f7)
* Disabled fp16 post-ops with Compute Library for Arm Architecture (ACL) (f7b7dc0a8b3125602295047cdd7feb3cbb8d9a06)
* Fixed incorrect results for sequence of eltwise post-op with same algorithm but different parameters (02c26781171f6350634b41d80cbff7ae5092c1a1, 1c36e27520617e23b74ed32e675804ac7806576e, 81ba0fe626c93e51935d5e8776dd7e8bf4105487)
* Fixed issue in convolution with groups and plain activation layout on Intel GPUs (df6f2e34bfb1e3d6bcd5498a4febb149b2be8b2b, d0c14c204782945b3732bd83b7329c314c3339c1)
* Fixed reorder failures on Xe HPC architecture based GPUs (c3cb1d5fa7e2e41c7059fa7e5ebcee34aa3e5242)
* Fixed thread safety issue in convolution primitive (2955c9d5d5f97f03c4068af37f6783f0be256695)
* Fixed scratchpad allocation issue in matmul (989acd3b0dbd304fe47ac7837bb33e73a4ca7cd6)
* Disabled concat batching with scales on Intel GPUs since implementation doesn't support it yet (8aab73fe1897542c5ec740ac718b00e7d72edd92, 1eac450ca742cd9905addf36ee038a8e17e03474, 82838de623057ffd1dfc0f879afcd02e72f9538f)
* Fixed segfault and correctness issues in convolution primitive with sum and relu post-ops on Intel CPUs (fc335be0d1376f1dca527bd543f929739dffd55f, 0f4697a87c0f550339598c1918d5479801337426, 60f1727fcaf06416c5464b44c177ec16829bd2c1, d28f2c1757e2cc6b792e4fd5de40987e811d086d, 4761ee91b3729d124135273a7450d3d2cf0dce53, f674fbf917e92b2623184ad8c603f20ae4fe0ad7)

2.7.2

This is a patch release containing the following changes to v2.7.1:
* Fixed segfaults in deconvolution backpropagation with ACL on AArch64-based processors (f02e6f3f262813b8d0b6cb1f7b55fcc08b4b5bac)
* Fixed code generation issues in Intel AVX2 convolution implementation (2ba25236bc417c4d5fe1729ddf9e01f1d1d25fb3, b60633f79947199a1f0cfce7aa42b0ae14690401, 844326b853ba9ca9b7a34ec08ca6e2e28d7332e8, 2009164c2ae90e1e938ab8823c817a6c95fccc11)
* Fixed correcteness issues and runtime errors in deconvolution with binary post-ops on Intel GPUs (dd54d3906c9613a967b709907306b946cfe32cac)
* Improved performance of convolutions with small number of channels and large spatial sizes on systems with Intel AMX (26f97dc7a47aa2c0f0e13e6ff61dd3fc28fa077b, 4cb648d9e3620876fa7d7dca38a902643cd97dbc)
* Fixed runtime error in int8 convolutions with groups on Xe architecture based GPUs (e5a70f43639ba968869a99931d77116791ace355)
* Improved inner product weight gradient performance on Xe architecture based GPUs (9e9b859fddc6f813f9b9cac093d7d131c84054ab, 12ec4e3a51ddc105e86e9d29661690750560cd1c)
* Improved batch normalization performance with threadpool threading (4fd5ab2dd312b2b79e8f2f1b18b39a94fee39e84)
* Improved inner product performance with binary post-ops in broadcast mode on Intel CPUs (d43c70d4aafd58c241d456453994f4c7fe6aefff, 49ca4e17e7fd889c6c153f52dffa6f4d4a10e7c9)
* Fixed segfaults and correctness issues in sum primitive with threadpool threading (ee7a3219db8bcdb7870b65b6ee0aadfba2275513)
* Extended persistent cache API to cover engine objects (58481d606c19f4e46c1cd7dbfd4aba819ae024d3, 5f69dade29e317eab37455d477892996e80aea75, 16c0a95180a362c079fb2d3f01a4cea084b99628, 068071b326f253791ae767cae25258e6d47426ad)
* Added support for newer versions of Intel GPU drivers (71443935355ef4fc52b510be761c487de8677386)
* Updated ITT API version to 3.23.0 (d23cc9503f94ea9267bc8b6e654a912caa70e333)
* Fixed convolution correctness issue on Intel Data Center GPU Flex Series (365ac202ca2f58078549116a0650a91566a256b6)
* Fixed fp64 convolution correctness issue on Intel Data Center GPU MAX Series (9d4bf94d89b945cb703a7b4d04d539daf7aab8b5, 67054032e4b1b4eae11f006e3857fe20a0d7b16a)
* Fixed correctness issues in reduction primitive with binary post-op on Intel GPUs (ae9d075dbba068287b6cb280f0f22d3cdcbfcb36, e3b80c58f493e7972eb4d0317518534c1d8412e9)
* Improved convolution performance on on Intel Data Center GPU MAX Series (90be8d501f3b35e88f997bf9e0fd139a740f72f7, caf4863f40dd06b807d2bb1abb487aad21d586a6)
* Fixed build errors with ONEDNN_ENABLE_PRIMITIVE_GPU_ISA build option (de2db042bbb733de7c925224934ded766de74d68)
* Fixed correctness issues in convolution with per-tensor binary post-ops on Intel CPUs (9cf9c189f6f674bba38ea11217f4b06acab87194)
* Improved convolution performance on Intel Data Center GPU Flex Series (8b08a07574888bc265818a751eab82aa28115d72)

graph-v0.7.1
This is a patch release containing the following changes to [graph-v0.7](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.7):

* Fixed a build issue in compiler backend (70258d306)
* Optimized for zero points folding (d6f12b50c)
* Fixed a primitive descriptor cache issue in reorder fusion (08876524d)

2.7.1

This is a patch release containing the following changes to v2.7:
* Fixed performance regression for batch normalization primitive in TBB and threadpool configurations (cd953e4ca7390387b53fba7105f81a6fc1fc0382)
* Improved grouped convolution performance on Xe Architecture GPUs (d7a781e166ef3206d9b0ab79a69d76034d663c20, cb1f3fe27f466a26b484ed063546bd0b6c4cd306, 4e844740d6b26709c0aa3c2604ed52130560208a, 7ba3c40f65425c4bc2b922ae7b2cdd8cb8e5181c)
* Fixed runtime error in int8 reorder on Intel GPUs (53532a9944b2e4694d4c0135f0a1a5102ca97613)
* Reverted MEMFD allocator in Xbyak to avoid segfaults in high load scenarios (3e29ae26dba137a6232669bd1c5d42ad4449b794)
* Fixed a defect with incorrect caching of BRGEMM-based matmul primitive implementations with trivial dimensions (87cd9796a98497ab9a3ff5250ad3a396199590fb)
* Improved depthwise convolution performance with per-tensor binary post-ops for Intel CPUs (f430a5a4c883ef846f938f571020565d41719e9c)
* Extended threadpool API to [manage maximum concurrency](https://github.com/oneapi-src/oneDNN/blob/rfcs/rfcs/20220929-threadpool_max_concurrency/README.md) (8a1e9595f131e1303887fba407a03dbd64ac301e, 64e559454787651186ed6a32e4eef2a17132b9b6)
* Fixed potential integer overflow in BRGEMM-based convolution implementation (25ccee38b97e935e6c3c729b9134804c6a2ea6a7)
* Fixed performance regression in concat primitive with any format on Intel CPUs (2a60adec0e73895caefb3dc7d1de74b5eac8c6da, feb614d5fef07fb2a188ceef15ebeaf9f9f45acf)
* Fixed compile-time warnings in `matmul_perf` example (b5faa77a4a651f1e44fa77348eded54ea3ec3eef)
* Fixed 'insufficient registers in requested bundle' runtime error in convolution primitive on Xe Architecture GPUs (4c9d46acc35126fec2b59125403566a90b6bed36)
* Addressed performance regression for certain convolution cases on Xe Architecture GPUs (f28b58aec55c5087127702f7c0a38d21b3006d35, 18764fbef1f1f90bc696fe35d059685b2b37f149)
* Added support for Intel DPC++/C++ Compiler 2023 (c3781c671dcc23c0fa16eb648c98ef33b79c737b, a1a8952656b2e84a4124cc0d2f8c7aae10e62a46, 9bc87e635dbeffd77808c70fbd51ac5dc834b582, e3b19871cab6c9b5c317cddb18f4264575868ed7)
* Fixed int8 matmul and inner product performance regression on Xe Architecture GPUs (3693fbf0e8b0cd3bcc2308a4504772c0af2eaf88, c8adc179133f7212523f4ecb1cdab648b0cec796)
* Fixed accuracy issue for convolution, inner product and matmul primitives with `tanh` post-op on Xe Architecture GPUs (88b4e57718014bd50f78461a5c80dc680074f9b6, 83ce6d27a8699d7ab0d1ee450e2e7e9ec87a6e13, 6224dc6b3e2073c98f4b8278bf7e87769dd85a55, 10f0d0ade797a90c93b7450c1e0b151dc415dab3)
* Suppressed spurious build warnings with GCC 11 (44255a8a57dc40ccc8f7b464e5638d6715216756)

2.7

Performance Optimizations
* Intel Architecture Processors
* Improved performance for future Intel Xeon Scalable processors (code name Sapphire Rapids).
* Introduced performance optimizations for [bf16 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on Intel Xeon Scalable processors (code name Sapphire Rapids). The bf16 math mode allows oneDNN to use bf16 arithmetic and Intel AMX instructions in computations on fp32 data.
* Intel Graphics Products
* Improved performance for future Xe Architecture graphics (code name Ponte Vecchio).
* Introduced performance optimizations for [tf32 floating point math mode](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) on future Xe Architecture graphics (code name Ponte Vecchio). The tf32 math mode allows oneDNN to use tf32 arithmetic in computations on fp32 data.
* Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and Intel Data Center GPU Flex Series (formerly Arctic Sound-M)
* AArch64-based Processors
* Improved convolution and binary primitive performance for processors with SVE 512 support.
* Improved shuffle and eltwise primitives performance for processors with SVE 256 and SVE 128 support.
* Improved PReLU, batch normalization, and pooling primitives performance via Compute Library for the Arm Architecture (ACL).
* Improved performance of inner product, matmul, convolution, and batch norm primitives with post-ops via ACL.
* PowerPC64-based Processors
* Introduced performance optimizations for int8 and bfloat16 GEMM.
Functionality
* Introduced runtime output scales support in all primitives.
* Introduced scales support in concat primitive.
* Extended [floating point math mode API](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html) with tf32 data type option.
* Extended eltwise primitive with support for `hardsigmoid` algorithm.
* Extended layer normalization primitive with support for mixed source and destination data types.
* Extended depthwise post-op with support for arbitrary padding size. The implementation is available only on Intel processors.
* Added limited fp64 data type support in convolution primitive. Optimized implementation is available for future Xe Architecture graphics (code name Ponte Vecchio).
* Extended int8 convolution and deconvolution implementations on GPUs with arbitrary destination data type support.
* Extended batch normalization primitive with `dnnl_fuse_norm_add_relu` flag that allows to fuse sum and relu operations. The implementation is available for Intel GPUs.
* Extended GPU deconvolution primitive implementation with support for output scales and zero points.
* Introduced threadpool threading support for AArch64-based processors.
* Introduced Unified Shared Memory (USM) support for SYCL backend on NVIDIA GPUs.
* Introduced initial support for AMD GPUs via MIOpen library. Supported primitives include Local Response Normalization (LRN), softmax, and eltwise.
Usability
* Added `matmul_perf` example that benchmarks matmul primitive for all supported data types.
* Introduced annotations for JIT kernels to allow profilers like Linux perf to correctly label JIT code.
* Extended verbose logs converter with RNN primitive support.
* Added verbose output for `dnnl_*gemm*` calls.
* Removed Level Zero headers from the list of build time dependencies.
* Adjusted NVIDIA GPU implementation to comply with oneDNN numerical behavior. Implicit downconvert to fp16 and tf32 are now managed via [math mode API](http://oneapi-src.github.io/oneDNN/group_dnnl_api_mathmode.html).

Validation
* Added benchdnn driver for validation of internal BRGEMM implementation.
* Improved benchdnn reference implementation performance with threadpool threading model.
* Extended benchdnn performance benchmarking capabilities on GPU with device-side performance measurement mode (`mode=po`).

Deprecated Functionality
* Support for SYCL 1.2.1 (aka SYCL 2017 standard) is deprecated and will be removed in the future releases.
* Static output scales are deprecated and will be removed in the next release.
* Convolution Winograd algorithm implementation for int8 data type is deprecated and will be removed in the next release.

Breaking Changes
* Changed formula for AUGRU RNN cell to align with Tensorflow. See [proposal](https://github.com/oneapi-src/oneDNN/blob/rfcs/rfcs/20211025-augru/augru-v2.md) for details.

Thanks to the Contributors
This release contains contributions from the project core team as well as Aidan Belton AidanBeltonS, akshatasangelkar, Alex Bojan lb991, Crefeda Rodrigues cfRod, Damian Szwichtenberg dszwicht, Diana Bite diaena, Divakar Mariyanna bmdivakar, Emilio Cota cota, Gordon Fossum austinpagan, Hugh Delaney hdelan, Jacek Czaja jczaja, jakpiase, Jonathan Deakin jondea, Kentaro Kawakami kawakami-k, Kotha Sowmya Sowmyakotha1999, Louie Tsai louie-tsai, Mark Ryan markdryan, MITSUNARI Shigeo herumi, Mona Minakshi monaminakshi, NaNAGISaSA, Nathan John Sircombe nSircombe, Peter Caday petercad, pgorlani, Sreekanth Yalachigere sreekanth-yalachigere, Tadej Ciglarič t4c1, and Thiago Macieira thiagomacieira. We would also like to thank everyone who asked questions and reported issues.

Page 7 of 27

Releases

Has known vulnerabilities

Previous Next

Onednn

Page 7 of 27

2.7.5

2.7.4

2.7.3

2.7.2

2.7.1

2.7

Page 7 of 27

Links

Releases