This is a patch release containing the following changes to v3.0:
* Fixed potential correctness issue in convolution weight gradient with 1x1 filter and strides (e58996692802f4a94651f6baa6e3f0debf93b537)
* Improved convolution, deconvolution, inner product, and matmul primitives performance with scales on Intel CPUs (38319f1f822387bd755183bcac2ec3d0745a88b4, 18de927dc205543701942f0f26d61f72c51f5f0b, b6170d1b79332d8ba0f72227cb5edd2aced837c0, 85171b0cc057d5ba682dee582cd72c48543389db)
* Reverted MEMFD allocator in Xbyak to avoid failures in high-load scenarios (eaaa41b8a30101640094e46af7f27969ed105ee2)
* Fixed array out of bounds issue in `bfloat16` convolution weight gradient on Intel CPUs (a17a64c330d1153fdea3d81f1420fb38c50248bd)
* Improved compatibility with future versions of Intel GPU driver (eb7a0a07df12874a40c0f135d8bf16116594e0e8)
* Fixed segfault in `fp16` and `bfloat16` convolution backward propagation on systems with Intel AMX support (293561b6a2644ef05d8d664cd81c1bcde876b481)
* Fixed build issue with GCC 13 (1d7971ce488da657e23f08488cdb6ef8e484c5e8)
* Fixed correctness issue in Vanilla GRU flavor of `int8` RNN primitive on Intel CPUs (f4a149c16faff0fb51fb292d12a7b51f6fac53bf, fbf8dca1ba9b565ddedd1cb291d3b466d0a5a45b)
* Added check for unsupported arguments in binary primitive implementation for AArch64-based processors (5bb907077cd7b4c3983f7215d5509b17f3da67e2)
* Fixed correctness issue in `int8` convolution with zero-points on Intel Data Center GPU Max Series (96e868c473bb0e2a9b1a42b51e8f91997b52b471)
* Fixed runtime error in convolution primitive with small number of channels on Xe-based graphics (068893e1c792c8e9ad5b17bc6e494359b32f910f)
* Removed use of OpenCL C variable length arrays in reduction primitive implementation for Intel GPUs (41e8612f212d939643932ef309cd78bd4194f42d)
* Fixed correctness issue in matmul and inner product primitives on Intel Data Center GPU Max Series (a1e6bc57b233d85a6f382db611879614236d9b05, dbb7c284e0834cd0fe84c8311484880802fa9af0)
* Fixed segfault in `fp16` and `bfloat16` convolution backward propagation on future Intel Xeon processors (code name Sierra Forest) (399b7c5af4c5238f9956d71270adbd44f3cb25a3)
* Fixed runtime error in Graph API for partitions with quantized matmul and add operations (f881da5be31abc71f90a1a750c50ec2ea5dbc516, 699ba755fde86aea3714bbce75d5b0b274302545, b8d21a58d8247097ed26816b730e3cd4c19f61c, 9421fb2a453aee957a0c1dc10be5675e5f916c2e)
* Fixed convolution performance regression on Xe-based graphics (1869bf26a92f8d8f36853e537f9727412a4d1f94)
* Improved convolution performance with `OHWI` and `OIHW` weight formats on Intel Data Center GPU Max Series (2d0b31ee82dc681b829f67100c05ae4e689633e6, 5bd5d52e7ee832fb0d5ece6d42a6b230023c9dd0)
* Fixed include files handling in build system affecting CMake projects relying on oneDNN (c61645392fde55ac361c95a752df0cfa7ef24345)
* Added `tbb::finalize` to tests and examples to address intermittent test crashes with TBB runtime (891a41560382cc0f991c428392078d13ccb76129, c79e54322f251aa70783ca1b837ce0d558bf3396, 8312c3addc597e6565cf1233801234c2ffafd092, 1a32b95a2c61d094206ed49d69843fdcdeb2ffcd, bd0389d81509baf6696d3927d0da4cce4c06d2d4, f05013d0e419df22ec2755dc5d74f5974871cf9e, ab7938f1b889aa43f155216f774297e8c765cd97, 31c9e7b3c1a7e262cecafe98bed128843f1c2969, f3261e4556935424946697be4b336020653b41a5, d58ac41a12179f8cca48962c4b5a44940bea97d7, f8c67b9026dc2945ed66a8f1c276611c063dae4d, 258849b71c24a89b08ac12972ec1fcaa72a9da39, b20a8c786c5a2cb676a2a8b599edf5cfd7ee0c3a)
* Fixed segfault in `fp16` convolution primitive on future Intel Xeon processors (code name Granite Rapids) (a574ffff870318cc104d8af4a2368d47b433b27f)
* Fixed correctness issue in `fp16` convolution primitive on future Intel Xeon processors (code name Sierra Forest) (f165ed8a8872e72a7d9651c3dd38bd6c2909fdce)
* Fixed correctness issue in `int8` convolution primitive on Intel CPUs (ca1592237b87cae5e4a55fb464ad90fb9f91957d, 27845b8e66d354549ac6c6fceeb92c267a9e910f)
* Fixed correctness issue in `int8` convolution primitive on Intel Data Center GPU Max Series (8bb651cb99e2875aea44b907bdc54418b2d4932a)
* Fixed correctness issue in resampling primitive with post-ops on Intel CPUs (aa52a5128d44c6d745b89beabcd47f428665843e)
* Addressed excessive memory consumption in 3D convolution on Intel CPUs (3d6412af5cb99863ede8753238533dcabcd3c5d9, 097acb5e108eb57b38a8a2409b083a1819b9f962, fd696639c70c4cd92e2aaf871bc4165c269d29f7)
* Fixed segfault in convolution with `sum` and `relu` post-ops on Intel CPUs (63ad769939dd8307935caac67c0fc7c9bc9206de, 1b1303748b80360e5f93740d6ea03063132fd8f8, 0a8116b3de98243a234680d8cda869d2f20dd178, 9972cb80a29da9f14efbe8518bc10a21f7ae6e36)
* Addressed convolution performance regression with small number of channels on Intel GPUs (d3af87710fcae9561ae22017d45bd670f8858272)
* Worked around MSVS 2019 bug resulting in build failures on Windows (40247753290e3e886b9235c5f80a2997eb85372a)
* Updated code base formatting to clang-format 11 (23576f935fcef245b26cc78ef74935ea6bb7e6b7, 0b1bf845e05da75e4d994e01a0d7996b64787ece)
graph-v0.8.1
This is a patch release containing the following changes to [graph-v0.8](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.8):
* Upgraded oneDNN dependency from v2.7.2 to v2.7.3 (93237aa, 260bdb5)
* Fixed a correctness issue in quantized Convolution + Add fusion (26a9a5b, beba352)
* Fixed `query_dynamic_outputs()` interface implementation in graph compiler backend (8dbca04)