This is a release candidate for oneDNN v3.5. Please provide feedback and submit defect reports via [Github issues](https://github.com/oneapi-src/oneDNN/issues/new/choose).
## Performance Optimizations
* Intel Architecture Processors:
  * Improved performance for 4th generation Intel Xeon Scalable processors (formerly Sapphire Rapids).
  * Improved performance for the future Intel Xeon Scalable processors (code-named Sierra Forest and Granite Rapids).
  * Improved performance of the group normalization primitive.
  * Improved performance of the matmul primitive with sum post-op for batched cases on processors with Intel AMX instruction set support.
  * Improved performance of the following subgraphs with Graph API:
    * Multi-Query Attention (MQA).
    * Scaled Dot Product Attention (SDPA), including the variant with the `select` operation.
    * `LayerNorm` + `Multiply` + `Quantize` produced by the SmoothQuant algorithm.
    * `Convolution` + `Sigmoid` + `Multiply` with mixed precisions.
* Intel Graphics Products:
  * Improved performance for Processor Graphics based on the Xe2 architecture.
  * Improved performance for the Intel Data Center GPU Max Series (formerly Ponte Vecchio).
  * Improved performance for Intel Arc graphics (formerly Alchemist and DG2) and the Intel Data Center GPU Flex Series (formerly Arctic Sound).
  * Improved RNN primitive performance for the LSTM cell case.
  * Improved performance of f8_e4m3 data type emulation on the Intel Data Center GPU Max Series (formerly Ponte Vecchio).
* AArch64-based Processors:
  * Improved convolution forward propagation, matmul, and softmax performance for processors with SVE support.
  * Improved bf16 matmul performance with Arm Compute Library (ACL).
  * Improved eltwise primitive performance for the `gelu_erf` algorithm with ACL.
## Functionality
* Introduced sum and binary post-ops support for the layer normalization primitive. This functionality is currently implemented on CPUs only; see the usage sketch after this list.
* Introduced support for int4 data type and extended quantization model with support for grouped scales and zero points.
* Introduced fp64 matmul support. This functionality is currently implemented on Intel GPUs only.
* Extended the [floating point math mode API](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_fpmath_mode.html) to support weight decompression scenarios. See the [matmul weights decompression example](https://github.com/oneapi-src/oneDNN/blob/main/examples/tutorials/matmul/weights_decompression_matmul.cpp) to get started, and the sketch after this list. The new floating-point mode is supported in the following configurations:
  * bfloat16 matmul with int8 weights on Intel CPUs.
  * float16 and bfloat16 matmul with int8 or int4 weights on Intel GPUs.
* **[experimental]** Introduced [microkernel API](https://oneapi-src.github.io/oneDNN/ukernels.html) for Intel Architecture Processors. This API exposes internal mechanisms used in matmul and convolution implementation to expert users.
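
Below is a minimal sketch of the new layer normalization post-op support. The shapes, epsilon, flags, and the fused residual add are illustrative assumptions, not part of the release notes; primitive descriptor creation throws if a particular configuration is not implemented.

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

int main() {
    engine eng(engine::kind::cpu, 0); // layer norm post-ops are CPU-only in this release
    stream strm(eng);

    // Illustrative 3D activations: batch x tokens x channels.
    memory::desc src_md({8, 128, 768}, memory::data_type::f32, memory::format_tag::abc);
    memory::desc dst_md = src_md;
    memory::desc res_md = src_md; // second input of the binary post-op

    // Fuse dst = layer_norm(src) + residual via a binary add post-op.
    post_ops po;
    po.append_binary(algorithm::binary_add, res_md);
    primitive_attr attr;
    attr.set_post_ops(po);

    auto pd = layer_normalization_forward::primitive_desc(eng,
            prop_kind::forward_inference, src_md, dst_md, 1.e-5f,
            normalization_flags::use_scale | normalization_flags::use_shift, attr);
    auto lnorm = layer_normalization_forward(pd);

    // Scale and shift are 1D over the channel dimension.
    memory::desc sc_md({768}, memory::data_type::f32, memory::format_tag::a);
    memory src_m(src_md, eng), dst_m(dst_md, eng), res_m(res_md, eng);
    memory scale_m(sc_md, eng), shift_m(sc_md, eng);

    lnorm.execute(strm,
            {{DNNL_ARG_SRC, src_m}, {DNNL_ARG_DST, dst_m},
                    {DNNL_ARG_SCALE, scale_m}, {DNNL_ARG_SHIFT, shift_m},
                    {DNNL_ARG_ATTR_MULTIPLE_POST_OP(0) | DNNL_ARG_SRC_1, res_m}});
    strm.wait();
    return 0;
}
```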
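
The int4 data type, grouped scales and zero points, and the extended fpmath mode combine in the weight decompression flow. The sketch below is illustrative rather than a copy of the linked example: the sizes, the group size of 128, and the bf16 + int8 CPU configuration are assumptions, and int4 (`s4`/`u4`) weights follow the same pattern on Intel GPUs per the list above. Whether a given combination is implemented is reported at primitive descriptor creation time.

```cpp
#include "oneapi/dnnl/dnnl.hpp"
using namespace dnnl;

int main() {
    const memory::dim M = 32, K = 4096, N = 4096, G = 128; // illustrative sizes

    engine eng(engine::kind::cpu, 0);
    stream strm(eng);

    memory::desc src_md({M, K}, memory::data_type::bf16, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::s8, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::bf16, memory::format_tag::ab);

    primitive_attr attr;
    // Grouped weight scales and zero points: one value per {G, 1} block of
    // weights, i.e. (K / G) x N tensors. The mask covers both weight dims.
    attr.set_scales(DNNL_ARG_WEIGHTS, (1 << 0) + (1 << 1), {G, 1},
            memory::data_type::f32);
    attr.set_zero_points(DNNL_ARG_WEIGHTS, (1 << 0) + (1 << 1), {G, 1},
            memory::data_type::s8);
    // Relaxed math mode with apply_to_int = true lets the integer weights be
    // decompressed to bf16 inside the primitive.
    attr.set_fpmath_mode(fpmath_mode::bf16, /* apply_to_int = */ true);

    auto pd = matmul::primitive_desc(eng, src_md, wei_md, dst_md, attr);
    auto mm = matmul(pd);

    // At execution time the scales and zero points are passed as
    // DNNL_ARG_ATTR_SCALES | DNNL_ARG_WEIGHTS and
    // DNNL_ARG_ATTR_ZERO_POINTS | DNNL_ARG_WEIGHTS arguments.
    (void)mm;
    (void)strm;
    return 0;
}
```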
## Usability
* Extended error messages for engine and memory object creation errors.
* Extended verbose mode diagnostics with information on dispatching decisions for all primitives.
* Introduced support for clang++ host compiler in SYCL builds.
* Introduced API for tensor serialization and deserialization.
* Extended verbose mode diagnostics for Graph API with information on pattern matcher decisions.
* Introduced OpenCL runtime support for Graph API.
* Added support for building oneDNN with installed Arm Compute Library (ACL).
## Validation
* Extended benchdnn with support for tensor tags in RNN primitive validation.
## Thanks to these Contributors
This release contains contributions from the project core team as well as @AngryLoki, Crefeda Rodrigues @cfRod, Daniel Richard G. @iskunk, @deepeshfujitsu, Dylan Angus @dylan-angus-codeplay, Emanuele Rocca @ema, Hernan Martinez @hmartinez82, John Osorio @kala855, Jonathan Deakin @jondea, @kasturedeeksha, Kentaro Kawakami @kawakami-k, Nikita Shulga @malfet, Radu Salavat @Radu2k, Renato Barros Arantes @renato-arantes, Roman Zhukov @rozhukov, @Shreyas-fuj, Sunita Nadampalli @snadampal, Tadej Ciglarič @t4c1, Vineel Abhinav @vineelabhinav, @vishwascm. We would also like to thank everyone who asked questions and reported issues.