Performance Optimizations
* Extended primitive cache to improve primitive descriptor creation performance.
* Improved primitive cache performance in multithreaded configurations.
* Intel Architecture Processors
    * Introduced initial optimizations for bfloat16 compute functionality for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via [CPU dispatcher control](https://oneapi-src.github.io/oneDNN/dev_guide_cpu_dispatcher_control.html).
    * Improved performance of binary primitive and binary post-op for cases with broadcast and mixed source and destination formats.
    * Improved performance of reduction primitive.
    * Improved performance of depthwise convolution primitive with NHWC activations for training cases.
* Intel Graphics Products
    * Improved fp32 and fp16 Winograd convolution performance.
    * Introduced support for automatic selection between direct and Winograd convolution algorithms.
    * Improved int8 depthwise convolution performance.
    * Improved performance of reorder, shuffle, concat, binary, and batch normalization primitives.
    * Improved layer normalization performance for blocked formats.
* AArch64-based Processors
    * Improved reorder primitive performance for systems with SVE 128 and SVE 256 support.
    * Improved eltwise primitive performance for systems with SVE 512 support.
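
The bfloat16 optimizations for Sapphire Rapids listed above are disabled by default and are enabled through CPU dispatcher control. The sketch below shows one possible way to do that from a launcher script: it builds an environment for a child process with the `DNNL_MAX_CPU_ISA` variable set. The helper name `onednn_env` is hypothetical, and the value `AVX512_CORE_AMX` is an assumption about which ISA level targets this processor; check the linked CPU dispatcher control page for the values your build accepts.

```python
import os


def onednn_env(max_cpu_isa="AVX512_CORE_AMX", base_env=None):
    """Return a copy of the environment with the oneDNN ISA cap set.

    `onednn_env` is a hypothetical helper, not part of oneDNN.
    The default value is an assumption; see the CPU dispatcher
    control documentation for the values supported by your build.
    """
    # Copy so the caller's (or the process's) environment is not mutated.
    env = dict(os.environ if base_env is None else base_env)
    env["DNNL_MAX_CPU_ISA"] = max_cpu_isa
    return env


# Example use (the application path is a placeholder):
#   import subprocess
#   subprocess.run(["./my_onednn_app"], env=onednn_env(), check=True)
```

The variable has to be in place before the application starts, which is why the sketch passes it to a child process instead of changing the settings of an already running one.
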
Functionality
* Extended [batch normalization](https://oneapi-src.github.io/oneDNN/dev_guide_batch_normalization.html) and [layer normalization](https://oneapi-src.github.io/oneDNN/dev_guide_layer_normalization.html) primitives API to take separate scale and shift arguments.
* Extended [resampling](https://oneapi-src.github.io/oneDNN/dev_guide_resampling.html) primitive with post-ops support and mixed source and destination data types.
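
To illustrate what separate scale and shift arguments mean for the normalization primitives above, here is a minimal pure-Python sketch of the arithmetic for a single channel. It does not call oneDNN; the function name `batch_norm_1d` and its parameters are hypothetical, and the real primitives take per-channel scale and shift tensors instead of scalars.

```python
import math


def batch_norm_1d(values, scale=None, shift=None, eps=1e-5):
    """Normalize one channel, then optionally apply scale and shift.

    Hypothetical illustration only: scale and shift are independent,
    so either one can be supplied without the other.
    """
    n = len(values)
    mean = sum(values) / n
    # Biased variance over the channel, as commonly used in batch normalization.
    var = sum((v - mean) ** 2 for v in values) / n
    inv_std = 1.0 / math.sqrt(var + eps)

    out = []
    for v in values:
        y = (v - mean) * inv_std
        if scale is not None:
            y *= scale  # applied only when a scale argument is given
        if shift is not None:
            y += shift  # applied only when a shift argument is given
        out.append(y)
    return out
```

With separate arguments, a caller that needs only a shift (or only a scale) does not have to build a combined tensor with a dummy value for the other half.
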
Usability
* Introduced binary distribution in [conda-forge](https://github.com/conda-forge/onednn-feedstock). Supported configurations cover Linux, Windows, and macOS operating systems and Intel64/AMD64, AArch64, and PPC64 architectures.
* Introduced support for GPU-only build. This configuration helps to reduce binary footprint for applications targeting GPU.
* Introduced an option to use GNU OpenMP as CPU runtime for DPC++ configuration.
* Introduced [verbose log converter](https://github.com/oneapi-src/oneDNN/tree/master/scripts/verbose_converter). This tool processes [oneDNN verbose logs](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html) and generates test cases for benchdnn.
Breaking Changes
* Updated minimal supported CMake version to 2.8.12 (was 2.8.11).
* Updated minimal supported ACL version to 21.05 (was 21.02).
Thanks to the Contributors
This release contains contributions from the project core team as well as Alexandre Truong @aletru01, Arthur Mitrano @aaraujom, fitchbe @fitchbe, Isuru Fernando @isuruf, Joe Ramsay @joeramsay, Kentaro Kawakami @kawakami-k, leizheng1 @leizheng1, Nomoto Kazuhiro @NomotoKazuhiro, Peter Caday @petercad, Pablo Romero @pablocum, Takumi-H @Takumi-Honda, Uwe L. Korn @xhochy, Vasily Rubtsov @vasilyru. We would also like to thank everyone who asked questions and reported issues.