oneDNN

Latest version: v2025.1.0


2.4.4

This is a patch release containing the following changes to v2.4.3:
* Fixed incorrect results for reorder with zero-points on CPUs (ee63629a52dfb355d9f93ed5ac0c72440ad24b65)
* Fixed an issue with reorder with zero-points not respecting rounding mode on processors without Intel DL Boost support (a165c4a5284820b341d7e5e114f0de689b87f38f)
* Fixed correctness issue in bfloat16 inner product weight gradient on processors with Intel DL Boost support (b782f190bd40f5b3bbae2d66c5922cc8793f236c)
* Improved bfloat16 inner product weights gradient performance on processors with Intel AMX support (ebf9f817f12c2db4447eb4000aadab411879fe36)
* Fixed potential undefined access in convolution, inner product, matmul, and RNNs primitives on processors with Intel AMX support (dcd98ad372108f6fa33510b6e9acd7bae491df83)
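
The two reorder fixes above concern quantization with zero points and the rounding mode. As a rough illustration of why rounding matters, here is a minimal sketch of generic affine quantization to int8. The formula `q = saturate(round(x * scale) + zero_point)` and both function names are assumptions chosen for illustration; this is not the library's kernel.

```python
def quantize_s8(values, scale, zero_point):
    """Affine-quantize floats to int8 with round-half-to-even."""
    out = []
    for x in values:
        # Python's round() sends ties to the nearest even integer,
        # the usual default floating-point rounding mode.
        q = round(x * scale) + zero_point
        out.append(max(-128, min(127, q)))  # saturate to the int8 range
    return out


def quantize_s8_truncating(values, scale, zero_point):
    """Same formula, but truncating toward zero instead of rounding."""
    out = []
    for x in values:
        q = int(x * scale) + zero_point  # int() drops the fraction
        out.append(max(-128, min(127, q)))
    return out


samples = [0.5, 1.5, 2.5, -1.5]
rounded = quantize_s8(samples, scale=1.0, zero_point=0)
truncated = quantize_s8_truncating(samples, scale=1.0, zero_point=0)
```

Comparing `rounded` with `truncated` shows how a code path that ignores the rounding mode can differ by one quantization step on some inputs.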

graph-v0.3
This is a technical preview for oneDNN Graph API based on [oneDNN v2.4](https://github.com/oneapi-src/oneDNN/releases/tag/v2.4).

Functionality
* Introduced int8 inference support.
* Updated API to comply with [oneDNN Graph API specification v0.8](https://spec.oneapi.io/onednn-graph/latest/index.html).

Known Issues and Limitations
* Some subgraphs might not be recognized as a partition even if they match the general pattern description, due to internal implementation.
* The weight’s opaque layout can be queried only from a compiled partition, which requires tensor shapes to be known at compilation time.

Thanks to the Contributors
This release contains contributions from the project core teams as well as Tian Feng, Zhang Guoming, Jiong Gong, Chunyuan Wu, Nishant Patel, Yiqiang Li, Yang Sheng, Yunfei Mao, Kiefer Kuah and others.

2.4.3

This is a patch release containing the following changes to v2.4.2:
* Fixed an issue with reorder primitive producing `NaN` results for some cases on future Intel Xeon Scalable processor (code name Sapphire Rapids) (ac20af36ea3aced6a1dafe83447ffaee424fefa3)
* Fixed performance regression for inner product primitive for future Intel Xeon Scalable processor (code name Sapphire Rapids) (ac6a24d4d2db987665bb0832ea31b3f11ca42844, 2cf3526fdbb246b51db597ac730268a6391f87e2, d02dddf7b35ca4ea3c63a9d4aaa9a7c4be2d1cd8, bcdc17531179cc414c3d6e858f6708938ebcb7bb)
* Fixed segmentation fault in int8 deconvolution primitive with asymmetric quantization for processors with Intel AVX-512 support (6ba086ae6be066222d46693803aefeacecf3ed0b)

2.4.2

This is a patch release containing the following changes to v2.4.1:
* Fixed performance regression for convolution primitive for the shapes with 3D spatial for future Intel Xeon Scalable processor (code name Sapphire Rapids) (aca0af1d192ef40ec93b45d24edd509e2905156d)
* Fixed segmentation fault in bfloat16 forward and backward inner product primitive for future Intel Xeon Scalable processor (code name Sapphire Rapids) (ae8cf18e35a8ed069a0898d77f5548781ecf5e4b, 3de9549d5817ea50638ad208293bb2e2fbb15048)
* Fixed reorder primitive with compensation (6ba086ae6be066222d46693803aefeacecf3ed0b)
* Fixed issue in scratch pad size calculation for BRGEMM-based convolutions (dd9eceb88ed683806fb853a33bfab71982414fa3)

2.4.1

This is a patch release containing the following changes to v2.4:
* Reduced scratch pad size requirements for BRGEMM-based convolutions (a81ce3cce3a82ad074d1cdc50b73d4104910c1e0)
* Worked around an issue with the number of threads detection on AMD processors (ad901e5489564d0035be0b4ec41f1cff4be96610)

2.4

Performance Optimizations
* Improved primitive cache performance for Intel Graphics products.
* Intel Architecture Processors
  * Improved performance for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via [CPU dispatcher control](https://oneapi-src.github.io/oneDNN/dev_guide_cpu_dispatcher_control.html).
  * Improved binary primitive performance for cases when one of the tensors is broadcasted.
  * Improved performance of reduction, reorder, and shuffle primitives.
  * Improved performance of depthwise convolution forward propagation for processors with Intel AVX-512 support.
  * Improved performance of forward inner product primitive for the shapes with minibatch equal to 1 for processors with Intel AVX-512 support.
  * Improved performance of int8 matmul and inner product primitives for processors with Intel AVX2 and Intel DL Boost support.
* Intel Graphics Products
  * Introduced initial optimizations for future Intel Arc graphics (code name Alchemist and DG2).
  * Improved performance of convolution and deconvolution primitives with new JIT convolution kernel generator implementation. These optimizations are identified by `jit:ir` marker in oneDNN verbose log.
* AArch64-based Processors
  * Added support for bfloat16 acceleration with Arm Compute Library (ACL). The behavior is controlled by [floating point math mode API](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_fpmath_mode.html#doxid-dev-guide-attributes-fpmath-mode).
  * Improved inner product, matmul, and eltwise primitives performance with ACL.
  * Introduced support for sum and for indirect and Winograd convolution implementations with ACL.
* NVIDIA Graphics
  * Improved convolution performance with eltwise post-op.
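
Two items above depend on run-time controls: the Sapphire Rapids optimizations must be enabled through the CPU dispatcher, and the new GPU convolution kernels show up under the `jit:ir` marker in the verbose log. The sketch below prepares an environment for a child process and filters captured verbose output. It assumes the `DNNL_MAX_CPU_ISA` and `DNNL_VERBOSE` environment variables, the `AVX512_CORE_AMX` value, and verbose lines that start with `dnnl_verbose` and are comma-separated; check these against the linked dispatcher guide for your version. The helper names and `./my_app` are hypothetical.

```python
import os


def onednn_env(max_cpu_isa=None, verbose=False, base=None):
    """Return a copy of the environment with oneDNN run-time controls set."""
    env = dict(os.environ if base is None else base)
    if max_cpu_isa is not None:
        # Assumed control for the CPU dispatcher (see the linked guide).
        env["DNNL_MAX_CPU_ISA"] = max_cpu_isa
    if verbose:
        # Assumed switch that turns on the verbose log.
        env["DNNL_VERBOSE"] = "1"
    return env


def lines_with_impl_marker(log_text, marker="jit:ir"):
    """Keep verbose lines that carry `marker` as one comma-separated field."""
    hits = []
    for line in log_text.splitlines():
        if line.startswith("dnnl_verbose") and marker in line.split(","):
            hits.append(line)
    return hits


# Possible usage with a hypothetical application binary:
#   import subprocess
#   env = onednn_env(max_cpu_isa="AVX512_CORE_AMX", verbose=True)
#   run = subprocess.run(["./my_app"], env=env, capture_output=True, text=True)
#   ir_lines = lines_with_impl_marker(run.stdout)
```

Matching on a whole comma-separated field rather than a substring avoids counting lines where the marker text only appears inside another field.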

Functionality
* Introduced [PReLU post-op](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_post_ops.html) support in convolution and matmul.
* Extended maximum allowed post-ops chain for compute primitives (convolution, deconvolution, inner product, and matmul) to 32.
* Introduced support for zero points in sum post-op for convolution and matmul. The functionality is implemented only for CPUs.
* Extended binary primitive with support for mixed data types for input tensors. The functionality is implemented only for CPUs.
* Extended sum post-op for convolution and matmul primitives with support for mixed data types. The functionality is implemented only for CPUs.
* Added Unified Shared Memory (USM) support for OpenCL GPU runtime.
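
To make the post-op items above concrete, here is a minimal reference sketch of a post-op chain applied element by element to a primitive's result. It does not use the oneDNN API. The PReLU form `x if x > 0 else x * weight` and the sum form `x + scale * (dst_old - zero_point)` are assumptions about the intended semantics, and all names are hypothetical; only the limit of 32 comes from the notes.

```python
MAX_POST_OPS = 32  # chain length limit stated in the release notes


def prelu(weight):
    """PReLU post-op: negative values are scaled by a learned weight."""
    return lambda x, dst_old: x if x > 0 else x * weight


def sum_post_op(scale=1.0, zero_point=0):
    """Sum post-op: add the previous destination value, shifted by an
    assumed zero point and then scaled."""
    return lambda x, dst_old: x + scale * (dst_old - zero_point)


def apply_post_ops(acc, dst_old, chain):
    """Apply each post-op in order to every element of `acc`."""
    if len(chain) > MAX_POST_OPS:
        raise ValueError(f"post-op chain longer than {MAX_POST_OPS}")
    out = []
    for x, d in zip(acc, dst_old):
        for op in chain:
            x = op(x, d)
        out.append(x)
    return out


result = apply_post_ops(
    acc=[-2.0, 3.0],        # raw convolution or matmul results
    dst_old=[10.0, 10.0],   # values already in the destination buffer
    chain=[sum_post_op(scale=0.5, zero_point=2), prelu(0.25)],
)
```

Each post-op sees the running value and the old destination, so the order of the chain matters: swapping `sum_post_op` and `prelu` here would generally give a different result.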

Usability
* Added compile-time options to manage the set of supported primitives and workload types. See `DNNL_ENABLE_WORKLOAD` and `DNNL_ENABLE_PRIMITIVE` in [build options](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html) for more details. This feature makes it possible to reduce the binary footprint of the library for specialized applications.
* Reduced overall library size by trimming down use of templates, OpenCL headers, and TBB headers. The configurations that benefitted the most are the CPU-only configuration with TBB threading and the GPU-only configuration. Note that the binary footprint depends on the compiler used to build the library and on the build options.
* Introduced [floating point math mode API](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_fpmath_mode.html#doxid-dev-guide-attributes-fpmath-mode). The API allows the library to use bfloat16 or float16 hardware acceleration in fp32 operations. Currently this mode is supported only on AArch64 processors when oneDNN is built with ACL.
* Added a build option `DNNL_LIBRARY_NAME` to change the library name and CMake target. This feature helps projects that use multiple oneDNN configurations.
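
The floating point math mode above trades precision for speed by letting fp32 operations run on bfloat16 hardware. The following sketch does not call oneDNN; it only emulates, under the assumption of round-to-nearest-even conversion, what happens to fp32 inputs when they are narrowed to bfloat16. Function names are hypothetical, and NaN and infinity handling is left out.

```python
import struct


def to_bf16(x):
    """Round a value to the nearest bfloat16 (ties to even), returned as float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Adding 0x7FFF plus the lowest kept bit rounds to nearest, ties to even.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000  # keep sign, exponent, and the top 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot_with_bf16_inputs(a, b):
    """Dot product whose inputs are narrowed to bfloat16 before multiplying."""
    return sum(to_bf16(x) * to_bf16(y) for x, y in zip(a, b))


a = [1.00390625, 3.14159]
b = [1.0, 1.0]
exact = sum(x * y for x, y in zip(a, b))
approx = dot_with_bf16_inputs(a, b)
```

The gap between `exact` and `approx` gives a feel for the accuracy cost an application accepts when it opts into a relaxed math mode.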

Breaking Changes
* Updated minimal supported ACL version to 21.08 (was 21.05).

Deprecated functionality
* Intel MKL-DNN compatibility API is deprecated and will be removed in the next update. See the [Transition from Intel MKL-DNN to oneDNN](https://oneapi-src.github.io/oneDNN/dev_guide_transition_to_dnnl.html) page for instructions on moving to the new API.
* Support for Intel Xeon Phi processors is deprecated and will be removed in the next release.

Thanks to the Contributors
This release contains contributions from the project core team as well as
Aleksandr Nikolaev alenik01, Arthur Mitrano aaraujom, Crefeda Rodrigues cfRod, Diana Bite diaena, Jing Xu jingxu10, Kentaro Kawakami kawakami-k, Kevin Putnam intelkevinputnam, Mesut Meterelliyoz mmeterel, MITSUNARI Shigeo herumi, Nathan John Sircombe nSircombe, Nicolas Chauvet kwizart, Peter Caday petercad. We would also like to thank everyone who asked questions and reported issues.

2.4rc

This is a release candidate for oneDNN v2.4. Please provide feedback and submit defect reports via [GitHub issues](https://github.com/oneapi-src/oneDNN/issues/new/choose).

Performance Optimizations
* Improved primitive cache performance for Intel Graphics products.
* Intel Architecture Processors
  * Improved performance for future Intel Xeon Scalable processor (code name Sapphire Rapids). The functionality is disabled by default and should be enabled via [CPU dispatcher control](https://oneapi-src.github.io/oneDNN/dev_guide_cpu_dispatcher_control.html).
  * Improved binary primitive performance for cases when one of the tensors is broadcasted.
  * Improved reorder primitive performance for memory formats with padding and/or zero points.
* Intel Graphics Products
  * Introduced initial optimizations for future Intel Arc graphics (code name Alchemist and DG2).
* AArch64-based Processors
  * Improved inner product and eltwise primitives performance with ACL.
  * Introduced support for sum and for indirect and Winograd convolution implementations with ACL.
* NVIDIA Graphics
  * Improved convolution performance with eltwise post-op.

Functionality
* Introduced [PReLU post-op](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_post_ops.html) support in convolution and matmul.
* Extended maximum allowed post-ops chain for compute primitives (convolution, deconvolution, inner product, and matmul) to 32.
* Introduced support for zero points in sum post-op for convolution and matmul. The functionality is implemented only for CPUs.
* Extended binary primitive with support for mixed data types for input tensors. The functionality is implemented only for CPUs.
* Extended sum post-op for convolution and matmul primitives with support for mixed data types. The functionality is implemented only for CPUs.
* Added USM support for OpenCL GPU runtime.

Usability
* Added compile-time options to manage the set of supported primitives and workload types. See `DNNL_ENABLE_WORKLOAD` and `DNNL_ENABLE_PRIMITIVE` in [build options](https://oneapi-src.github.io/oneDNN/dev_guide_build_options.html) for more details. This feature makes it possible to reduce the binary footprint of the library for specialized applications.
* Reduced overall library size by trimming down use of templates, OpenCL headers, and TBB headers. The configurations that benefitted the most are the CPU-only configuration with TBB threading and the GPU-only configuration. Note that the binary footprint depends on the compiler used to build the library and on the build options.
* Introduced [floating point math mode API](https://oneapi-src.github.io/oneDNN/dev_guide_attributes_fpmath_mode.html#doxid-dev-guide-attributes-fpmath-mode). The API allows the library to use bfloat16 or float16 hardware acceleration in fp32 operations. Currently this mode is not supported in the implementation.
* Added a build option `DNNL_LIBRARY_NAME` to change the library name and CMake target. This feature helps projects that use multiple oneDNN configurations.
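
The build options named above are passed to CMake at configure time. Below is a minimal sketch that assembles such a command line. The option names come from the notes; the example values (`INFERENCE`, a semicolon-separated primitive list such as `CONVOLUTION;MATMUL;REORDER`) are assumptions to be checked against the linked build options page, and the helper name and directory layout are hypothetical.

```python
def cmake_configure_args(source_dir, build_dir, workload=None,
                         primitives=None, library_name=None):
    """Build a CMake configure command for a trimmed-down oneDNN build."""
    args = ["cmake", "-S", source_dir, "-B", build_dir]
    if workload:
        # Restrict the library to one workload type (assumed value: INFERENCE).
        args.append(f"-DDNNL_ENABLE_WORKLOAD={workload}")
    if primitives:
        # CMake lists are semicolon-separated.
        args.append("-DDNNL_ENABLE_PRIMITIVE=" + ";".join(primitives))
    if library_name:
        # Rename the library and CMake target so several builds can coexist.
        args.append(f"-DDNNL_LIBRARY_NAME={library_name}")
    return args


args = cmake_configure_args(
    "oneDNN", "build-infer",
    workload="INFERENCE",
    primitives=["CONVOLUTION", "MATMUL", "REORDER"],
    library_name="dnnl_infer",
)
# Possible usage: subprocess.run(args, check=True)
```

Passing the command as an argument list rather than a shell string keeps the semicolons in the primitive list from being interpreted by the shell.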

Breaking Changes
* Updated minimal supported ACL version to 21.08 (was 21.05).

Deprecated functionality
* Intel MKL-DNN compatibility API is deprecated and will be removed in the next update. See the [Transition from Intel MKL-DNN to oneDNN](https://oneapi-src.github.io/oneDNN/dev_guide_transition_to_dnnl.html) page for instructions on moving to the new API.

Thanks to the Contributors
This release contains contributions from the project core team as well as
Aleksandr Nikolaev alenik01, Arthur Mitrano aaraujom, Diana Bite diaena, Jing Xu jingxu10, Kentaro Kawakami kawakami-k, Kevin Putnam intelkevinputnam, MITSUNARI Shigeo herumi, Nathan John Sircombe nSircombe, Nicolas Chauvet kwizart, Peter Caday petercad. We would also like to thank everyone who asked questions and reported issues.

graph-v0.2
This is a technical preview for oneDNN Graph API based on [oneDNN v2.3.2](https://github.com/oneapi-src/oneDNN/releases/tag/v2.3.2).

oneDNN Graph API extends oneDNN with a unified, high-level graph API for multiple AI hardware classes (CPU, GPU, accelerators). The graph interface integrates with the deep learning frameworks and inference engines to maximize opportunities for performance optimizations across a variety of hardware targets. This preview has full support for the oneAPI Graph programming model and partial support of the operations in [oneDNN Graph API specification v0.7](https://spec.oneapi.io/onednn-graph/latest/introduction.html).

Learn more about oneDNN Graph API:
* [Introduction to oneDNN Graph API](https://github.com/oneapi-src/oneDNN/blob/dev-graph-preview2/doc/README.md)
* [Getting started with C++ API](https://github.com/oneapi-src/oneDNN/blob/dev-graph-preview2/doc/cpu_get_started.md)
* [Getting started with DPC++ API](https://github.com/oneapi-src/oneDNN/blob/dev-graph-preview2/doc/sycl_get_started.md)

Supported Functionality
* C++ and DPC++ API.
* Graph partition and compilation API.
* Operations and fusions targeting fp32 inference for CNNs, MLPs, and transformer neural networks.

Performance Optimizations
Backend implementation relies on oneDNN and includes performance optimizations for Intel Architecture processors with Intel SSE4.1, Intel AVX, Intel AVX2, or Intel AVX-512 instruction sets.

Validation
* [Gtest suite](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2/tests) is available for basic functional testing.
* Comprehensive functional and performance validation is covered by the extended version of [benchdnn](https://github.com/oneapi-src/oneDNN/tree/dev-graph-preview2/tests/benchdnn).

Known Issues and Limitations
* Some subgraphs might not be recognized as a partition even if they match the general pattern description, due to internal implementation.
* The weight’s opaque layout can be queried only from a compiled partition, which requires tensor shapes to be known at compilation time.
* Binary operation with scalar and tensor inputs is not optimized.

Thanks to the Contributors
This release contains contributions from the project core teams as well as Jiong Gong, Pinzhen Xu, Chunyuan Wu, Jianping Chen, Scott Cyphers, Nishant Patel, Yiqiang Li, Yang Sheng, Kiefer Kuah, Adam Straw, Tim Zerrell, Namrata Choudhury and others.
