This is a patch release containing the following changes to v1.6.3:
* Fixed performance regression in `dnnl_sgemm` with `N=1` (379a216b94393f17a37d5f042323fc923a7553af, f35e9917608925b57bb4e1486f77720f36970aef)
* Extended matmul to support multiple dimensions and broadcast (0728f265f18448a3375574e622bdd6fcad0d2787)
* Fixed performance regression in the convolution weight gradient implementation for Intel AVX2 (9ab050b0f4a3d434cbb14b7ddb7056736564b9dc, 6cd0c352f9949191dac1938b8f16b53b5967c1ea)
* Fixed `unknown primitive kind` assertion on GPU (c95a01cea1bd43445497eae4f1323947bd56c977)
* Fixed build issue on Windows when oneDNN is built as a submodule (2fceddf2f564b729550b288eb2e7bba5523c223e)
* Fixed issues with `NaN` results produced by `dnnl_sgemm` in some scenarios (5ce95efe6f5e86cddbf704b637063cd8dc914125)
* Improved performance for convolution backpropagation with 1x1 filter and NHWC activations on systems with Intel AVX2 support (74bfc74ccb089c32829ffb1711842f880a1fb99b)
* Fixed correctness issue in convolution with 3D spatial data (bf6ee840bef680223ccdb0c358bfce460f10d371)
* Fixed potential segmentation fault when destroying RNN primitive (0d9839b085263c0f4f6dcaf95e1bc2618a684297)
* Fixed performance regression in the fp32 convolution implementation for Intel AVX512 (668e28289ccf17dad541238155c03a42e99802ba)
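The matmul extension above adds batched multi-dimensional operands whose leading (batch) dimensions are broadcast against each other. A rough sketch of the resulting shape semantics, illustrated here with NumPy rather than the oneDNN matmul primitive itself (the tensor shapes are hypothetical):

```python
import numpy as np

# Batched matmul with broadcasting over leading (batch) dimensions.
# Shapes: batch dims (2, 1) and (1, 5) broadcast to (2, 5);
# the trailing two dims multiply as ordinary matrices: (3, 4) @ (4, 6) -> (3, 6).
a = np.ones((2, 1, 3, 4), dtype=np.float32)
b = np.ones((1, 5, 4, 6), dtype=np.float32)

c = np.matmul(a, b)
print(c.shape)  # (2, 5, 3, 6)
```

This mirrors the familiar NumPy broadcasting convention; consult the oneDNN matmul primitive documentation for the exact dimension and layout rules it supports.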