oneDNN

Latest version: v2025.0.0


2.1.3

This is a patch release containing the following changes to v2.1.2:
* Updated xbyak_aarch64 to support Apple silicon (dd1a02ab2a962bbeadfc0d2e53fedf39ed2b7b7e, 913010b253eccd4654c29f78c81227f7342e3262, 2d155dd22c59f4a059e9a7903c503d2221542811)
* Fixed segfault in fp32 depthwise convolution with padded memory (2d8283f575d0a0a43a8a967f659f95e2fd8dd866)
* Fixed potential issues in BRGEMM-based convolution implementation (b183dffa0fefa2c342070daae95c00ff274e8310, d2b1653f28f35ea3dc93c10ba6b9b538e80ba08e)
* Fixed memory leak on NVIDIA GPUs (06803f2c2834b67a357fdb24d03ea906b9ffdd3a)

2.1.2

This is a patch release containing the following changes to v2.1.1:
* Improved performance of forward convolution with plain activations for processors with Intel AVX-512 support (2147a58a6b075edcbb8b03fb158a73b7e706c324)
* Enabled I-cache refreshing before executing JIT-ed code for AArch64 systems (9f3bc1c9279dde44383ef476ae49e813142b3cdc)
* Returned blocked layouts as default for forward training (7af2898e65136ad2dd8cfc280027428e3ef2ec72, bd4826d8f098d196a9502d0c6d347f0956a243ad)

2.1.1

This is a patch release containing the following changes to v2.1:
* Improved performance of fp32 depthwise convolution with plain activations on CPU (762a9c75a01476457d705c1e98f4d28f74b80e4d)
* Worked around internal compiler error in GCC 7.3.1 when building with `--std=c++14` (f637501d41e0d9a1515430a5530fca53fe656903)
* Fixed memory leaks in batchnorm and gemm implementations (2ea5385402c2b3d6995b9e6bb8cb773339d9b7c2, 4f3a7cf1bc3009415a2cd065ffe2ed4ed45fda6c)
* Addressed several issues in benchdnn and gtests (bb7bdb41e13ff47d7993e29827b3e60697c4809a, 0e04cc29a09eacc81d9e0dd705b55381b19166ea, d7df8d2240ea0c4d5ce74a209ccf652dd7094570, a59354fad484c46dd98956c406534d371d3fd08e)

2.1

Performance optimizations
* Reduced overheads associated with [primitive cache](https://oneapi-src.github.io/oneDNN/dev_guide_primitive_cache.html).
* Intel Processor Graphics and Xe architecture-based Graphics:
  * Improved performance of Winograd convolution.
  * Improved performance for padded memory formats.
  * Improved performance of reorder and shuffle primitives for multiple formats and all dimensions.
  * Improved performance of pooling primitive for float16 data type.
  * Improved performance of lnorm primitive for plain formats.
  * Improved performance of resampling primitive for blocked formats.
* Intel Architecture processors
  * Introduced initial optimizations for bfloat16 functionality for future Intel Xeon Scalable processor with Intel AMX support (code name Sapphire Rapids).
  * Improved performance of int8 and bfloat16 RNN and inner product primitives.
  * Improved performance of shuffle primitive for bfloat16 data type.
  * Introduced [CPU ISA hints](https://oneapi-src.github.io/oneDNN/dev_guide_cpu_isa_hints.html) environment variable and API. The new API is intended to dispatch function implementations using YMM registers to improve performance on processors with a single Intel AVX-512 compute unit.
  * Improved forward convolution performance for Intel AVX-512 systems.
  * Introduced initial performance optimizations for future Intel Core processor with Intel AVX2 and Intel DL Boost instructions support (code name Alder Lake).
  * Improved performance of int8 primitive for processors with Intel SSE4.1 instruction set support.
  * Improved convolution and batch normalization performance with threadpool.
* AArch64-based processors
  * Improved performance of Winograd convolution with ArmCL.
  * Improved performance of int8 convolution with ArmCL.
  * Added JIT support for AArch64 and JIT implementations for reorder, eltwise, pooling, and batch normalization primitives.
* NVIDIA GPUs
  * (preview) Introduced support for [NVIDIA GPU](https://github.com/oneapi-src/oneDNN/blob/master/src/gpu/nvidia/README.md). The implementation relies on DPC++ Compiler, cuDNN, and cuBLAS libraries.
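The CPU ISA hints item above mentions an environment variable. As a minimal sketch, the helper below prepares an environment for launching an application with such a hint. The variable name `DNNL_CPU_ISA_HINTS` and the values `NO_HINTS` and `PREFER_YMM` are assumptions recalled from the linked developer guide rather than stated in these notes, so check them against the guide for your oneDNN version. The helper name `env_with_isa_hint` is hypothetical.

```python
import os

# Assumption: the variable name and accepted values below come from the
# oneDNN CPU ISA hints developer guide, not from these release notes.
ISA_HINTS_VAR = "DNNL_CPU_ISA_HINTS"
KNOWN_HINTS = ("NO_HINTS", "PREFER_YMM")


def env_with_isa_hint(hint, base_env=None):
    """Return a copy of the environment with the CPU ISA hint set.

    The caller's environment is left untouched, so the hint only affects
    processes started with the returned mapping.
    """
    if hint not in KNOWN_HINTS:
        raise ValueError(f"unknown CPU ISA hint: {hint!r}")
    env = dict(os.environ if base_env is None else base_env)
    env[ISA_HINTS_VAR] = hint
    return env
```

The returned mapping can be passed as `env=` to `subprocess.run` when starting a oneDNN-based program, which keeps the hint scoped to that one process.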

New Functionality
* Introduced int8 support for LSTM primitive with projection for CPU.
* Introduced binary post-op for (de)-convolution, pooling, eltwise, binary, inner product, matmul and reduction (GPU only) along with performance optimizations for CPUs and GPUs.
* Extended the number of supported post-ops for primitives to 20.
* Extended [eltwise primitive](https://oneapi-src.github.io/oneDNN/dev_guide_eltwise.html) with support for `logsigmoid` and `clip_v2` algorithms.
* Introduced support for [PRelu primitive](https://oneapi-src.github.io/oneDNN/dev_guide_prelu.html).
* Extended matmul implementation with support for per-output channel zero-points for quantization.
* Extended support for broadcasting in binary primitive to both inputs for CPU.
* Introduced float16 support in reduction primitive for GPU.
* Introduced support for mixed input and output types in binary primitive for GPU.
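For readers unfamiliar with the new element-wise operations, the following plain-Python sketch shows the forward-pass math usually associated with `logsigmoid`, `clip_v2`, and PReLU. It is a scalar reference for illustration only, not the library's implementation. The function names are hypothetical, and the backward pass is not covered.

```python
import math


def logsigmoid(x):
    """log(1 / (1 + exp(-x))), written to avoid overflow for large |x|."""
    return min(x, 0.0) - math.log1p(math.exp(-abs(x)))


def clip_v2_forward(x, alpha, beta):
    """Clamp x to the closed interval [alpha, beta] (assumes alpha <= beta)."""
    return max(alpha, min(x, beta))


def prelu_forward(x, weight):
    """Pass positive inputs through; scale the rest by a learned weight."""
    return x if x > 0 else x * weight
```

In a tensor setting the same formulas apply per element, with the PReLU weight typically broadcast along one or more dimensions.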

Usability
* Added API to enable displaying timestamps in [oneDNN verbose mode](https://oneapi-src.github.io/oneDNN/dev_guide_verbose.html). Timestamps make it possible to use oneDNN verbose output in profiling tools.
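As a rough illustration of how a profiling script might consume timestamped verbose output, the sketch below splits one verbose line into a timestamp and the remaining fields. It assumes the timestamp is emitted as the second comma-separated field, right after the `dnnl_verbose` marker. That layout is an assumption and is not confirmed by these notes. The function name is hypothetical.

```python
def parse_timestamped_line(line):
    """Return (timestamp, remaining_fields), or None if the line has no timestamp.

    Assumption: timestamped lines look like
    'dnnl_verbose,<timestamp>,<other fields...>'.
    """
    fields = line.strip().split(",")
    if len(fields) < 3 or fields[0] != "dnnl_verbose":
        return None
    try:
        timestamp = float(fields[1])
    except ValueError:
        # The second field is not numeric, so timestamps are not enabled.
        return None
    return timestamp, fields[2:]
```

Lines that do not match the assumed layout yield `None`, so the parser can be run over mixed program output without extra filtering.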

Validation
* Extended benchdnn to report operation bandwidth.
* Added ability to choose target GPU in benchdnn.

Thanks to the contributors
This release contains contributions from the project core team as well as Alejandro Alvarez, Aleksandr Nikolaev @alenik01, araki.kenichi @qnet-araki, Arthur Mitrano @aaraujom, Benjamin Fitch, Ben Tracy @CodeplayBen, Daniel Soutar @danielsoutar, @dylan-angus-codeplay, Diana Bite @diaena, higuchi.motoko @higuchi-motoko, Jacob Kahn @jacobkahn, Kentaro Kawakami @kawakami-k, Kumudha KN @KumudhaN, kurihara @Koji-Kurihara, Mehdi Goli @mehdi-goli, Nathan John Sircombe @nSircombe, Peter Caday @petercad, Rafik Saliev @rfsaliev, Xinyu Chen @xinyu-intel, yuriFreeBSD @yurivict. We would also like to thank everyone who asked questions and reported issues.

2.1rc

This is a release candidate for oneDNN v2.1. Please provide feedback and report bugs in [GitHub issues](https://github.com/oneapi-src/oneDNN/issues).

2.0

This is a major oneDNN release based on [oneDNN v1.7](https://github.com/oneapi-src/oneDNN/releases/tag/v1.7).

Binary distribution of this software is available as [Intel(R) oneAPI Deep Neural Network Library](https://software.intel.com/en-us/oneapi/onednn) in [Intel(R) oneAPI](https://software.intel.com/en-us/oneapi).

Breaking API changes
* OpenCL API:
  * OpenCL interoperability API moved to `dnnl_ocl.hpp`.
  * Engine, stream, and memory are created from corresponding CL objects using free functions.
* Threadpool
  * Threadpool API is moved to `dnnl_threadpool.hpp`.
  * Stream object for threadpool is created using free function `dnnl::threadpool_interop::make_stream`.
  * Removed stream attributes.

New Functionality
* Introduced [SYCL API extensions](https://oneapi-src.github.io/oneDNN/v2/dev_guide_dpcpp_interoperability.html) compliant with [oneAPI specification v1.0](https://spec.oneapi.com/versions/latest/elements/oneDNN/source/index.html).
* Introduced support for [Intel(R) DPC++ Compiler and Level Zero runtime](https://oneapi-src.github.io/oneDNN/v2/dev_guide_build_options.html).
* Introduced Unified Shared Memory (USM) support for Intel Processor Graphics and Xe architecture-based graphics.

Known Issues and Limitations
* Pooling, batch normalization, and binary primitives may segfault when executed on Xe architecture-based graphics. No workaround available.
* Non-Intel GPUs are not supported. The library API allows creating a DNNL engine by index (the order of devices is determined by the SYCL runtime), and there is no check for GPU devices being non-Intel. For more control, users can create a DNNL engine by passing a SYCL device and context explicitly.
* When running GPU kernels that take longer than a certain time (which depends on OS and system settings), the application may appear to hang. Driver or system settings can be configured to disable this timeout and avoid hangs of DPC++ or OpenCL programs, including oneDNN examples:
  * On Linux (see OpenCL™ Driver for Intel® HD, Iris™, and Iris™ Pro Graphics for Linux for details): `sudo bash -c 'echo N > /sys/module/i915/parameters/enable_hangcheck'`
  * On Windows (see Timeout Detection and Recovery (TDR) Registry Keys for details): increase the `TdrDelay` and `TdrDdiDelay` values in the registry.
* See DPC++ limitations that impact the library as well.
