Onednn

Latest version: v2025.1.0



0.17

Performance optimizations
* Improved int8 convolutions performance on processors with Intel® AVX512-DL Boost instruction set support.
* Improved performance of fp32 convolutions with number of input and output channels not divisible by the SIMD width for processors with Intel® AVX2 instruction set support.
* Improved performance of Recurrent Neural Networks (RNNs) functionality.
* Improved performance of int8 deconvolution.
* Added optimizations for fp32 inference and training for processors with Intel® AVX instruction set support.
* Added optimizations for convolutions and auxiliary primitives with 3D spatial data for processors with Intel® AVX2 instruction set support.
* Improved int8 Winograd convolution performance for real-time inference use cases.

New functionality
* Introduced int8 data-type support for inner-product primitive.
* Introduced support for int8 convolutions with signed input and signed weights.
* Introduced 1D spatial data support in convolution and auxiliary primitives. This functionality is optimized for processors with Intel® AVX512 instruction set support.
* Introduced the Shuffle primitive.
* Introduced a general-purpose matrix-matrix multiplication function for int8 data (gemm_s8u8s32 and gemm_s8s8s32).
* Feature preview: Threading Building Blocks (TBB) support.
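The naming of the int8 matrix multiplication functions suggests the data types involved: for `gemm_s8u8s32`, signed 8-bit values multiplied by unsigned 8-bit values and accumulated into signed 32-bit integers. The following is a minimal pure-Python reference sketch of that arithmetic only. The function name `reference_s8u8s32` is hypothetical, and the sketch ignores the scaling, offset, and transposition parameters that a production GEMM interface typically has.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def reference_s8u8s32(a, b):
    """Multiply an M x K matrix of signed 8-bit values by a K x N matrix of
    unsigned 8-bit values, returning an M x N matrix of 32-bit results."""
    k = len(b)
    n = len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if any(not -128 <= v <= 127 for row in a for v in row):
        raise ValueError("a must hold signed 8-bit values")
    if any(not 0 <= v <= 255 for row in b for v in row):
        raise ValueError("b must hold unsigned 8-bit values")

    c = []
    for row in a:
        out_row = []
        for j in range(n):
            acc = sum(row[p] * b[p][j] for p in range(k))
            # Python integers do not overflow, so check the int32 range explicitly.
            if not INT32_MIN <= acc <= INT32_MAX:
                raise OverflowError("accumulator does not fit in 32 bits")
            out_row.append(acc)
        c.append(out_row)
    return c

a = [[-128, 127], [1, 2]]   # signed 8-bit inputs
b = [[255, 0], [1, 2]]      # unsigned 8-bit inputs
c = reference_s8u8s32(a, b)
```

The wide accumulator is the point of the sketch: a single product such as `-128 * 255` already falls outside the 8-bit range, so the results are kept in 32 bits.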

API deprecations and breaking changes
* The order of the gates for LSTM cells was changed to input, forget, candidate, output. Existing code that assumes the previous order might produce incorrect results.
* Backward RNN primitive creation without the hint in C++ is deprecated.
* Int8 Winograd convolution behavior with respect to scales is aligned with the direct convolution algorithm.
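Code that stores LSTM weights gate by gate may need to permute them to match the new order. The sketch below shows one way to do that for a list of per-gate blocks. The previous gate order is not stated in these notes, so `old_order` here is a hypothetical value chosen only for illustration, and `reorder_gates` is an illustrative helper rather than a library function.

```python
# Gate order stated in the release note.
NEW_ORDER = ["input", "forget", "candidate", "output"]

def reorder_gates(per_gate_blocks, old_order, new_order=NEW_ORDER):
    """Return the per-gate blocks rearranged from old_order into new_order."""
    if sorted(old_order) != sorted(new_order):
        raise ValueError("old_order and new_order must name the same gates")
    if len(per_gate_blocks) != len(old_order):
        raise ValueError("expected exactly one block per gate")
    by_name = dict(zip(old_order, per_gate_blocks))
    return [by_name[gate] for gate in new_order]

# HYPOTHETICAL previous layout, used only to demonstrate the permutation.
old_order = ["forget", "input", "output", "candidate"]
weights_old = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
weights_new = reorder_gates(weights_old, old_order)
```

The same permutation would apply to any other tensor laid out along the gate dimension, such as biases.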

Usability improvements
* Primitives now accept tensors in which a dimension is 0 and do nothing in that case.
* Added support for clang sanitizers.
* Build system extended with the following capabilities:
  * Allow building with static Intel MKL by passing `-DMKLDNN_USE_MKL=FULL:STATIC` to cmake.
  * Allow specifying which Intel MKL to use by passing `-DMKLDNN_USE_MKL={DEF,NONE,ML,FULL}` to cmake.
  * Allow using the compiler's OpenMP runtime by passing `-DMKLDNN_THREADING=OMP:COMP` to cmake.
  * Allow building a static library by passing `-DMKLDNN_LIBRARY_TYPE=STATIC` to cmake.
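As a rough illustration of how the build options listed above combine on a command line, the following sketch assembles a cmake invocation from them. The helper `cmake_command` and the source directory `..` are assumptions made for the example. Only the option names and values quoted in these notes are used, and the notes do not say which combinations are supported.

```python
import shlex

# Values for MKLDNN_USE_MKL that appear in the release notes.
USE_MKL_VALUES = {"DEF", "NONE", "ML", "FULL", "FULL:STATIC"}

def cmake_command(source_dir, use_mkl=None, threading=None, library_type=None):
    """Build the argument list for a cmake configure step."""
    args = ["cmake", source_dir]
    if use_mkl is not None:
        if use_mkl not in USE_MKL_VALUES:
            raise ValueError(f"unexpected MKLDNN_USE_MKL value: {use_mkl}")
        args.append(f"-DMKLDNN_USE_MKL={use_mkl}")
    if threading is not None:
        args.append(f"-DMKLDNN_THREADING={threading}")
    if library_type is not None:
        args.append(f"-DMKLDNN_LIBRARY_TYPE={library_type}")
    return args

# Static library linked against static Intel MKL (illustrative combination).
cmd = cmake_command("..", use_mkl="FULL:STATIC", library_type="STATIC")
print(shlex.join(cmd))
```

Keeping the arguments as a list makes it straightforward to pass them to `subprocess.run` without shell quoting concerns.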

Thanks to the contributors
This release contains contributions from many Intel Performance Libraries developers as well as Dmitry Baksheev (dbakshee), Yuta Okamoto (okapies), and Eduardo Gonzalez (wmeddie). We would also like to thank everyone who asked questions and reported issues.

*Other names and brands may be claimed as the property of others.

0.17rc

This is a release candidate package for MKL-DNN v0.17. It is made available for testing by the community. Please provide feedback and report bugs in [GitHub issues](https://github.com/intel/mkl-dnn/issues).

0.16

Performance optimizations
* Improved performance of int8 convolutions with number of input and output channels not divisible by SIMD width on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
* Optimized Winograd convolutions for fp32 real-time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
* Optimized weights update of dilated convolutions for fp32 data type on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support.
* Improved performance of reorder primitive for int8 data type.

New functionality
* Added dilation support for deconvolution (transposed convolution) primitive.
* Introduced deconvolution (transposed convolution) primitive for int8 data type.
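Dilation changes the spatial extent a kernel covers, which in turn changes the output size of a deconvolution. The sketch below computes those sizes for one spatial dimension using the common convention in which a dilation of 1 means a dense kernel. That convention is an assumption here, and a given library may number dilation differently, so its documentation should be checked. The function names are illustrative only.

```python
def effective_kernel(kernel, dilation=1):
    """Span covered by a kernel whose taps are `dilation` elements apart."""
    return (kernel - 1) * dilation + 1

def conv_output_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output size of a forward convolution along one spatial dimension."""
    span = effective_kernel(kernel, dilation)
    return (in_size + 2 * padding - span) // stride + 1

def deconv_output_size(in_size, kernel, stride=1, padding=0, dilation=1):
    """Output size of a deconvolution (transposed convolution)."""
    span = effective_kernel(kernel, dilation)
    return (in_size - 1) * stride - 2 * padding + span

# A deconvolution undoes the size change of the matching convolution.
up = deconv_output_size(5, kernel=3, stride=2, padding=1, dilation=2)
down = conv_output_size(up, kernel=3, stride=2, padding=1, dilation=2)
```

With these parameters the deconvolution maps a size of 5 to 11, and the matching convolution maps 11 back to 5.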

API deprecations and breaking changes
* The default behavior of gemm-based convolutions was changed. They now use internally allocated thread-local scratchpad memory for im2col and col2im operations, weights reduction, and accumulation. This may cause correctness issues when multiple gemm-based convolutions are created in one thread and executed concurrently in different threads. To support concurrent execution, the MKL-DNN library must be configured with the `-DMKLDNN_ENABLE_CONCURRENT_EXEC=TRUE` CMake flag.
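One plausible reading of the hazard described above is that primitives created in the same thread end up referring to the same thread-local buffer. The toy model below illustrates only that aliasing. `ToyConvolution` and `_thread_scratchpad` are hypothetical names, and nothing here reflects the library's actual implementation.

```python
import threading

_tls = threading.local()

def _thread_scratchpad(size):
    """Return the calling thread's scratchpad, allocating or growing it on demand."""
    buf = getattr(_tls, "scratchpad", None)
    if buf is None or len(buf) < size:
        buf = bytearray(size)
        _tls.scratchpad = buf
    return buf

class ToyConvolution:
    def __init__(self, scratch_size):
        # The buffer is bound when the object is created, in the creating thread.
        self.scratchpad = _thread_scratchpad(scratch_size)

# Two objects created in the same thread share one buffer, so running them
# concurrently from different threads would let them overwrite each other.
conv_a = ToyConvolution(64)
conv_b = ToyConvolution(64)
shares_buffer = conv_a.scratchpad is conv_b.scratchpad

# An object created in another thread gets that thread's own buffer.
made = []
worker = threading.Thread(target=lambda: made.append(ToyConvolution(64)))
worker.start()
worker.join()
separate_buffer = made[0].scratchpad is not conv_a.scratchpad
```

In this model, sharing is harmless as long as the two objects run one after another in the creating thread, which matches the scenario the note treats as safe by default.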

Usability improvements
* Extended documentation with [details on MKL-DNN memory formats](http://intel.github.io/mkl-dnn/understanding_memory_formats.html).

Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Yasser Zamani (Yasserzamani) and Loo Rong Jie (Rongjiecomputer). We would also like to thank everyone who asked questions and reported issues.


0.15

Performance optimizations
* Improved fp32 convolutions performance for real-time inference on Intel(R) Xeon processors with Intel(R) AVX512 instruction set support
* Improved int8 depthwise separable convolutions performance on processors with Intel(R) AVX512 instruction set support
* Improved 3D convolution performance on Intel(R) Xeon Phi(TM) processors with AVX512_4FMAPS and AVX512_4VNNIW instruction groups support
* Optimized dilated convolutions for int8 and fp32 data types
* Improved performance of pooling primitives for NHWC and NCHW data layouts
* Improved performance of 3D pooling primitives for plain data layouts
* Optimized batch normalization backpropagation for Intel(R) processors with AVX and SSE4.2 instruction groups support
* Improved performance of batch normalization with 3D spatial data

New functionality
* Feature preview: Introduced training and inference support for GRU cells for recurrent neural network (RNN)
* Introduced general purpose SGEMM API
* Introduced deconvolution (or transposed convolution) primitive for 3D spatial data
* Introduced backward propagation for softmax primitive

Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Tuomas Kärnä (tkarna), msakai, Can Balioglu (cbalioglu), Jacek Czaja (jczaja), Thejan Wijesinghe (ThejanW), Jesse Nicholson (TechnikEmpire), okdshin, and Crissman Loomis (Crissman). We would also like to thank everyone who asked questions and reported issues.


0.14

Performance optimizations
* Improved fp32 Winograd convolution performance on Intel Xeon processors with Intel(R) AVX512 instruction set support.
* Improved depthwise separable convolutions performance on processors with Intel(R) SSE 4.2, Intel(R) AVX and Intel(R) AVX512 instruction sets support.
* Improved performance of GEMM-based convolutions backward propagation.
* Improved performance of auxiliary primitives for NHWC and NCHW data layouts.

New functionality
* Feature preview: Introduced recurrent neural network (RNN) support. This release includes training and inference support for uni- and bi-directional vanilla RNN and Long Short-Term Memory (LSTM) cells. Use of the new API is demonstrated with an example featuring LSTM model inference with attention based on Google Neural Machine Translation (GNMT) topology.
* Added Winograd convolution implementation for int8 data type optimized for Intel Xeon processors with Intel AVX512 instruction set support. The implementation includes initial optimizations for future Intel Xeon processors with AVX512_VNNI instruction groups support.
* Introduced deconvolution (or transposed convolution) primitive.
* Introduced support for 3D spatial data in convolution and auxiliary primitives. The following primitives are optimized for 3D tensors:
  * reorders
  * convolution
  * deconvolution
  * batch normalization
  * pooling
  * eltwise
  * concat
  * inner product

Usability improvements
* Added `-DWITH_TEST=OFF` and `-DWITH_EXAMPLE=OFF` build system flags that disable building tests and examples.
* Added `-DLIB_SUFFIX` flag that allows adding a suffix to the lib directory name.
* Added prepare_mkl.bat script that automates download of Intel MKL small libraries on Windows.

Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Zhong Cao **4pao**, Dmitriy Gorokhov, Jian Tang **tensor-tang**, Daniel M. Weeks **doctaweeks**, Tony Wang **tonywang1990**, Tao Lv **TaoLv** and Xinyu Chen **xinyu-intel**. We would also like to thank everyone who asked questions and reported issues.


0.13

Performance optimizations
* Added optimizations for future Intel(R) Xeon(R) processors with AVX512_VNNI instruction groups support. New instructions are used in direct convolutions with int8 and int16 data types.
* Improved performance of int8 direct forward convolution on Intel Xeon processors with Intel AVX512 instruction set.
* Improved performance of grouped convolutions and depthwise separable convolutions.

New functionality
* Extended Batch Normalization to enable fused ReLU on forward and backward propagation.
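To make the idea of a fused ReLU concrete, the sketch below normalizes a batch of scalar values and applies ReLU in the same pass, also recording which outputs were clamped since a backward pass would need that information. It is a plain-Python illustration under simplifying assumptions (one channel, scalar `gamma` and `beta`, batch statistics computed on the fly). The function name is hypothetical and not the library's API.

```python
import math

def batch_norm_relu_forward(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize xs with batch statistics, scale and shift, then apply ReLU.

    Returns the outputs and a mask that is True where the output is positive.
    """
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    inv_std = 1.0 / math.sqrt(var + eps)

    ys, mask = [], []
    for x in xs:
        y = gamma * (x - mean) * inv_std + beta
        positive = y > 0.0
        # Fused step: clamp immediately instead of writing y and re-reading it.
        ys.append(y if positive else 0.0)
        mask.append(positive)
    return ys, mask

ys, mask = batch_norm_relu_forward([1.0, 2.0, 3.0])
```

Fusing the two steps avoids a second pass over the data, which is the usual motivation for this kind of optimization.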
Usability improvements
* Improved profiling and debugging capabilities:
  * New [verbose mode](http://intel.github.io/mkl-dnn/dev_guide_verbose.html) reports detailed information about each Intel MKL-DNN primitive call including primitive name, data layout, implementation and execution time.
  * Instrumentation and tracing technology (ITT) enables [profiling of JIT code](http://intel.github.io/mkl-dnn/dev_guide_vtune.html) with Intel(R) VTune(TM) Amplifier XE.
  * JIT kernels can now be [saved for inspection](http://intel.github.io/mkl-dnn/dev_guide_inspecting_jit.html).
* Extended documentation with details on [int8 quantization](http://intel.github.io/mkl-dnn/dev_guide_inference_int8.html), [inference workflow](http://intel.github.io/mkl-dnn/dev_guide_inference.html), and [fusion](http://intel.github.io/mkl-dnn/cnn_inference_fp32_cpp.html).
* Added [int8 inference example](http://intel.github.io/mkl-dnn/cnn_inference_int8_cpp.html).

Thanks to the contributors
This release contains contributions from many Intel(R) Performance Libraries developers as well as Patric Zhao (pengzhao-intel), Ashok Emani (ashokei), Erik Kruus (kruus) and Dmitriy Gorokhov. We would also like to thank everyone who asked questions and reported issues.

