Performance optimizations
* Improved int8 convolution performance on processors with Intel® AVX512-DL Boost instruction set support.
* Improved performance of fp32 convolutions whose numbers of input and output channels are not divisible by the SIMD width, on processors with Intel® AVX2 instruction set support.
* Improved performance of Recurrent Neural Network (RNN) functionality.
* Improved performance of int8 deconvolution.
* Added optimizations for fp32 inference and training for processors with Intel® AVX instruction set support.
* Added optimizations for convolutions and auxiliary primitives with 3D spatial data for processors with Intel® AVX2 instruction set support.
* Improved int8 Winograd convolution performance for real-time inference use cases.
New functionality
* Introduced int8 data-type support for the inner-product primitive.
* Introduced support for int8 convolutions with signed input and signed weights.
* Introduced 1D spatial data support in convolution and auxiliary primitives. This functionality is optimized for processors with Intel® AVX512 instruction set support.
* Introduced the Shuffle primitive.
* Introduced a general-purpose matrix-matrix multiplication function for int8 data (gemm_s8u8s32 and gemm_s8s8s32).
* Feature preview: Threading Building Blocks (TBB) support.
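
To illustrate what the int8 matrix-matrix multiplication item above refers to, here is a minimal reference sketch of an s8 x u8 product accumulated into s32. It assumes the function name encodes the types of A, B, and C in that order (signed 8-bit A, unsigned 8-bit B, signed 32-bit C). The helper `naive_gemm_s8u8s32` is hypothetical and is not the library's function: it uses row-major storage and omits everything a BLAS-style interface would normally add, so consult the library header for the actual signature.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference helper, NOT the library API.
// Assumptions: A is m x k (int8), B is k x n (uint8), both row-major;
// the result C is m x n (int32). No transposition, offsets, or scaling.
std::vector<std::int32_t> naive_gemm_s8u8s32(
        const std::vector<std::int8_t> &a,
        const std::vector<std::uint8_t> &b,
        std::size_t m, std::size_t n, std::size_t k) {
    std::vector<std::int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                // Widen both operands to 32 bits before multiplying so the
                // product and the running sum do not overflow 8 bits.
                acc += static_cast<std::int32_t>(a[i * k + p])
                        * static_cast<std::int32_t>(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

The 32-bit accumulator is the point of the exercise: even a 2 x 2 product of 8-bit values can produce results well outside the 8-bit range.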
API deprecations and breaking changes
* Order of the gates for LSTM cells was changed to input, forget, candidate, output. Code that supplies weights or biases in the previous gate order might produce incorrect results until the data is reordered.
* Creating a backward RNN primitive without the hint in the C++ API is deprecated.
* Int8 Winograd convolution behavior with respect to scales is now aligned with that of the direct convolution algorithm.
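
Because of the LSTM gate order change listed above, weights prepared for an earlier release may need their gate blocks permuted. The following is a minimal sketch of such a permutation, not a library routine. The helper `reorder_gates` is hypothetical, and the buffer layout (an `outer` dimension, then 4 gates, then `inner` contiguous values per gate) is an assumption that must be checked against the actual weights format in use. The previous gate order is not stated in these notes, so the mapping is passed in as a parameter rather than hard-coded.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of the library API.
// Assumed layout: src holds outer * 4 * inner floats, indexed as
// src[(o * 4 + gate) * inner + i].
// perm[new_gate] gives the index that gate had in the OLD order.
std::vector<float> reorder_gates(const std::vector<float> &src,
        std::size_t outer, std::size_t inner,
        const std::array<std::size_t, 4> &perm) {
    const std::size_t gates = 4;
    std::vector<float> dst(src.size());
    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t g = 0; g < gates; ++g) {
            // Copy one gate block from its old slot into its new slot.
            const std::size_t from = (o * gates + perm[g]) * inner;
            const std::size_t to = (o * gates + g) * inner;
            std::copy(src.begin() + from, src.begin() + from + inner,
                    dst.begin() + to);
        }
    }
    return dst;
}
```

As a purely illustrative example, `perm = {1, 0, 2, 3}` swaps the first two gate blocks and leaves the last two in place. Biases can be handled the same way with `outer` and `inner` chosen to match their layout.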
Usability improvements
* Primitives now accept tensors in which a dimension has size 0 and do nothing in that case.
* Added support for clang sanitizers.
* Build system extended with the following capabilities:
    * Allow building with static Intel MKL by passing `-DMKLDNN_USE_MKL=FULL:STATIC` to cmake
    * Allow selecting which Intel MKL to use by passing `-DMKLDNN_USE_MKL={DEF,NONE,ML,FULL}` to cmake
    * Allow using the compiler's OpenMP runtime by passing `-DMKLDNN_THREADING=OMP:COMP` to cmake
    * Allow building a static library by passing `-DMKLDNN_LIBRARY_TYPE=STATIC` to cmake
Thanks to the contributors
This release contains contributions from many Intel Performance Libraries developers as well as Dmitry Baksheev (@dbakshee), Yuta Okamoto (@okapies), and Eduardo Gonzalez (@wmeddie). We would also like to thank everyone who asked questions and reported issues.
*Other names and brands may be claimed as the property of others.