Performance optimizations
* Introduced broad release quality optimizations for future Intel(R) Xeon(R) Scalable processor (code name Cooper Lake).
* Improved performance of matmul primitive for 3D tensors (batched matrix-matrix multiplication) on all supported processors.
* Improved performance of binary primitive on all supported processors for the case when one of the tensors has to be broadcast.
* Improved performance of convolution primitive for 3D tensors and 1x1 kernel size on all supported processors.
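
For readers unfamiliar with the term, batched matrix-matrix multiplication on 3D tensors means one independent matrix product per batch index. The following is a minimal reference sketch in plain C++, not the library's implementation: `batched_matmul_ref` is a hypothetical helper, and dense row-major storage with shapes (batch, M, K), (batch, K, N), and (batch, M, N) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference (unoptimized) batched matmul: for each batch index mb,
// C[mb] = A[mb] * B[mb]. All tensors are assumed dense and row-major.
// batched_matmul_ref is a hypothetical helper, not a oneDNN function.
std::vector<float> batched_matmul_ref(const std::vector<float> &a,
        const std::vector<float> &b, std::size_t batch, std::size_t M,
        std::size_t K, std::size_t N) {
    assert(a.size() == batch * M * K);
    assert(b.size() == batch * K * N);
    std::vector<float> c(batch * M * N, 0.0f);
    for (std::size_t mb = 0; mb < batch; ++mb)
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t k = 0; k < K; ++k) {
                const float a_val = a[(mb * M + m) * K + k];
                // Accumulate one row of A[mb] against one row of B[mb].
                for (std::size_t n = 0; n < N; ++n)
                    c[(mb * M + m) * N + n]
                            += a_val * b[(mb * K + k) * N + n];
            }
    return c;
}
```

A reference loop like this is mainly useful as a correctness baseline when comparing against an optimized primitive.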
New functionality
* Introduced fused depthwise convolution and convolution with 1x1 filter. The implementation is available for all supported processors and data types. The functionality is not implemented for Intel Processor Graphics.
* Introduced peephole support for LSTM cell on all supported processors. The functionality is not implemented for Intel Processor Graphics.
* Implemented matmul primitive for Intel Processor Graphics.
* Extended binary primitive with support for min and max algorithms.
* Extended eltwise primitive:
    * Introduced erf-based implementation of gelu algorithm
    * Introduced pow algorithm
    * Introduced backpropagation flavor relying on destination tensor as input for elu, exp, logistic, relu, sqrt, and tanh algorithms
* Extended set of operations for memory descriptors:
    * Added support for changing the number of dimensions with existing `dnnl::memory::desc::reshape()` method
    * Introduced `dnnl::memory::desc::permute_axes()` method to change logical axes order
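
The eltwise additions listed above rest on simple math, sketched here in plain C++. This is an illustration under stated assumptions, not the library's code: all function names are hypothetical, the formulas are the textbook definitions, and any library-specific parameter scaling is not reproduced. The point of the destination-based backward flavor is that each derivative can be written in terms of the forward output `d`, so the source tensor does not need to be kept for backpropagation.

```cpp
#include <cassert>
#include <cmath>

// GELU via the Gauss error function: x * Phi(x), where Phi is the
// standard normal CDF, Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
inline double gelu_erf(double x) {
    return 0.5 * x * (1.0 + std::erf(x / std::sqrt(2.0)));
}

// Widely used tanh approximation of GELU, shown for comparison only.
inline double gelu_tanh(double x) {
    const double k = std::sqrt(2.0 / 3.14159265358979323846);
    return 0.5 * x * (1.0 + std::tanh(k * (x + 0.044715 * x * x * x)));
}

// Backward formulas expressed through the forward output d = f(s) and
// the incoming gradient dd; each returns ds = dd * f'(s).
inline double tanh_bwd_from_dst(double dd, double d) {
    return dd * (1.0 - d * d); // tanh'(s) = 1 - tanh(s)^2
}
inline double logistic_bwd_from_dst(double dd, double d) {
    return dd * d * (1.0 - d); // sigma'(s) = sigma(s) * (1 - sigma(s))
}
inline double exp_bwd_from_dst(double dd, double d) {
    return dd * d; // exp'(s) = exp(s)
}
inline double sqrt_bwd_from_dst(double dd, double d) {
    assert(d > 0.0); // 1 / (2 * sqrt(s)) is undefined at s = 0
    return dd / (2.0 * d);
}
// Plain ReLU with zero negative slope (an assumption): d > 0 iff s > 0.
inline double relu_bwd_from_dst(double dd, double d) {
    return d > 0.0 ? dd : 0.0;
}
// ELU with alpha >= 0: for s <= 0, d = alpha * (exp(s) - 1), so
// f'(s) = alpha * exp(s) = d + alpha.
inline double elu_bwd_from_dst(double dd, double d, double alpha) {
    assert(alpha >= 0.0);
    return d > 0.0 ? dd : dd * (d + alpha);
}
```

The erf-based and tanh-based GELU variants agree closely but not exactly, which is why they are worth distinguishing when matching results against a framework.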
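
The memory descriptor operations mentioned above change how existing data is viewed rather than moving it. The toy model below illustrates that idea in plain C++; `toy_desc`, `make_dense`, `toy_reshape`, and `toy_permute_axes` are hypothetical names, not the `dnnl::memory::desc` API. The permutation convention used here (axis `i` of the input becomes axis `perm[i]` of the output) is an assumption that should be checked against the library documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a memory descriptor: logical dims plus strides in
// elements. This is NOT dnnl::memory::desc; it only models the idea.
struct toy_desc {
    std::vector<long> dims;
    std::vector<long> strides;
};

// Builds a dense row-major descriptor (innermost stride is 1).
toy_desc make_dense(const std::vector<long> &dims) {
    toy_desc md{dims, std::vector<long>(dims.size(), 1)};
    for (std::size_t i = dims.size(); i-- > 1;)
        md.strides[i - 1] = md.strides[i] * dims[i];
    return md;
}

long num_elements(const std::vector<long> &dims) {
    long n = 1;
    for (long d : dims) n *= d;
    return n;
}

// Reshape of a dense descriptor: the number of dimensions may change,
// but the total element count must stay the same.
toy_desc toy_reshape(const toy_desc &md, const std::vector<long> &new_dims) {
    assert(md.strides == make_dense(md.dims).strides); // dense input only
    assert(num_elements(md.dims) == num_elements(new_dims));
    return make_dense(new_dims);
}

// Axis permutation: dims and strides move together, so the same buffer
// is reinterpreted with a different logical axis order.
toy_desc toy_permute_axes(
        const toy_desc &md, const std::vector<std::size_t> &perm) {
    assert(perm.size() == md.dims.size());
    toy_desc out{std::vector<long>(perm.size()),
            std::vector<long>(perm.size())};
    for (std::size_t i = 0; i < perm.size(); ++i) {
        assert(perm[i] < perm.size());
        out.dims[perm[i]] = md.dims[i];
        out.strides[perm[i]] = md.strides[i];
    }
    return out;
}
```

For example, permuting a dense 2x3 descriptor with `{1, 0}` yields dims 3x2 with strides `{1, 3}`, which is a transposed view over the same buffer.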
Thanks to the contributors
This release contains contributions from the project core team as well as Araujo Mitrano, Arthur @aaraujom, Aaron Mark Johnson @aaronjohnson, Benjamin Hipple @bhipple, Sergey Nesterov @cepera, @gaurav1086, Ilya Taraban @itaraban, Mesut Meterelliyoz @mmeterel, @nSircombe, Peter Caday @petercad, and Rafik Saliev @rsaliev. We would also like to thank everyone who asked questions and reported issues.