What's New
- Auto Operator Optimization
Intel Extension for PyTorch automatically optimizes PyTorch operators when its Python package is imported. Computation performance improves significantly once both the input tensors and the model are converted to the extension device.
- Auto Mixed Precision
The extension currently supports bfloat16 and streamlines the work needed to enable a bfloat16 model. The feature is controlled by `enable_auto_mix_precision`; when it is enabled, the extension automatically runs operators in bfloat16 to accelerate computation. A usage sketch covering both features follows this list.
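The sketch below illustrates how the two features above are typically combined: import the extension package to trigger operator optimization, optionally turn on auto mixed precision, and move the model and input to the extension device. The package name `intel_pytorch_extension`, the device handle `ipex.DEVICE`, and the exact signature of `enable_auto_mix_precision` are assumptions for illustration and may differ in your installed version.

```python
# Minimal sketch (assumed API; package and device names may differ in your install).
import torch
import intel_pytorch_extension as ipex  # importing triggers auto operator optimization (assumed package name)

# Optionally enable auto mixed precision so eligible operators run in bfloat16.
# The signature used here is an assumption based on the feature description above.
ipex.enable_auto_mix_precision(mixed_dtype=torch.bfloat16)

model = torch.nn.Conv2d(3, 64, kernel_size=3)
data = torch.rand(1, 3, 224, 224)

# Convert both the model and the input tensor to the extension device so the
# optimized operators are used (ipex.DEVICE is assumed to name that device).
model = model.to(ipex.DEVICE)
data = data.to(ipex.DEVICE)

with torch.no_grad():
    output = model(data)
```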
Performance Results
We collected performance data for several models on the Intel Cooper Lake platform with 1 socket and 28 cores. Intel Cooper Lake introduces the AVX-512 BF16 instructions, which significantly accelerate bfloat16 computation. The details are as follows (each number is the speedup ratio over upstream PyTorch as the baseline).
| Model | Imperative - Operator Injection | Imperative - Mixed Precision | JIT - Operator Injection | JIT - Mixed Precision |
|--|--|--|--|--|
| RN50 | 2.68 | 5.01 | 5.14 | 9.66 |
| ResNet3D | 3.00 | 4.67 | 5.19 | 8.39 |
| BERT-LARGE | 0.99 | 1.40 | N/A | N/A |
We also measured the performance of ResNeXt101, Transformer-FB, DLRM, and YOLOv3 with the extension and observed significant performance improvements, as expected.
Known Issues
- #10: Not all data types have been registered for DPCPP.
- #37: MaxPool does not return NaN when the input contains NaN values.
**NOTE**
The extension supports PyTorch v1.5.0-rc3. Support for other PyTorch versions is a work in progress.