Key Features and Enhancements
- [pyTorch] Added a new argument, `softmax_scale`, to the `DotProductAttention` API (see the first sketch after this list).
- [pyTorch] Extended Transformer Engine's pyTorch build to always compile with tensor parallelism (TP) communication overlap support, removing the MPI dependency. Also exposed the `initialize_ub` and `destroy_ub` APIs for configuring communication-GEMM overlap (see the second sketch after this list).
- [pyTorch] Improved documentation for the `DotProductAttention` API, including benchmarks and end-to-end test scripts.
- [pyTorch] Incorporated the Fused Adam and Fused SGD optimizers into Transformer Engine. Previously, they had to be installed separately from the Apex repository (https://github.com/NVIDIA/apex); see the third sketch after this list.
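
A minimal sketch of passing the new `softmax_scale` argument to `DotProductAttention`. The tensor shapes, dtype, and the scale value below are illustrative assumptions, not part of the release note.

```python
# Sketch: overriding the attention softmax scaling via the new argument.
# Shapes, dtype, and the 0.125 value are assumptions for illustration.
import torch
from transformer_engine.pytorch import DotProductAttention

attn = DotProductAttention(
    num_attention_heads=16,
    kv_channels=64,
    softmax_scale=0.125,  # overrides the default 1/sqrt(kv_channels) scaling
)

# Default qkv_format is "sbhd": [sequence, batch, heads, head_dim].
q = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
k = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
v = torch.randn(128, 2, 16, 64, dtype=torch.bfloat16, device="cuda")
out = attn(q, k, v)
```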
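
A hedged sketch of the communication-GEMM overlap setup and teardown calls. The import path and the `shape`, `tp_size`, and `use_fp8` parameter names are assumptions based on the exposed API names, not confirmed signatures.

```python
# Sketch: allocating and releasing the overlap communication buffers.
# Parameter names below are assumptions; consult the API docs for the
# exact signature of initialize_ub.
import transformer_engine.pytorch as te

# Workspace sized for [sequence_length * batch_size, hidden_size] (assumed).
te.initialize_ub(
    shape=[2048, 4096],
    tp_size=8,       # tensor-parallel group size (assumed parameter name)
    use_fp8=False,   # assumed flag controlling FP8 buffer allocation
)

# ... build and run tensor-parallel Transformer Engine modules with
#     communication-GEMM overlap enabled ...

te.destroy_ub()  # release the communication buffers at shutdown
```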
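
A sketch of switching from the Apex optimizers to the ones now shipped with Transformer Engine. The import path is an assumption; the constructor arguments mirror `torch.optim.Adam` and `torch.optim.SGD`.

```python
# Sketch: using the Fused Adam / Fused SGD optimizers bundled with
# Transformer Engine instead of installing them from NVIDIA/apex.
import torch
from transformer_engine.pytorch.optimizers import FusedAdam, FusedSGD

model = torch.nn.Linear(1024, 1024).cuda()

optimizer = FusedAdam(model.parameters(), lr=1e-4, betas=(0.9, 0.95))
# optimizer = FusedSGD(model.parameters(), lr=1e-2, momentum=0.9)

loss = model(torch.randn(8, 1024, device="cuda")).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```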
Fixed Issues
- [pyTorch] Made internal changes to reduce CPU overhead.
- [pyTorch] Fixed a crash that occurred when using TorchDynamo with the `checkpoint` API.
- [pyTorch] Fixed an issue with loading an FP8 checkpoint when using FP8 attention.
Known Issues in This Release
There are no known issues in this release.
Breaking Changes in This Release
There are no breaking changes in this release.
Deprecated Features
There are no deprecated features in this release.