Transformer-engine

Latest version: v2.1.0

Safety actively analyzes 723938 Python packages for vulnerabilities to keep your Python projects secure.

Page 3 of 4

1.2.1

Fixed Issues
- Statistics could be computed incorrectly when training with FP8 in recent versions of pyTorch. For details see https://github.com/NVIDIA/TransformerEngine/issues/600.

1.2

1.2.0

Key Features and Enhancements

- [pyTorch] Sliding window support is added for DotProductAttention.
- [pyTorch] Performance of DotProductAttention is increased on Hopper GPUs by utilizing cuDNN.
- [pyTorch] Support for the Falcon architecture is added in TransformerLayer via the new option `parallel_attention_mlp`.
- [pyTorch] Checkpointing logic when using `fp8_model_init` is improved.
- [JAX] Support is added for controlling SM margin in LayerNorm and RMSNorm kernel via environment variables `NVTE_FWD_LAYERNORM_SM_MARGIN` and `NVTE_BWD_LAYERNORM_SM_MARGIN`.

Fixed Issues

- Weight gradient could be computed incorrectly in some cases when FP8 execution and sequence parallelism were used together.
- Statistics were computed incorrectly during FP8 calibration.
- Using torch.compile on DotProductAttention module caused a crash.
- Rotary embeddings during pipeline-parallel inference did not operate correctly.
- Incorrect mask type used by the decoder in encoder-decoder architectures.
- Exporting Transformer Engine modules to ONNX in recent versions of pyTorch did not work correctly.

Known Issues in This Release

- FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). You can work around this issue either by setting the environment variable `MAX_JOBS=1` during Transformer Engine installation, or by installing FlashAttention v1 (e.g. by running `pip install flash-attn==1.0.9`) before attempting to install Transformer Engine.
- [pyTorch] FlashAttention v2.1 changed the behavior of the causal mask when performing cross-attention. (See https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag for reference.) To keep Transformer Engine behavior consistent between versions and backends, FlashAttention is disabled for this use case (cross attention with casual masking) when 2.1+ version of FlashAttention is installed.

Breaking Changes in This Release
There are no breaking changes in this release.

Deprecated Features
There are no deprecated features in this release.

1.1

1.1.0

Key Features and Enhancements

* [pyTorch] Memory usage is reduced when using the `fp8_model_init` API during inference.
* [pyTorch] Memory usage is reduced when using the `LayerNormLinear`, `LayernormMLP`, and `TransformerLayer` APIs.
* [JAX] Transformer Engine is migrated to the new Custom Partitioning mechanism of parallelism for custom ops in JAX.
* [JAX] The attention operation’s performance is improved when using cuDNN version 8.9.6 or greater.
* [C/C++] Transformer Engine can now be built as a subproject.

Fixed Issues

* Fixed an issue where in some cases passing the non-contiguous tensors as Q, K, or V to `DotProductAttention` would result in an error, “Exception: The provided qkv memory layout is not supported!.”

Known Issues in This Release

* FlashAttention v2, which is a dependency of this release of Transformer Engine, has a known issue with excessive memory usage during installation (https://github.com/Dao-AILab/flash-attention/issues/358). One could workaround this issue by either setting the MAX_JOBS=1 environment variable during Transformer Engine installation or installing FlashAttention v1 (e.g. by `pip install flash-attn==1.0.9`) before attempting to install Transformer Engine.
* [pyTorch] FlashAttention v2.1 has changed the behavior of the causal mask when performing cross-attention (see https://github.com/Dao-AILab/flash-attention#21-change-behavior-of-causal-flag for reference). For Transformer Engine to preserve consistent behavior between versions and back ends, FlashAttention is disabled for this use case (i.e. cross-attention with casual masking) when FlashAttention version 2.1+ is installed.

Breaking Changes in This Release

There are no breaking changes in this release.

Deprecated Features

There are no deprecated features in this release.

1.0 Page 3 of 4

Releases

Has known vulnerabilities

Previous Next

Transformer-engine

Page 3 of 4

1.2.1

1.2

1.2.0

1.1

1.1.0

1.0

Page 3 of 4

Links

Releases