xFormers

Latest version: v0.0.28.post3

2.3.0

Added
- [2:4 sparsity] Added support for Straight-Through Estimator for `sparsify24` gradient (`GRADIENT_STE`)
- [2:4 sparsity] `sparsify24_like` now supports the cuSparseLt backend, and the STE gradient
- Basic support for `torch.compile` for the `memory_efficient_attention` operator. This currently covers only the Flash-Attention backend, with no attention bias provided; we want to expand this coverage progressively (a minimal usage sketch follows below).
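
For illustration, here is a minimal sketch of what the new `torch.compile` support can look like in practice. It assumes a CUDA GPU supported by the Flash-Attention backend and half-precision inputs; the `attention` wrapper and the shapes are made up for the example, and no attention bias is passed since that case is not covered yet.

```python
# Minimal sketch (assumptions noted above): compile a bias-free call to
# memory_efficient_attention so it can run under torch.compile.
import torch
import xformers.ops as xops

def attention(q, k, v):
    # No attn_bias here: only the bias-free Flash-Attention path is supported.
    return xops.memory_efficient_attention(q, k, v)

compiled_attention = torch.compile(attention)

# Illustrative shapes: batch, sequence length, heads, head dim.
B, M, H, K = 2, 1024, 8, 64
q = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
k = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)
v = torch.randn(B, M, H, K, device="cuda", dtype=torch.float16)

out = compiled_attention(q, k, v)  # same result as the eager call
```
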
Improved
- `merge_attentions` no longer needs its inputs to be stacked.
- fMHA: triton_splitk now supports additive bias
- fMHA: benchmark cleanup

2.2.2

2.2.1

Added
- New `merge_attentions` function
- fMHA: New gappy attention biases.
Improved
- fMHA: Updated Flash-Attention to v2.5.6: this has a performance improvement for multiquery.
- fMHA: `triton_splitk` reworked and expanded. Partial results are now merged using their log-sum-exp (LSE), autotuning is supported, and causal attention works with a small number of queries rather than just one. Experimental support for paged attention (an LSE-merging sketch follows this list).
- `rope_padded`: Fixed CUDA error with many queries (more than 65k)
- `rmsnorm`: Fixed CUDA error with large inputs (enables 512k+ sequence length on Llama2 70B)
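
As a rough illustration of the LSE-based merging behind `merge_attentions` and the reworked `triton_splitk`, the following plain-PyTorch sketch combines attention outputs computed over disjoint key/value chunks using their log-sum-exp. The function names and shapes are invented for the example; this shows the underlying idea, not the xformers kernels.

```python
# Plain-PyTorch illustration of merging partial attention results with their
# log-sum-exp (LSE). Names and shapes are made up; this is not the xformers code.
import torch

def chunk_attention(q, k, v):
    # Attention over one key/value chunk, returning the output and per-query LSE.
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # [B, H, Mq, Mk]
    lse = torch.logsumexp(scores, dim=-1)                   # [B, H, Mq]
    out = torch.softmax(scores, dim=-1) @ v                 # [B, H, Mq, K]
    return out, lse

def merge_with_lse(outs, lses):
    # Weight each chunk's output by exp(lse_chunk - lse_total) and sum.
    lses = torch.stack(lses)                                # [chunks, B, H, Mq]
    outs = torch.stack(outs)                                # [chunks, B, H, Mq, K]
    lse_total = torch.logsumexp(lses, dim=0)                # [B, H, Mq]
    weights = torch.exp(lses - lse_total).unsqueeze(-1)     # [chunks, B, H, Mq, 1]
    return (weights * outs).sum(dim=0)                      # [B, H, Mq, K]

B, H, Mq, Mk, K = 1, 4, 16, 128, 64
q = torch.randn(B, H, Mq, K)
k = torch.randn(B, H, Mk, K)
v = torch.randn(B, H, Mk, K)

# Attend to two key/value chunks separately, then merge the partial results.
out1, lse1 = chunk_attention(q, k[:, :, :64], v[:, :, :64])
out2, lse2 = chunk_attention(q, k[:, :, 64:], v[:, :, 64:])
merged = merge_with_lse([out1, out2], [lse1, lse2])

# Matches attending to all keys at once, up to floating-point error.
reference, _ = chunk_attention(q, k, v)
assert torch.allclose(merged, reference, atol=1e-5)
```
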
Removed
- fMHA: Removed triton operator (`fmha.triton.*`, `xformers.ops.MemoryEfficientAttentionTritonFwdFlashBwOp`, `xformers.ops.TritonFlashAttentionOp`), as it has correctness issues under some conditions, and is slower than other implementations.

2.2.0

Added
- Added components for model/sequence parallelism, as near-drop-in replacements for the FairScale/Megatron ColumnParallelLinear and RowParallelLinear modules. They support fusing communication and computation for sequence parallelism, making the communication effectively free. [Read more](https://twitter.com/d_haziza/status/1753030654118211593)
- Added kernels for training models with 2:4 sparsity. We introduced a very fast kernel for converting a matrix A into 2:4-sparse format, which can be used during training to dynamically sparsify weights, activations, and more. xFormers also provides an API that is compatible with `torch.compile`, see `xformers.ops.sparsify24` (a plain-PyTorch reference sketch follows this list).
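
For intuition, here is a plain-PyTorch reference sketch of one natural 2:4 sparsification rule: keep the two largest-magnitude entries in every contiguous group of four and zero the rest. The helper name and the selection rule are illustrative assumptions; the actual `xformers.ops.sparsify24` kernel is a much faster CUDA implementation whose exact selection and packing details may differ.

```python
# Reference-style sketch of 2:4 sparsification (illustration only, not the
# xformers kernel): keep the 2 largest-magnitude values in each group of 4.
import torch

def sparsify24_reference(x: torch.Tensor) -> torch.Tensor:
    assert x.shape[-1] % 4 == 0, "last dimension must be a multiple of 4"
    groups = x.reshape(*x.shape[:-1], -1, 4)                # [..., n_groups, 4]
    # Indices of the 2 largest-magnitude entries within each group of 4.
    keep = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups).scatter_(-1, keep, 1.0).bool()
    return torch.where(mask, groups, torch.zeros_like(groups)).reshape(x.shape)

w = torch.randn(128, 64)
w_sparse = sparsify24_reference(w)
# At most 2 of every 4 consecutive entries are nonzero after sparsification.
assert (w_sparse.reshape(-1, 4) != 0).sum(dim=-1).le(2).all()
```
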
Improved
- Made selective activation checkpointing compatible with `torch.compile`.
Removed
- Triton kernels now require a GPU with compute capability 8.0 or higher (A100 or newer), because newer versions of Triton do not correctly support older GPUs.
- Removed support for PyTorch versions older than 2.1.0.

0.0.28.post3

0.0.28.post2
