### Fixed
- Updated triton dependency [facebookresearch/xformers#418]
- Strip lineinfo from binaries, reducing the binary size [facebookresearch/xformers#549]
- Added support for pip wheels [facebookresearch/xformers#588, facebookresearch/xformers#573, facebookresearch/xformers#534, facebookresearch/xformers#523, ...] big thanks to [AbdBarho](https://github.com/AbdBarho)!
- Fixed compatibility with Python 3.7 [facebookresearch/xformers#541] - thanks to [susumuota](https://github.com/susumuota)
- fMHA: Fixed strides for QKV gradients for CUTLASS attention [facebookresearch/xformers#535]
- fMHA: Stricter input validation to avoid CUDA errors for unsupported inputs [facebookresearch/xformers#592]
- fMHA/Flash-Attention: Updated to https://github.com/HazyResearch/flash-attention/commit/a1f49a2b92b6fa022379bbebafed9d7f5e96a675 with multiple changes from [TriDao](https://github.com/tridao) that make the operator up to 20% faster
- fMHA/Flash-Attention: Fixed backward pass wrapper, where non-contiguous gradients could give the wrong result [facebookresearch/xformers#548]
- fMHA: Separated each operator into forward and backward operators. It's now possible to use any combination of forward and backward operators (for instance a Triton forward with a Flash-Attention backward), as shown in the sketch below [facebookresearch/xformers#560]
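
A minimal sketch of mixing operators through the `op` argument of `memory_efficient_attention`, pairing a Triton forward with a Flash-Attention backward. The operator classes (`fmha.triton.FwOp`, `fmha.flash.BwOp`), shapes, and dtypes are illustrative assumptions; whether a given pair is usable depends on the installed build, the GPU, and the inputs.

```python
# Illustrative sketch (assumes a CUDA GPU supported by both kernels, e.g. A100,
# with triton and flash-attn available in this xformers build).
import torch
import xformers.ops as xops
from xformers.ops import fmha

q, k, v = (
    torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16, requires_grad=True)
    for _ in range(3)
)

# op=(forward_op, backward_op): here a Triton forward paired with a Flash-Attention backward.
out = xops.memory_efficient_attention(q, k, v, op=(fmha.triton.FwOp, fmha.flash.BwOp))
out.sum().backward()
```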
### Added
- fMHA: Added Triton operator for the forward pass from [Flash-Attention](https://github.com/HazyResearch/flash-attention/blob/main/flash_attn/flash_attn_triton.py) authored by [TriDao](https://github.com/tridao); it is automatically used on A100 when compatible
- fMHA: Added [`xformers.ops.memory_efficient_attention_forward`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention_forward), [`xformers.ops.memory_efficient_attention_forward_requires_grad`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention_forward_requires_grad), [`xformers.ops.memory_efficient_attention_backward`](https://facebookresearch.github.io/xformers/components/ops.html#xformers.ops.memory_efficient_attention_backward) for power-users who write custom autograd functions [facebookresearch/xformers#560] (see the usage sketch after this list)
- fMHA: Support for custom scaling for the CUTLASS-based kernel [facebookresearch/xformers#530] - contribution from [comaniac](https://github.com/comaniac)
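
A minimal sketch of the forward-only entry point listed above, plus a call with an explicit softmax scale. The shapes and dtypes are illustrative, and passing the scale through a `scale` keyword is an assumption about how the custom-scaling support mentioned above is exposed.

```python
# Illustrative sketch (assumes a CUDA GPU and fp16 inputs supported by the kernels).
import math
import torch
import xformers.ops as xops

q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
k = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)

# Forward-only call: no autograd graph is recorded, e.g. for pure inference or
# inside a custom autograd function that handles the backward itself.
out = xops.memory_efficient_attention_forward(q, k, v)

# Same call with an explicit softmax scale (here the default 1/sqrt(head_dim), spelled out).
out_scaled = xops.memory_efficient_attention_forward(
    q, k, v, scale=1.0 / math.sqrt(q.shape[-1])
)
```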