## What's Changed
* Update README.md by eltociear in https://github.com/fla-org/flash-linear-attention/pull/2
* fix simple gla backward by sunyt32 in https://github.com/fla-org/flash-linear-attention/pull/6
* Adding RWKV-v4. by ridgerchu in https://github.com/fla-org/flash-linear-attention/pull/8
* fixed hgrn.py paper link and title by ridgerchu in https://github.com/fla-org/flash-linear-attention/pull/10
* Update recurrent_naive.py by hypnopump in https://github.com/fla-org/flash-linear-attention/pull/12
* fix: calculate du on different batch by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/35
* fix: enhance state gradient when bf16 by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/37
* Add implementations of Mamba 2 into FLA by DanFosing in https://github.com/fla-org/flash-linear-attention/pull/39
* Minor mamba-2 fixes by DanFosing in https://github.com/fla-org/flash-linear-attention/pull/40
* [DeltaNet] Adds beta as a vector option by hypnopump in https://github.com/fla-org/flash-linear-attention/pull/42
* [DRAFT] Beta gradient does not match by hypnopump in https://github.com/fla-org/flash-linear-attention/pull/43
* [Attn] fix negative value of seqlen offset during sft by ChaosCodes in https://github.com/fla-org/flash-linear-attention/pull/45
* [RWKV6] fix backward if h0 not passed by hypnopump in https://github.com/fla-org/flash-linear-attention/pull/48
* Replace mamba2 `mamba_chunk_scan_combined` triton kernel with `simple_gla` triton kernel by learning-chip in https://github.com/fla-org/flash-linear-attention/pull/49
* benchmark script for simple_gla vs mamba2 kernel by learning-chip in https://github.com/fla-org/flash-linear-attention/pull/50
* Update amp custom_fwd, custom_bwd usage for torch 2.4.0 compatibility by mirceamironenco in https://github.com/fla-org/flash-linear-attention/pull/54
* Fix syntax error by JulienSiems in https://github.com/fla-org/flash-linear-attention/pull/55
* Add `__init__.py` in `fla/ops/common` for automatic package discovery by zhixuan-lin in https://github.com/fla-org/flash-linear-attention/pull/56
* [`Mamba2`] Post Merge Fixes - `norm_before_gate` and generation with `inputs_embeds` by vasqu in https://github.com/fla-org/flash-linear-attention/pull/57
* Correctly compute `max_seqlen` when `max_position_embeddings` is `None` by zhixuan-lin in https://github.com/fla-org/flash-linear-attention/pull/59
* add chunked KL div by ChaosCodes in https://github.com/fla-org/flash-linear-attention/pull/62
* Add fine-grained warning category for easier suppression by mirceamironenco in https://github.com/fla-org/flash-linear-attention/pull/65
* Update fused_chunk.py by hypnopump in https://github.com/fla-org/flash-linear-attention/pull/72
* [`Mamba2`] Fix slow path by vasqu in https://github.com/fla-org/flash-linear-attention/pull/84
* Add BitNet by DustinWang1 in https://github.com/fla-org/flash-linear-attention/pull/85
* Fix RWKV6 Cache Problems by WorldEditors in https://github.com/fla-org/flash-linear-attention/pull/78
* Fix bugs in RWKV6 OP by WorldEditors in https://github.com/fla-org/flash-linear-attention/pull/87
* fix mamba2 cache bug by WorldEditors in https://github.com/fla-org/flash-linear-attention/pull/89
* fix backward pass breaking when `dh0` is None by Sxela in https://github.com/fla-org/flash-linear-attention/pull/102
* support varlen training for conv1d by LKJacky in https://github.com/fla-org/flash-linear-attention/pull/116
* blood for the torch.compile gods by harrisonvanderbyl in https://github.com/fla-org/flash-linear-attention/pull/119
* Add forward pass for chunkwise TTT-Linear; varlen is supported. by Pan-Yuqi in https://github.com/fla-org/flash-linear-attention/pull/124
* Add scripts for converting pretrained RWKV7 models to fla format by Triang-jyed-driung in https://github.com/fla-org/flash-linear-attention/pull/128
* [`Mamba2`] Fixes for caching and multiple other small issues by vasqu in https://github.com/fla-org/flash-linear-attention/pull/129
* [LinAttn] Fix handling of None scale in chunk_linear_attn for output normalization by HallerPatrick in https://github.com/fla-org/flash-linear-attention/pull/130
* Fix incorrect kwarg name in `fused_recurrent` by fffffgggg54 in https://github.com/fla-org/flash-linear-attention/pull/134
* RWKV-7 conversion and evals by Triang-jyed-driung in https://github.com/fla-org/flash-linear-attention/pull/135
* Fixed dtype mismatch of mamba & mamba2 under residual_in_fp32 setting by chengshuang18 in https://github.com/fla-org/flash-linear-attention/pull/137
* [RWKV7] Fix masking before time shifting modules by Triang-jyed-driung in https://github.com/fla-org/flash-linear-attention/pull/141
* [RWKV7, but applicable to all models] Update modeling_rwkv7.py: Fixing `base_model_prefix` by Triang-jyed-driung in https://github.com/fla-org/flash-linear-attention/pull/143
* fix bitattn with latest attn implementation. by ridgerchu in https://github.com/fla-org/flash-linear-attention/pull/146
* [RWKV7] Remove in-place operations and add gradient checkpointing for `v_first` by Triang-jyed-driung in https://github.com/fla-org/flash-linear-attention/pull/145
* [BitNet] Fix bugs of model definitions by ridgerchu in https://github.com/fla-org/flash-linear-attention/pull/147
* Fix #157 by jannalulu in https://github.com/fla-org/flash-linear-attention/pull/167
* [Mamba, Samba] Add weight initializations and reset_parameters() in _init_weights() for compatibility with Flame by zaydzuhri in https://github.com/fla-org/flash-linear-attention/pull/169
* fix lint errors by jannalulu in https://github.com/fla-org/flash-linear-attention/pull/170
* [RWKV] Follow-up to fix cache management by jannalulu in https://github.com/fla-org/flash-linear-attention/pull/168
* one liner by seanxwzhang in https://github.com/fla-org/flash-linear-attention/pull/178
* [Attn] Fix cache update of swa by Pan-Yuqi in https://github.com/fla-org/flash-linear-attention/pull/183
* [RWKV7] Fix conversion precision by Triang-jyed-driung in https://github.com/fla-org/flash-linear-attention/pull/188
* [GRPO]: add grpo functions by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/189
* [RWKV] fix logits handling by jannalulu in https://github.com/fla-org/flash-linear-attention/pull/192
* [Modules]: Enhance the precision of the fused LayerNorm OP. by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/200
* [MISC] fix delta_net logit handling by jannalulu in https://github.com/fla-org/flash-linear-attention/pull/205
* [RWKV7] Keep compatibility with Torch Compiler. by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/208
* [Misc.] Update wrapper to support contiguous and guard custom device … by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/212
* [Models] Fix the erroneous past_key_values check when inputs… by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/213
* [Titans] Update Titans implementation by rucnyz in https://github.com/fla-org/flash-linear-attention/pull/214
* [Mamba2] Fix initialization by HanGuo97 in https://github.com/fla-org/flash-linear-attention/pull/225
* [TTT] Update fused chunk ops and state bias term by Pan-Yuqi in https://github.com/fla-org/flash-linear-attention/pull/230
* Enable utils.py to be imported on CPU-only machines (#231) by zhuzeyuan in https://github.com/fla-org/flash-linear-attention/pull/232
* [Utils] use fla.utils.device instead of cuda by uniartisan in https://github.com/fla-org/flash-linear-attention/pull/163
* fix(GatedDeltaNet): Ensure integer dimensions when using `expand_v` by vladislavalerievich in https://github.com/fla-org/flash-linear-attention/pull/234
## New Contributors
* eltociear made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/2
* sunyt32 made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/6
* ridgerchu made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/8
* hypnopump made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/12
* DanFosing made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/39
* ChaosCodes made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/45
* learning-chip made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/49
* mirceamironenco made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/54
* JulienSiems made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/55
* zhixuan-lin made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/56
* vasqu made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/57
* DustinWang1 made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/85
* WorldEditors made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/78
* Sxela made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/102
* LKJacky made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/116
* harrisonvanderbyl made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/119
* Triang-jyed-driung made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/128
* HallerPatrick made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/130
* fffffgggg54 made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/134
* chengshuang18 made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/137
* jannalulu made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/167
* zaydzuhri made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/169
* seanxwzhang made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/178
* zhuzeyuan made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/232
* vladislavalerievich made their first contribution in https://github.com/fla-org/flash-linear-attention/pull/234
**Full Changelog**: https://github.com/fla-org/flash-linear-attention/commits/v0.1.0