Flash-attention-softmax-n

Latest version: v0.3.2


0.3.2

There was a [request](https://www.linkedin.com/feed/update/urn:li:activity:7105562277017202688?commentUrn=urn%3Ali%3Acomment%3A%28activity%3A7105562277017202688%2C7105777083288571904%29&dashCommentUrn=urn%3Ali%3Afsd_comment%3A%287105777083288571904%2Curn%3Ali%3Aactivity%3A7105562277017202688%29) on LinkedIn from Rami to give XLNet the "one-line treatment."

That means a user can convert their XLNet model from softmax_0 to softmax_n by adding just one line of code to their script (or two lines if you count the import statement). For example:

```python
import transformers

from flash_attention_softmax_n.surgery import apply_attention_softmax_n

model = transformers.AutoModel.from_pretrained('xlnet-base-cased')
apply_attention_softmax_n(model=model, softmax_n_param=1.)
...
```
On the backend, the `_xlnet` subpackage contains the modified `rel_attn_core` method. It also adds the `XLNetRelativeAttention` class to the policy registry so that `apply_attention_softmax_n` and `AttentionSoftmaxN` know to operate on it.
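
For context, registering a module class with the policy registry looks roughly like the sketch below. The decorator-style `policy_registry.register` call and the three-argument replacement-function signature are assumptions modeled on composer-style module surgery, and `patch_xlnet_attention` is a hypothetical name; the real patch lives in the `_xlnet` subpackage.

```python
import torch
from transformers.models.xlnet.modeling_xlnet import XLNetRelativeAttention

# ASSUMPTION: decorator-style registration modeled on composer's module
# surgery; consult the package source for the exact registry API.
from flash_attention_softmax_n.surgery.surgery_functions import policy_registry


@policy_registry.register(XLNetRelativeAttention)
def patch_xlnet_attention(module: torch.nn.Module,
                          module_index: int,
                          softmax_n_param: float) -> torch.nn.Module:
    # Illustrative body only: the real surgery function rebinds the modified
    # rel_attn_core so the attention weights use softmax_n instead of softmax_0.
    module.softmax_n_param = softmax_n_param
    return module
```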

As for testing, all of the unit tests in `tests/cpu/surgery/test_xlnet.py` pass. Tests in the `cpu` subpackage run on either CPU or GPU; tests in the `gpu` subpackage are GPU-only.

0.3.1

Enabled flash attention on NVIDIA V100 GPUs.

0.3.0

Perform "surgery" on existing models: take a pretrained model with softmax_0 in its attention mechanism and "operate" on it to replace softmax_0 with softmax_n. Based on MosaicML's [composer](https://github.com/mosaicml/composer).

Optionally install via:
```bash
$ pip install flash-attention-softmax-n[surgery]
```

New Features:
- Functional API: add one line of code to your script, `flash_attention_softmax_n.surgery.apply_attention_softmax_n`.
- Object-oriented API for use with the MosaicML composer trainer, `flash_attention_softmax_n.surgery.AttentionSoftmaxN`; see the sketch after this list.
- Use `flash_attention_softmax_n.surgery.surgery_functions.policy_registry` to register your own model!

See the README for sample usage.
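
The object-oriented API plugs into composer's training loop as an algorithm. A minimal sketch, assuming `AttentionSoftmaxN` accepts the same `softmax_n_param` as the functional API; the model and dataloader below are placeholders, not prescribed by the package:

```python
import transformers
from composer import Trainer
from composer.models import HuggingFaceModel

from flash_attention_softmax_n.surgery import AttentionSoftmaxN

# Placeholders: any composer-compatible model/dataloader pair works here.
model = HuggingFaceModel(
    transformers.AutoModelForSequenceClassification.from_pretrained('xlnet-base-cased')
)
train_dataloader = ...  # your torch DataLoader

trainer = Trainer(
    model=model,
    train_dataloader=train_dataloader,
    max_duration='1ep',
    algorithms=[AttentionSoftmaxN(softmax_n_param=1.)],  # surgery runs at fit time
)
trainer.fit()
```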

0.2.1

Previously, when no value for `layers_to_save` was passed to `register_activation_hooks`, it registered all named modules. This was problematic because not all modules output a tensor, so an `AttributeError` would be thrown.

This update provides a default value for `layers_to_save` and adds a try/except block in `save_activations_statistics`.

The output of `register_activation_hooks` has also changed from a dictionary of lists to a dictionary of dictionaries, so every statistic is now labeled.
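
A minimal sketch of the new output shape, assuming `register_activation_hooks` is exported from the analysis subpackage introduced in 0.2.0; the module name `'0'` just comes from the toy `Sequential` below:

```python
import torch

# ASSUMPTION: import path per the 0.2.0 analysis subpackage.
from flash_attention_softmax_n.analysis import register_activation_hooks

model = torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.ReLU())
saved = register_activation_hooks(model, layers_to_save=['0'])
model(torch.randn(2, 8))  # forward pass populates the saved statistics

# Before 0.2.1: saved['0'] was a bare list of statistics.
# Since 0.2.1:  saved['0'] is a dictionary, so each statistic is labeled
# by name instead of being identified by position.
```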

0.2.0

Created an analysis subpackage at `flash_attention_softmax_n.analysis`. See the README for an overview. The API for the core functions is unchanged.

0.1.3

Added the long description.
