Linformer-pytorch

Latest version: v0.19.3

0.8.0

Added the visualizer class, which lets you see all of the attention heads.

Also fixed a bug in how the E and F matrices were calculated: they were sized `(n,d)` when they should have been `(n,k)`.
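
For reference, a minimal shape-check sketch of why `(n,k)` is the right size. The variable names and values here are illustrative, not the library's internals:

```python
import torch

n, d, k = 512, 64, 128  # sequence length, head dimension, projected dimension (illustrative)

# E and F must be (n, k): projecting along the sequence dimension turns the
# (n, d) keys and values into (k, d) tensors, which is where Linformer's
# linear attention cost comes from.
E = torch.nn.init.xavier_normal_(torch.empty(n, k))
F = torch.nn.init.xavier_normal_(torch.empty(n, k))

K = torch.randn(n, d)  # keys
V = torch.randn(n, d)  # values

K_proj = E.transpose(0, 1) @ K  # (k, n) @ (n, d) -> (k, d)
V_proj = F.transpose(0, 1) @ V  # (k, n) @ (n, d) -> (k, d)
print(K_proj.shape, V_proj.shape)  # torch.Size([128, 64]) torch.Size([128, 64])
```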

0.7.0

As well as updating the README, I changed the default behavior for computing the inner head dimension. Instead of requiring the value to be passed in, it now works as in the "Attention Is All You Need" paper: the number of channels is divided by the number of heads, and that quotient is the dimension used inside each attention head.
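
A minimal sketch of the new default, using hypothetical parameter names (`channels`, `nhead`, `dim_head`) rather than the library's exact signature:

```python
from typing import Optional

def default_head_dim(channels: int, nhead: int, dim_head: Optional[int] = None) -> int:
    if dim_head is not None:          # an explicitly given value still wins
        return dim_head
    assert channels % nhead == 0, "channels must be divisible by the number of heads"
    return channels // nhead          # e.g. 512 channels / 8 heads -> 64 per head

print(default_head_dim(512, 8))  # 64
```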

0.6.0

Added both the ReLU and GELU activation function options to the multihead attention block.
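
Illustrative only, with a hypothetical lookup helper: this is roughly what an activation option boils down to inside the block's feed-forward path.

```python
import torch.nn as nn

def get_activation(name: str) -> nn.Module:
    return {"relu": nn.ReLU(), "gelu": nn.GELU()}[name.lower()]

ff = nn.Sequential(nn.Linear(64, 256), get_activation("gelu"), nn.Linear(256, 64))
```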

0.5.0

Added the `k_reduce_by_layer` flag, which reduces the value of `dim_k` at each successive layer. This was alluded to in Figure 1 of the paper, where the normalized cumulative eigenvalue index went up with layer depth, meaning that we can potentially get away with lower dimensions at higher depths.
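
A hypothetical sketch of such a per-layer schedule for the projection dimension; the function and parameter names are illustrative, not the library's API.

```python
def k_for_layer(dim_k: int, layer_idx: int, k_reduce_by_layer: int, k_min: int = 1) -> int:
    # Shrink the projection dimension linearly with depth, never below k_min.
    return max(dim_k - layer_idx * k_reduce_by_layer, k_min)

print([k_for_layer(128, i, 16) for i in range(6)])  # [128, 112, 96, 80, 64, 48]
```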

0.4.0

Added the `none`, `headwise`, `kv`, and `layerwise` parameter sharing options. Also added positional encodings.
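
As a rough guide to what the four modes mean, here is a sketch that follows the paper's definitions of projection sharing; the function and variable names are illustrative, not the library's internals.

```python
import torch.nn as nn

def make_projection(n: int, k: int) -> nn.Linear:
    """One learnable (n -> k) projection, Xavier-initialized."""
    proj = nn.Linear(n, k, bias=False)
    nn.init.xavier_normal_(proj.weight)
    return proj

#   "none"      - separate E and F for every head in every layer
#   "headwise"  - one E and one F per layer, shared across heads
#   "kv"        - one projection per layer used for both E and F, shared across heads
#   "layerwise" - a single projection reused everywhere (all layers, heads, E and F)
def build_EF(mode: str, n: int, k: int, nhead: int, depth: int):
    if mode == "layerwise":
        shared = make_projection(n, k)
        return [[(shared, shared) for _ in range(nhead)] for _ in range(depth)]
    layers = []
    for _ in range(depth):
        if mode == "headwise":
            E, F = make_projection(n, k), make_projection(n, k)
            layers.append([(E, F) for _ in range(nhead)])
        elif mode == "kv":
            EF = make_projection(n, k)
            layers.append([(EF, EF) for _ in range(nhead)])
        else:  # "none"
            layers.append([(make_projection(n, k), make_projection(n, k))
                           for _ in range(nhead)])
    return layers
```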

0.3.1

The way that the E and F matrices are calculated was changed. Before, they were identity matrices, but with this release, they are constructed the way the paper's authors recommend: as linear layers with Xavier initialization.
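
A before/after sketch of the change, with illustrative sizes; the library's own constructor may differ.

```python
import torch
import torch.nn as nn

n, k = 512, 128  # illustrative sequence length and projected dimension

# Before this release (conceptually): a fixed identity-style projection.
E_old = torch.eye(n, k)

# From 0.3.1 on, per the paper's recommendation: a learnable linear layer
# with Xavier initialization.
E_new = nn.Linear(n, k, bias=False)
nn.init.xavier_normal_(E_new.weight)
```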
