It has been a while since the last release. This is the latest working version, with two miscellaneous bugs fixed.
0.16.0
Added intermediate ff dimension
The model dimension can now differ in the intermediate layers. This change applies to the ff module, and only in the encoder. If the `ff_intermediate` flag is not None, the layers will look like this:
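As a rough sketch of one way to read this layout, the per-layer widths could be planned as follows. The helper name `ff_layer_dims`, the `ff_dim` hidden width, and the rule that only the first input and last output keep the original width are assumptions for illustration, not the library's code:

```python
def ff_layer_dims(channels, ff_dim, depth, ff_intermediate=None):
    """Hypothetical helper: list (input, hidden, output) widths for each ff module."""
    layers = []
    for i in range(depth):
        if ff_intermediate is None:
            # Default behaviour: every ff module maps channels back to channels.
            in_dim = out_dim = channels
        else:
            # Assumed layout: only the first input and the last output keep `channels`.
            in_dim = channels if i == 0 else ff_intermediate
            out_dim = channels if i == depth - 1 else ff_intermediate
        layers.append((in_dim, ff_dim, out_dim))
    return layers


for dims in ff_layer_dims(channels=64, ff_dim=256, depth=3, ff_intermediate=32):
    print(" -> ".join(str(d) for d in dims))
```

With these example sizes, the stack enters and leaves at width 64 while the layers in between pass width 32 to each other.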
The Linformer now supports convolution as a way to downsample the input, instead of relying on linear layers. This may reduce the number of parameters required.
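To illustrate the difference, here is a minimal PyTorch sketch that downsamples a sequence from length `n` to length `k` in both ways. The sizes, the variable names, and the choice of `kernel_size = stride = n // k` are assumptions made for this example and may not match how the library configures its layers:

```python
import torch
from torch import nn

n, k, d = 512, 128, 64        # sequence length, projected length, channels (illustrative)
x = torch.randn(2, n, d)      # (batch, sequence, channels)

# Linear downsampling: project along the sequence axis, n -> k.
linear_proj = nn.Linear(n, k)
y_lin = linear_proj(x.transpose(1, 2)).transpose(1, 2)    # (batch, k, channels)

# Convolutional downsampling: Conv1d expects (batch, channels, length).
conv_proj = nn.Conv1d(d, d, kernel_size=n // k, stride=n // k)
y_conv = conv_proj(x.transpose(1, 2)).transpose(1, 2)     # (batch, k, channels)


def count_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


print(tuple(y_lin.shape), count_params(linear_proj))
print(tuple(y_conv.shape), count_params(conv_proj))
```

Both paths produce a `(2, 128, 64)` tensor. With these sizes the linear layer has 65,664 parameters and the convolution has 16,448. The saving is not guaranteed: in this setup the convolution's weight count grows with `d * d * (n // k)`, so it can exceed the linear layer's `n * k` when the channel count is large.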
0.14.0
Finished an encoder and a decoder module. Causal attention also works when the `causal=True` flag is set. Will update the README shortly...
0.13.1
Added masking to the Linformer. However, this is still a WIP, since masking cannot be done in the traditional sense, as in the "Attention Is All You Need" paper, because that would add the overhead of another `(n,n)` matrix, which is infeasible.
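To put the `(n,n)` overhead in perspective, this small sketch compares the size of a full `(n,n)` mask with the `(n,k)` matrix the Linformer works with. The function name `mask_cost`, the sizes, and the 4 bytes per entry are assumptions chosen for illustration:

```python
def mask_cost(n, k, bytes_per_entry=4):
    """Hypothetical helper: bytes for an (n,n) mask versus an (n,k) matrix."""
    full_mask = n * n * bytes_per_entry
    projected = n * k * bytes_per_entry
    return full_mask, projected


full, proj = mask_cost(n=8192, k=256)
print(f"(n,n) mask:   {full / 2**20:.0f} MiB")   # 256 MiB
print(f"(n,k) matrix: {proj / 2**20:.0f} MiB")   # 8 MiB
```

With these sizes the full mask is `n / k = 32` times larger than the projected matrix, which gives up the memory saving the projection was meant to provide.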