Initial release of the BlockwiseParallelTransformerAttention class.
Implemented the forward pass of the module, which splits the input sequence into blocks and computes the attention output for each block in parallel.
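As an illustration of the blockwise forward pass, here is a minimal sketch assuming a PyTorch implementation; the `blockwise_attention` helper, its block-local attention pattern, and the tensor shapes are assumptions for exposition, not the class's exact code.

```python
import math
import torch

def blockwise_attention(q, k, v, block_size):
    """q, k, v: (batch, seq_len, dim); seq_len is assumed divisible by block_size."""
    b, s, d = q.shape
    n = s // block_size
    # Reshape so each block becomes an independent "batch" element.
    q = q.reshape(b, n, block_size, d)
    k = k.reshape(b, n, block_size, d)
    v = v.reshape(b, n, block_size, d)
    # Scaled dot-product attention within each block, computed for all
    # blocks in parallel by a single batched matmul.
    scores = torch.einsum("bnqd,bnkd->bnqk", q, k) / math.sqrt(d)
    weights = scores.softmax(dim=-1)
    out = torch.einsum("bnqk,bnkd->bnqd", weights, v)
    return out.reshape(b, s, d)
```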
Added support for variable-length input sequences by padding them to the maximum sequence length.
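A hedged sketch of the padding step: the `pad_to_max_length` helper and the zero padding value are assumptions, and the class may instead handle padding internally.

```python
import torch
import torch.nn.functional as F

def pad_to_max_length(sequences, max_len, pad_value=0.0):
    """sequences: list of (seq_len_i, dim) tensors -> (batch, max_len, dim)."""
    # Pad each sequence at the end of its time dimension, then stack into a batch.
    padded = [F.pad(x, (0, 0, 0, max_len - x.size(0)), value=pad_value) for x in sequences]
    return torch.stack(padded)
```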
Added support for multi-head attention by computing the attention output for each head separately and concatenating the results.
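The multi-head step could look roughly like the following, reusing the `blockwise_attention` sketch above; the QKV projection, head count, and output projection are illustrative assumptions rather than the class's exact API.

```python
import torch
import torch.nn as nn

class MultiHeadBlockwiseAttention(nn.Module):
    def __init__(self, dim, num_heads, block_size):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.block_size = block_size
        self.qkv = nn.Linear(dim, 3 * dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):
        b, s, d = x.shape
        h = self.num_heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Fold heads into the batch dimension so each head is attended separately.
        split = lambda t: t.reshape(b, s, h, d // h).transpose(1, 2).reshape(b * h, s, d // h)
        out = blockwise_attention(split(q), split(k), split(v), self.block_size)
        # Concatenate the per-head outputs back along the feature dimension.
        out = out.reshape(b, h, s, d // h).transpose(1, 2).reshape(b, s, d)
        return self.out_proj(out)
```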
Added a feedforward module after the attention module to apply a position-wise nonlinear transformation to the attention output.
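A minimal sketch of such a feedforward block in the usual linear-activation-linear form; the 4x hidden width and the GELU activation are assumptions.

```python
import torch.nn as nn

class FeedForward(nn.Module):
    def __init__(self, dim, hidden_mult=4):
        super().__init__()
        # Expand, apply the nonlinearity, then project back to the model dimension.
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_mult * dim),
            nn.GELU(),
            nn.Linear(hidden_mult * dim, dim),
        )

    def forward(self, x):
        return self.net(x)
```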
Tested the module on a small dataset and verified that it produces the expected output.
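A hypothetical smoke test in the spirit of that check, wiring the sketches above together and verifying only the output shape; the batch size, sequence length, and hyperparameters are arbitrary.

```python
import torch

x = torch.randn(2, 16, 64)                       # (batch, seq_len, dim)
attn = MultiHeadBlockwiseAttention(dim=64, num_heads=4, block_size=4)
ff = FeedForward(dim=64)
y = ff(attn(x))
assert y.shape == x.shape                        # same shape in, same shape out
print("output shape:", tuple(y.shape))
```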