This release adds MultiHeadAttention, SelfAttention, and ResidualConnection modules for building transformer architectures.
Added
- MultiHeadAttention module composes multiple SelfAttention layers in parallel
- SelfAttention provides configurable query, key, and value transformations
- ResidualConnection wraps a module with a residual (skip) connection and layer normalization (see the composition sketch after this list)
- Unit tests for the new modules
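The sketch below illustrates how these three modules typically compose into a single attention block. The constructor arguments (embed_dim, num_heads, head_dim) and the internals shown here are assumptions for illustration only, not the library's actual API.

```python
# Minimal illustrative sketch of the new modules (assumed signatures, not the real implementation).
import torch
import torch.nn as nn


class SelfAttention(nn.Module):
    """Single-head scaled dot-product attention with learned query/key/value projections."""
    def __init__(self, embed_dim: int, head_dim: int):
        super().__init__()
        self.query = nn.Linear(embed_dim, head_dim)
        self.key = nn.Linear(embed_dim, head_dim)
        self.value = nn.Linear(embed_dim, head_dim)
        self.scale = head_dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.query(x), self.key(x), self.value(x)
        weights = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return weights @ v


class MultiHeadAttention(nn.Module):
    """Runs several SelfAttention heads in parallel and projects the concatenated result."""
    def __init__(self, embed_dim: int, num_heads: int):
        super().__init__()
        head_dim = embed_dim // num_heads
        self.heads = nn.ModuleList(
            SelfAttention(embed_dim, head_dim) for _ in range(num_heads)
        )
        self.out = nn.Linear(head_dim * num_heads, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.cat([head(x) for head in self.heads], dim=-1))


class ResidualConnection(nn.Module):
    """Wraps a module with a residual (skip) connection followed by layer normalization."""
    def __init__(self, module: nn.Module, embed_dim: int):
        super().__init__()
        self.module = module
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.norm(x + self.module(x))


# Usage: one attention block applied to a batch of token embeddings.
block = ResidualConnection(MultiHeadAttention(embed_dim=64, num_heads=4), embed_dim=64)
tokens = torch.randn(2, 10, 64)   # (batch, sequence, embedding)
out = block(tokens)               # same shape: (2, 10, 64)
```

Stacking several such blocks (alternating attention and feed-forward sublayers, each in a ResidualConnection) is the usual way to assemble a full transformer encoder.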
This release is intended for testing. Please file any issues you find!
---