New Features:
- Support additive, multiplicative and dot attention types in the MultiHeadAttention module.
- Support different time step sizes for query and key with the multiplicative and additive attention types (MultiHeadAttention); see the sketch after this list.
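
A minimal, hedged sketch of what the three score functions typically look like; the actual MultiHeadAttention API, argument names, and weight shapes in this project may differ, and `attention_scores` below is purely illustrative:

```python
import torch

# Illustrative only (not the project's actual API): the three attention score
# variants for query (B, T_q, D) and key (B, T_k, D); T_q and T_k may differ.
def attention_scores(query, key, kind="dot", w=None, w_q=None, w_k=None, v=None):
    if kind == "dot":
        # Scaled dot product: score_ij = <q_i, k_j> / sqrt(D)
        return torch.einsum("bqd,bkd->bqk", query, key) / query.shape[-1] ** 0.5
    if kind == "multiplicative":
        # Bilinear (Luong-style) score: score_ij = q_i^T W k_j, with w of shape (D, D)
        return torch.einsum("bqd,bkd->bqk", query @ w, key)
    if kind == "additive":
        # Additive (Bahdanau-style) score: score_ij = v^T tanh(W_q q_i + W_k k_j)
        q = (query @ w_q).unsqueeze(2)   # (B, T_q, 1, D_att)
        k = (key @ w_k).unsqueeze(1)     # (B, 1, T_k, D_att)
        return torch.tanh(q + k) @ v     # (B, T_q, T_k)
    raise ValueError(f"unknown attention kind: {kind}")

# Attention weights are then a softmax over the key axis, e.g.:
# weights = torch.softmax(attention_scores(query, key, kind="additive",
#                                          w_q=w_q, w_k=w_k, v=v), dim=-1)
```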
Refactoring:
- Change the default initialisation of MLP layers to the nn.Linear defaults; see the sketch below.
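
A hedged sketch of what this means in practice; the project's actual MLP class is likely different, but the point is that layers built from nn.Linear now keep its default initialisation (Kaiming-uniform weights, fan-in-scaled uniform bias) because no custom initialisation is applied on top:

```python
import torch.nn as nn

# Illustrative MLP: each nn.Linear keeps its default reset_parameters() init.
class MLP(nn.Module):
    def __init__(self, dims):  # e.g. dims = [128, 256, 10]
        super().__init__()
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers.append(nn.Linear(d_in, d_out))  # nn.Linear default init, no override
            layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers[:-1])  # no activation after the last layer

    def forward(self, x):
        return self.net(x)
```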