Added
- Added `layers.SSRU`, which implements the Simpler Simple Recurrent Unit described in
Kim et al., "From Research to Production and Back: Ludicrously Fast Neural Machine Translation", WNGT 2019 (see the sketch after this list).
- Added the `ssru_transformer` option to `--decoder`, which replaces the decoder-side self-attention layers with SSRUs.
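
For reference, a rough NumPy sketch of one SSRU step as described in the paper: the reset gate of the SRU is dropped and the output non-linearity is a ReLU. This is not Sockeye's implementation, and the weight names `w`, `w_f`, `b_f` are illustrative only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ssru_step(x_t, c_prev, w, w_f, b_f):
    """One step of a Simpler Simple Recurrent Unit (SSRU).

    The cell state is a per-dimension interpolation between the previous
    state and a projection of the input, controlled by a forget gate that
    depends only on the current input.
    """
    f_t = sigmoid(x_t @ w_f + b_f)                 # forget gate
    c_t = f_t * c_prev + (1.0 - f_t) * (x_t @ w)   # new cell state
    h_t = np.maximum(c_t, 0.0)                     # ReLU output
    return h_t, c_t

# Example: run a short sequence through the unit.
rng = np.random.default_rng(0)
d = 4
w, w_f = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b_f = np.zeros(d)
c = np.zeros(d)
for x_t in rng.normal(size=(3, d)):
    h, c = ssru_step(x_t, c, w, w_f, b_f)
```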
Changed
- Reduced the number of arguments for `MultiHeadSelfAttention.hybrid_forward()`.
`previous_keys` and `previous_values` are now passed together as `previous_states`, a list containing two symbols (see the sketch after this list).
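
A minimal sketch of the new calling convention; the variable and argument names other than `previous_states` are placeholders, not the exact Sockeye signature.

```python
import mxnet as mx

# Cached key/value symbols from the previous decoder step (placeholder names).
previous_keys = mx.sym.Variable("prev_keys")
previous_values = mx.sym.Variable("prev_values")

# Before: the two symbols were passed as separate arguments.
# Now: they are bundled into a single list argument.
previous_states = [previous_keys, previous_values]

# Hypothetical call site, assuming `attention_cell` is a
# MultiHeadSelfAttention block:
# outputs, new_states = attention_cell(inputs, previous_states=previous_states)
```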