- Introduced `kv_dim` option to `StandardMultiheadAttention` (for different sizes of encoder and decoder; see the sketch below)
- Added the `CheckpointManager.get_model_checkpoint_path` method
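A minimal sketch of how the new `kv_dim` option might be used for cross-attention when the encoder and decoder operate at different model dimensions. The exact constructor signature is an assumption based on this changelog entry, not the library's documented API.

```python
import torch

from fairseq2.nn.transformer import StandardMultiheadAttention

decoder_dim = 512  # queries come from the decoder
encoder_dim = 768  # keys/values come from the encoder

# Assumption: `kv_dim` sizes the key/value input projections so they accept
# encoder-sized inputs while attention runs in the decoder's model dimension.
attn = StandardMultiheadAttention(
    model_dim=decoder_dim,
    num_heads=8,
    kv_dim=encoder_dim,  # hypothetical usage; keyword name taken from the entry above
)
```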
0.2.0
- Introduced LLaMA and LLaMA 2
- Introduced Mistral 7B
- Introduced LoRA fine-tuning (a short refresher on the technique follows this list)
- Revised the sequence generator API
- Introduced support for lazy padding and attention masks
- Many smaller improvements to existing APIs
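For context on the LoRA entry, here is a minimal, self-contained PyTorch sketch of the technique itself: a frozen pretrained weight augmented with a trainable low-rank update, y = Wx + (alpha/r) * BAx. This illustrates the general method, not fairseq2's own LoRA module names, which are not shown in this changelog.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0) -> None:
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the pretrained weight

        # A is small random, B is zero, so training starts from the base model.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base output plus the scaled low-rank correction B(Ax).
        return self.base(x) + self.scaling * (x @ self.lora_a.T @ self.lora_b.T)
```

Only the two factor matrices are updated during fine-tuning, which is what makes LoRA cheap relative to full-parameter training.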
0.1.1
- Improvements to the build system and CI pipelines
- Improvements to the installation instructions and contribution guidelines