- Split settings for layer-norm bias, feedforward bias, and attention bias
- In- and out-projection of token embeddings now share weights
- Simplified data loader
0.1.3
- Bumped up the dependencies
- Improved monitor gnuplot to include the gradient norm
- Save chat history
0.1.2
- Docker image is now built and pushed to Docker Hub on release
0.0.5
- Added Dockerfile
- Added choice between LayerNorm and RMSNorm