- More accurate Q4 cache using groupwise rotations
- Better prompt ingestion speed when using flash-attn
- Minor fixes for issues when quantizing Llama 3
- New, more robust optimizer
- Fix bug in long-sequence inference for GPTQ models
**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.18...v0.0.19