ExLlamaV2

Latest version: v0.2.8


0.1.0

- Paged attention support (requires flash-attn>=2.5.7)
- New generator with dynamic batching support (requires paged attn)
- Examples updated for dynamic generator
- Faster speculative decoding with a draft model
- Various optimizations, bugfixes and QoL improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.21...v0.1.0
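The paged-attention change replaces one contiguous cache slot per sequence with fixed-size pages handed out on demand, which is what makes dynamic batching practical: sequences of different lengths can join and leave the batch while sharing one cache pool. A toy sketch of the bookkeeping (class and method names here are hypothetical illustrations, not exllamav2's actual API or implementation):

```python
class PagedKVCache:
    """Toy allocator illustrating the idea behind a paged KV cache:
    each sequence gets fixed-size pages as it grows, instead of one
    large contiguous slot reserved up front. Conceptual sketch only."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.free = list(range(num_pages))  # indices of unused pages
        self.pages = {}   # seq_id -> list of page indices
        self.length = {}  # seq_id -> tokens currently stored

    def append(self, seq_id, n_tokens):
        """Reserve enough pages for seq_id to store n_tokens more entries."""
        pages = self.pages.setdefault(seq_id, [])
        length = self.length.get(seq_id, 0) + n_tokens
        need = -(-length // self.page_size)  # ceiling division
        while len(pages) < need:
            if not self.free:
                raise MemoryError("KV cache is full")
            pages.append(self.free.pop())
        self.length[seq_id] = length

    def release(self, seq_id):
        """Return a finished sequence's pages to the shared pool."""
        self.free.extend(self.pages.pop(seq_id, []))
        self.length.pop(seq_id, None)
```

When a sequence finishes, its pages immediately become available to other requests in the batch, so cache memory tracks the live workload rather than the worst case.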

0.0.21

- Support for Granite architecture
- Support for GPT2 architecture
- Support for banned strings in streaming generator
- A bit more work on multimodal support (still unfinished)
- A few other bugfixes and minor improvements
- Windows wheels for PyTorch 2.2.0 are included below to work around an apparent (likely temporary) issue in PyTorch. See 434 and https://github.com/pytorch/pytorch/issues/125109

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.20...v0.0.21
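Banning strings in a streaming generator is subtler than post-filtering, because a banned string can span several emitted chunks: output has to be held back while the buffered tail could still grow into a banned string. A self-contained conceptual sketch of that logic (not exllamav2's actual implementation):

```python
def filter_stream(chunks, banned):
    """Yield text from a stream of chunks while suppressing banned strings.

    Output is held back while the buffer's tail is still a prefix of some
    banned string; a completed banned string is removed before emitting.
    Conceptual sketch only, not exllamav2's implementation.
    """
    buf = ""
    for chunk in chunks:
        buf += chunk
        # Drop any banned string that has fully appeared in the buffer
        for b in banned:
            buf = buf.replace(b, "")
        # Hold back the longest tail that is a proper prefix of a banned string
        hold = 0
        for b in banned:
            for k in range(min(len(b) - 1, len(buf)), 0, -1):
                if buf.endswith(b[:k]):
                    hold = max(hold, k)
                    break
        if len(buf) > hold:
            yield buf[: len(buf) - hold]
            buf = buf[len(buf) - hold :]
    if buf:  # leftover tail that never completed a banned string is safe
        yield buf
```

A real generator would also rewind the sampler when a ban triggers; this sketch only shows the hold-back-and-suppress streaming behavior.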

0.0.20

- Adds Phi3 support
- Wheels compiled for PyTorch 2.3.0
- ROCm 6.0 wheels

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.19...v0.0.20

0.0.19

- More accurate Q4 cache using groupwise rotations
- Better prompt ingestion speed when using flash-attn
- Minor fixes related to issues quantizing Llama 3
- New, more robust optimizer
- Fix for a bug in long-sequence inference with GPTQ models

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.18...v0.0.19

0.0.18

- Support for Command-R-plus
- Fix for pre-AVX2 CPUs
- VRAM optimizations for quantization
- Very preliminary multimodal support
- Various other small fixes and optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.17...v0.0.18

0.0.17

Mostly just minor fixes and support for DBRX models.

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.16...v0.0.17
