ExLlamaV2

Latest version: v0.2.8

Page 2 of 7

0.2.2

- Small fixes related to LMFE
- Allow SDPA during normal inference with custom bias

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.2.1...v0.2.2
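The 0.2.2 entry above concerns the scaled dot-product attention (SDPA) path accepting a custom additive bias on the attention scores. As a concept sketch only (plain Python for a single query vector, not exllamav2's or PyTorch's API), attention with an additive bias looks like this:

```python
import math

def sdpa_with_bias(q, k, v, bias):
    """Scaled dot-product attention for one query vector, with an additive
    bias on the scores (illustrative sketch only).
    q: list[float] (d,); k, v: lists of list[float] (n, d); bias: list[float] (n,)."""
    d = len(q)
    # attention scores: q . k_i / sqrt(d) + bias_i
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) + b
              for key, b in zip(k, bias)]
    # numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # weighted sum of the value vectors
    return [sum(w * val[j] for w, val in zip(weights, v))
            for j in range(len(v[0]))]
```

A large negative bias at a position acts as a mask: that position's attention weight collapses to zero, which is how additive attention masks are usually expressed.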

0.2.1

- TP: fall back to SDPA when flash-attn is unavailable
- Faster filter/grammar path
- Add DRY ("Don't Repeat Yourself") repetition penalty sampling
- Fix issues introduced in 0.1.9 (CUDA streams/graphs) when loading certain models via Tabby
- Banish Râul

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.2.0...v0.2.1
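DRY penalizes a token when emitting it would extend a sequence that already occurred earlier in the context, with a penalty that grows exponentially in the length of the repeat. A minimal sketch of the usual formulation; the parameter names (`multiplier`, `base`, `allowed_length`) follow the common description of the sampler, not necessarily exllamav2's exact code:

```python
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    """Penalty for `candidate` if emitting it would extend a repetition:
    some earlier occurrence of `candidate` was preceded by the same tokens
    that currently end `context`. Returns a value to subtract from the logit."""
    longest = 0
    last = len(context) - 1
    for i, tok in enumerate(context):
        if tok != candidate:
            continue
        # length of the common suffix between context[:i] and the full context,
        # i.e. how long the repeated run would be if `candidate` were emitted
        n = 0
        while n < i and context[i - 1 - n] == context[last - n]:
            n += 1
        longest = max(longest, n)
    if longest < allowed_length:
        return 0.0  # short repeats are allowed unpenalized
    return multiplier * base ** (longest - allowed_length)
```

Repeats no longer than `allowed_length` cost nothing; beyond that the penalty grows as `base` to the excess length, so long verbatim loops become rapidly less likely without banning common short n-grams.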

0.2.0

Small release to fix various issues in 0.1.9

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.9...v0.2.0

0.1.9

- Add experimental tensor-parallel mode; currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
- Use CUDA graphs to reduce launch overhead and CPU bottlenecking
- Various other optimizations
- Some bugfixes

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.8...v0.1.9
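Tensor-parallel mode splits each weight matrix across GPUs so every device computes a slice of a layer's output, and the slices are then gathered. A toy illustration of column-parallel sharding in plain Python (purely conceptual; exllamav2's implementation shards real GPU tensors and overlaps communication):

```python
def matvec(x, w_cols):
    """y_j = x . w_cols[j]; the weight matrix is given as a list of columns."""
    return [sum(xi * wi for xi, wi in zip(x, col)) for col in w_cols]

def matvec_column_parallel(x, w_cols, n_shards):
    """Column-parallel matvec: each 'device' holds a contiguous slice of the
    output columns, computes its partial result independently, and the
    results are concatenated (the all-gather step in real tensor parallelism)."""
    k = (len(w_cols) + n_shards - 1) // n_shards
    shards = [w_cols[i * k:(i + 1) * k] for i in range(n_shards)]
    partials = [matvec(x, shard) for shard in shards]   # one per device
    return [y for partial in partials for y in partial]  # all-gather
```

The key property is that the sharded result equals the unsharded one, so parallelism changes where the work happens, not what is computed.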

0.1.8

- Support Llama 3.1 (correct RoPE scaling etc.)
- Support IndexTeam architecture
- Some bugfixes and QoL improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.7...v0.1.8
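The RoPE fix matters because Llama 3.1 rescales its rotary inverse frequencies as a function of wavelength rather than applying a uniform scale. A sketch of the published scheme with its reference defaults (scale factor 8, low/high-frequency factors 1 and 4, original context length 8192); this follows the Llama 3.1 reference formulation, not exllamav2's internal code:

```python
import math

def llama31_scale_freq(freq, factor=8.0, low_freq_factor=1.0,
                       high_freq_factor=4.0, old_context_len=8192):
    """Rescale one RoPE inverse frequency the Llama 3.1 way:
    long-wavelength components are slowed by `factor`, short-wavelength
    components are left alone, and the band in between is interpolated."""
    wavelen = 2 * math.pi / freq
    if wavelen < old_context_len / high_freq_factor:
        return freq                 # high frequency: keep as-is
    if wavelen > old_context_len / low_freq_factor:
        return freq / factor        # low frequency: scale down
    # smooth interpolation between the two regimes
    smooth = (old_context_len / wavelen - low_freq_factor) \
             / (high_freq_factor - low_freq_factor)
    return (1 - smooth) * freq / factor + smooth * freq
```

Applying plain linear or NTK scaling instead of this piecewise rule is what produced degraded long-context output on Llama 3.1 checkpoints before loaders added support for it.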

0.1.7

- Support Gemma2
- Support InternLM2
- Various bugfixes and optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.6...v0.1.7
