Exllamav2

Latest version: v0.2.8

0.1.6

- Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
- Fix inference on ROCm wave64 devices
- Made model conversion script part of `exllamav2` package
- CPU optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.5...v0.1.6

0.1.5

- Added Q6 and Q8 cache modes
- Defragment cache in dynamic generator
- Use SDPA with Torch 2.3.0+
- Updated wheels to Torch 2.3.1
- Added Python 3.12 wheels, plus Python 3.9 for ROCm

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.4...v0.1.5
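The "Use SDPA with Torch 2.3.0+" entry implies a version gate of some kind. How exllamav2 performs that check is not stated in these notes; the following is only a minimal sketch of one way to gate a feature on a Torch version string, assuming the string looks like `torch.__version__` (for example `2.3.1+cu121`). The helper names `parse_version` and `can_use_sdpa` are hypothetical and not part of the library.

```python
import re


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) from a version string.

    Hypothetical helper: local build suffixes such as "+cu121" are dropped,
    and pre-release tags such as "a0" or "dev..." are ignored.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev2024"
        parts.append(int(match.group()))
    # Pad so that "2.3" compares equal to "2.3.0".
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def can_use_sdpa(torch_version: str, minimum: tuple = (2, 3, 0)) -> bool:
    """Assumed gate: True when the Torch version meets the minimum."""
    return parse_version(torch_version) >= minimum
```

In practice the argument would be `torch.__version__`; the sketch takes a plain string so it can be run without Torch installed.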

0.1.4

- Option to keep calibration states in VRAM while measuring
- Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
- Alternative `fasttensors` option on Windows to solve system memory issues
- Prefix filter with multiple prefixes

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.3...v0.1.4
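To illustrate what a prefix filter with multiple prefixes has to decide, here is a minimal character-level sketch. It is not the exllamav2 implementation (which operates on tokens and whose API is not described in these notes); the function name `allowed_next_chars` is hypothetical. Given the text generated so far and a list of allowed prefixes, it returns the characters that keep the output consistent with at least one prefix.

```python
from typing import List, Optional, Set


def allowed_next_chars(text: str, prefixes: List[str]) -> Optional[Set[str]]:
    """Return the set of characters allowed to follow `text`.

    Returns None when `text` already starts with one of the prefixes,
    meaning the constraint is satisfied and generation is unconstrained.
    Returns an empty set when `text` cannot lead to any prefix.
    """
    # Constraint satisfied: some prefix has been fully produced.
    if any(text.startswith(p) for p in prefixes):
        return None
    # Otherwise, collect the next character of every prefix still reachable.
    return {p[len(text)] for p in prefixes if p.startswith(text)}
```

For example, with the prefixes `["Yes", "No", "Not sure"]` an empty output may continue with `Y` or `N`, while an output of `No` already satisfies the filter.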

0.1.3

- Fix CFG (classifier-free guidance)

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.2...v0.1.3

0.1.2

- Support MiniCPM architecture
- Optimized prompt processing for paged generator with Q4 cache
- New HumanEval and MMLU tests using dynamic generator
- Some bugfixes and small QoL improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.1...v0.1.2

0.1.1

- Fix performance of Q4 cache in dynamic generator
- Add paged attention support for FP16 models
- Add xformers support

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.0...v0.1.1