Exllamav2

Latest version: v0.1.5

0.1.5

- Added Q6 and Q8 cache modes
- Defragment cache in dynamic generator
- Updated wheels to Torch 2.3.1
- Added Python 3.12 wheels, plus Python 3.9 for ROCm

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.4...v0.1.5
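To give a rough sense of what the cache modes trade off, here is a back-of-the-envelope sketch of key/value cache size at different element widths. It assumes the Q4, Q6 and Q8 modes store roughly 4, 6 and 8 bits per cached element and ignores any per-block scale or metadata overhead, so real usage will be somewhat higher. The function name and the model dimensions are hypothetical and not part of exllamav2.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bits_per_element):
    """Nominal KV cache size in bytes (keys plus values), ignoring quantization overhead."""
    # One key vector and one value vector per layer, per KV head, per token.
    elements = 2 * num_layers * num_kv_heads * head_dim * seq_len
    return elements * bits_per_element / 8


# Hypothetical model shape, chosen only for illustration.
shape = dict(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=8192)

# Assumed nominal widths: FP16 = 16 bits, Q8 = 8, Q6 = 6, Q4 = 4.
for name, bits in [("FP16", 16), ("Q8", 8), ("Q6", 6), ("Q4", 4)]:
    size_gib = kv_cache_bytes(bits_per_element=bits, **shape) / 2**30
    print(f"{name}: {size_gib:.3f} GiB")
```

Under these assumptions the nominal size scales linearly with the bit width, so Q8 is about half of FP16 and Q4 about a quarter.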

0.1.4

- Option to keep calibration states in VRAM while measuring
- Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
- Alternative `fasttensors` option on Windows to solve system memory issues
- Prefix filter with multiple prefixes

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.3...v0.1.4

0.1.3

- Fixes CFG

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.2...v0.1.3

0.1.2

- Support MiniCPM architecture
- Optimized prompt processing for page generator with Q4 cache
- New HumanEval and MMLU tests using dynamic generator
- Some bugfixes and small QoL improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.1...v0.1.2

0.1.1

- Fix performance of Q4 cache in dynamic generator
- Add paged attn support for FP16 models
- Add xformers support

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.0...v0.1.1

0.1.0

- Paged attention support (requires flash-attn>=2.5.7)
- New generator with dynamic batching support (requires paged attn)
- Examples updated for dynamic generator
- Faster draft model SD
- Various optimizations, bugfixes and QoL improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.21...v0.1.0
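Because paged attention in 0.1.0 depends on flash-attn 2.5.7 or newer, it can help to check the installed version before relying on the dynamic generator. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the distribution name `flash-attn` is assumed to match how the package was installed, and this is not how exllamav2 itself performs the check.

```python
from importlib import metadata

MIN_FLASH_ATTN = (2, 5, 7)  # minimum version stated in the 0.1.0 release note


def parse_version(text):
    """Turn '2.5.7' or '2.5.9.post1' into a tuple of its leading integer parts."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def paged_attention_available(min_version=MIN_FLASH_ATTN):
    """Return True if an installed flash-attn meets the minimum version."""
    try:
        installed = metadata.version("flash-attn")  # assumed distribution name
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= min_version


if __name__ == "__main__":
    print("paged attention usable:", paged_attention_available())
```

The parser deliberately ignores local build tags and suffixes such as `.post1`. For stricter comparisons, a dedicated version-parsing library would be the safer choice.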
