ExLlamaV2

Latest version: v0.2.4


0.1.8

- Support Llama 3.1 (correct RoPE scaling, etc.)
- Support IndexTeam architecture
- Some bugfixes and QoL improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.7...v0.1.8
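The "correct RoPE scaling" for Llama 3.1 refers to the frequency-dependent scaling scheme from the published Llama 3.1 model config (scaling `factor=8`, `low_freq_factor=1`, `high_freq_factor=4`, original context of 8192 tokens). A minimal sketch of that scheme, as an illustration of the idea rather than ExLlamaV2's actual implementation:

```python
import math

def scale_rope_freqs(inv_freqs,
                     factor=8.0,
                     low_freq_factor=1.0,
                     high_freq_factor=4.0,
                     original_context=8192):
    """Llama 3.1-style RoPE frequency scaling (illustrative sketch).

    High-frequency components are kept, low-frequency components are
    slowed by the full factor, and the band in between is smoothly
    interpolated. Constants are from the published Llama 3.1 config.
    """
    low_freq_wavelen = original_context / low_freq_factor
    high_freq_wavelen = original_context / high_freq_factor
    scaled = []
    for freq in inv_freqs:
        wavelen = 2 * math.pi / freq
        if wavelen < high_freq_wavelen:
            # Short wavelengths (high frequencies): leave untouched.
            scaled.append(freq)
        elif wavelen > low_freq_wavelen:
            # Long wavelengths (low frequencies): slow by the full factor.
            scaled.append(freq / factor)
        else:
            # Intermediate band: interpolate between the two regimes.
            smooth = (original_context / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor)
            scaled.append((1 - smooth) * freq / factor + smooth * freq)
    return scaled

# Base frequencies as in Llama 3.1 (rope_theta = 500000, head_dim = 128).
base = 500000.0
dim = 128
inv_freqs = [base ** (-2 * i / dim) for i in range(dim // 2)]
scaled = scale_rope_freqs(inv_freqs)
```

Applying this to the base inverse frequencies leaves the fastest-rotating dimensions unchanged while stretching the slowest ones by the scaling factor, which is what lets the model extrapolate to longer contexts.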

0.1.7

- Support Gemma2
- Support InternLM2
- Various bugfixes and optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.6...v0.1.7

0.1.6

- Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
- Fix inference on ROCm wave64 devices
- Made model conversion script part of `exllamav2` package
- CPU optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.5...v0.1.6
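The fallback mode mentioned above handles prompts longer than `max_input_len` by ingesting them in bounded chunks, feeding each chunk through the model while the KV cache accumulates. A trivial sketch of that chunking step (not ExLlamaV2's actual code path):

```python
def chunk_prompt(token_ids, max_input_len):
    """Split a long prompt into chunks no longer than max_input_len.

    Sketch of the chunked ingestion a generator can fall back to when a
    prompt exceeds its maximum forward-pass length; each chunk would be
    run through the model in turn.
    """
    return [token_ids[i:i + max_input_len]
            for i in range(0, len(token_ids), max_input_len)]

prompt = list(range(10))   # stand-in for a tokenized prompt
chunks = chunk_prompt(prompt, 4)
```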

0.1.5

- Added Q6 and Q8 cache modes
- Defragment cache in dynamic generator
- Use SDPA with Torch 2.3.0+
- Updated wheels to Torch 2.3.1
- Added Python 3.12 wheels, plus Python 3.9 for ROCm

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.4...v0.1.5
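The Q6 and Q8 cache modes store keys and values in a quantized form to cut KV-cache memory use. The general idea can be sketched as symmetric fixed-point quantization with a shared scale per group of values; this is only an illustration of the concept, not ExLlamaV2's kernel:

```python
def quantize_q8(values):
    """Symmetric 8-bit quantization with one scale per group (sketch).

    Stores int8 codes plus a float scale, trading a little precision for
    roughly 2x less memory than an FP16 cache.
    """
    scale = max(abs(v) for v in values) / 127.0 or 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize_q8(codes, scale):
    return [c * scale for c in codes]

original = [0.5, -1.25, 3.0, 0.001]
codes, scale = quantize_q8(original)
restored = dequantize_q8(codes, scale)
```

The round-trip error is bounded by half the scale, which is why higher-bit modes (Q6, Q8) lose noticeably less accuracy than Q4 at the cost of a larger cache.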

0.1.4

- Option to keep calibration states in VRAM while measuring
- Fix for Q4 cache for odd key/value sizes (MiniCPM specifically)
- Alternative `fasttensors` option on Windows to solve system memory issues
- Prefix filter with multiple prefixes

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.3...v0.1.4
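A prefix filter constrains generation so the output must begin with one of a set of allowed strings. The core test, sketched below with a hypothetical helper (not ExLlamaV2's filter API), is that a token is allowed if appending it keeps the text a prefix of, or an extension of, at least one target:

```python
def allowed_tokens(generated, prefixes, vocab):
    """Return vocab tokens consistent with at least one allowed prefix.

    Hypothetical sketch of a multi-prefix filter for constrained
    generation: a token survives if generated + token is still a prefix
    of one of the targets, or already extends past it.
    """
    allowed = set()
    for token in vocab:
        candidate = generated + token
        for prefix in prefixes:
            if candidate.startswith(prefix) or prefix.startswith(candidate):
                allowed.add(token)
                break
    return allowed

vocab = ["yes", "y", "no", "n", "maybe", "es", "o"]
# Constrain the output to begin with "yes" or "no".
first = allowed_tokens("", ["yes", "no"], vocab)
second = allowed_tokens("y", ["yes", "no"], vocab)
```

With multiple prefixes, the filter simply keeps any token that is valid for at least one of them, narrowing the set as generation proceeds.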

0.1.3

- Fix CFG

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.2...v0.1.3
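CFG (classifier-free guidance) blends logits from a conditional and an unconditional pass: `out = uncond + scale * (cond - uncond)`, so a scale of 1.0 recovers the conditional logits and larger scales push sampling harder toward the conditioned prompt. A minimal sketch of that blend, illustrating the technique rather than the library's implementation:

```python
def apply_cfg(cond_logits, uncond_logits, cfg_scale):
    """Classifier-free guidance over logits (illustrative sketch).

    out = uncond + scale * (cond - uncond)
    """
    return [u + cfg_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]

cond = [2.0, 0.5, -1.0]
uncond = [1.0, 1.0, 0.0]
guided = apply_cfg(cond, uncond, 1.5)
```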
