- More accurate Q4 cache using groupwise rotations
- Better prompt ingestion speed when using flash-attn
- Minor fixes for issues when quantizing Llama 3
- New, more robust optimizer
- Fix bug in long-sequence inference for GPTQ models
**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.18...v0.0.19