ExLlamaV2

Latest version: v0.0.21


0.0.21

- Support for Granite architecture
- Support for GPT2 architecture
- Support for banned strings in streaming generator
- A bit more work on multimodal support (still unfinished)
- A few bugfixes and miscellaneous improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.20...v0.0.21
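The banned-strings idea can be sketched in plain Python: a streaming generator must hold back any output that could still turn into a banned string, and suppress it on a full match. The function below is an illustrative sketch of that buffering logic, not the exllamav2 API; the names and behavior here are assumptions.

```python
# Hypothetical sketch of "banned strings" in a streaming generator:
# hold back text that could be the start of a banned string, and drop
# it entirely on a full match. Illustrative only; not the exllamav2 API.

def stream_with_banned_strings(chunks, banned):
    """Yield streamed text chunks with banned strings withheld."""
    held = ""
    for chunk in chunks:
        held += chunk
        # Remove any fully matched banned string from the buffer
        for b in banned:
            while b in held:
                held = held.replace(b, "", 1)
        # Only release the prefix that can no longer begin a banned match
        safe_len = len(held)
        for b in banned:
            for k in range(1, len(b)):
                if held.endswith(b[:k]):
                    safe_len = min(safe_len, len(held) - k)
        if safe_len > 0:
            yield held[:safe_len]
            held = held[safe_len:]
    if held:
        yield held
```

The key design point is the hold-back: a chunk ending in a partial match (e.g. `"BA"` when `"BAD"` is banned) is not emitted until the next chunk resolves it one way or the other.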

0.0.20

- Adds Phi3 support
- Wheels compiled for PyTorch 2.3.0
- ROCm 6.0 wheels

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.19...v0.0.20

0.0.19

- More accurate Q4 cache using groupwise rotations
- Better prompt ingestion speed when using flash-attn
- Minor fixes related to issues quantizing Llama 3
- New, more robust optimizer
- Fix bug on long-sequence inference for GPTQ models

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.18...v0.0.19
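The core of a Q4 cache is group-wise 4-bit quantization: values are split into small groups, each stored with its own scale and offset so that 16 levels cover only that group's range. The sketch below shows the group-wise part only; the per-group rotations the release notes mention are omitted, and this is pure illustrative Python, not the library's kernel code.

```python
# Illustrative sketch of group-wise 4-bit quantization, the idea behind
# a more accurate Q4 cache (the real implementation also applies
# group-wise rotations, omitted here). Not exllamav2's actual code.

def quantize_q4(values, group_size=32):
    """Quantize floats to 4-bit ints (0..15) with per-group scale/offset."""
    groups = []
    for i in range(0, len(values), group_size):
        g = values[i:i + group_size]
        lo, hi = min(g), max(g)
        scale = (hi - lo) / 15 or 1.0  # avoid div-by-zero for flat groups
        q = [round((v - lo) / scale) for v in g]
        groups.append((lo, scale, q))
    return groups

def dequantize_q4(groups):
    """Reconstruct approximate floats from (offset, scale, ints) groups."""
    return [lo + scale * v for lo, scale, q in groups for v in q]
```

Smaller groups cost more metadata (one scale and offset each) but track local value ranges more tightly, which is the usual accuracy/overhead trade-off in cache quantization.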

0.0.18

- Support for Command-R-plus
- Fix for pre-AVX2 CPUs
- VRAM optimizations for quantization
- Very preliminary multimodal support
- Various other small fixes and optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.17...v0.0.18

0.0.17

Mostly just minor fixes and support for DBRX models.

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.16...v0.0.17

0.0.16

- Adds support for Cohere models
- N-gram decoding
- A few bugfixes
- Lots of optimizations

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.0.15...v0.0.16
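N-gram decoding is a form of speculative decoding that needs no draft model: when the most recent n-gram of tokens has occurred earlier in the sequence, the tokens that followed it last time are proposed as drafts and then verified by the model in one forward pass. The helper below sketches only the draft-lookup step; the function name and parameters are illustrative, not exllamav2's implementation.

```python
# Rough sketch of the n-gram decoding draft step: find the most recent
# earlier occurrence of the trailing n-gram and propose what followed it.
# Illustrative names only; verification by the model is not shown.

def ngram_draft(tokens, n=3, max_draft=4):
    """Propose up to max_draft tokens from the last n-gram's prior occurrence."""
    if len(tokens) < n:
        return []
    tail = tokens[-n:]
    # Search backwards for an earlier occurrence of the trailing n-gram
    for i in range(len(tokens) - n - 1, -1, -1):
        if tokens[i:i + n] == tail:
            return tokens[i + n:i + n + max_draft]
    return []
```

Drafts are free to be wrong: the model scores them in a single batched pass and keeps only the accepted prefix, so repetitive text (code, lists, quoted prompts) gets large speedups while worst-case output is unchanged.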
