ExLlamaV2

Latest version: v0.2.4


0.2.4

- Support Pixtral
- Refactoring for more multimodal support
- Faster filter evaluation
- Various optimizations and bugfixes
- Various quality of life improvements

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.2.3...v0.2.4

0.2.3

- No longer use the safetensors library for loading weights (fixes virtual memory issues, especially on Windows)
- Disable the fasttensors option (now redundant)
- Prioritize the HF Tokenizers model when both HF and SPM models are available
- Add XTC sampler
- Add YaRN support
- Various fixes and QoL improvements
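The XTC ("exclude top choices") sampler added in this release steers generation away from the most probable tokens: with some probability per step, every token above a probability threshold is removed except the least likely of them. A minimal sketch in plain Python, as an illustration of the technique only; this is not exllamav2's actual implementation, and `xtc_filter` and its parameter names are hypothetical:

```python
import random

def xtc_filter(probs, threshold=0.1, probability=0.5, rng=None):
    """Illustrative XTC ("exclude top choices") filter, not exllamav2's code.

    With chance `probability`, drop every token whose probability exceeds
    `threshold` except the least likely of them, shifting sampling toward
    mid-probability tokens."""
    rng = rng or random.Random()
    if rng.random() >= probability:
        return probs                      # XTC not triggered this step
    above = [i for i, p in enumerate(probs) if p > threshold]
    if len(above) < 2:
        return probs                      # need at least two candidates to exclude any
    # keep only the least probable token above the threshold
    keep = min(above, key=lambda i: probs[i])
    filtered = [p if (i == keep or i not in above) else 0.0
                for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]  # renormalize to a distribution
```

With `threshold=0.1` and the filter triggered, a distribution like `[0.5, 0.3, 0.15, 0.05]` keeps only the 0.15 token among those above the threshold, then renormalizes.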

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.2.2...v0.2.3

0.2.2

- Small fixes related to LMFE
- Allow SDPA during normal inference with custom bias

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.2.1...v0.2.2

0.2.1

- TP: fall back to SDPA mode when flash-attn is unavailable
- Faster filter/grammar path
- Add DRY
- Fix issues introduced in 0.1.9 (streams/graphs) when loading certain models via Tabby
- Banish Râul
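The DRY ("don't repeat yourself") sampler added here penalizes tokens that would extend a sequence already seen earlier in the context, with a penalty that grows exponentially in the length of the repeated run. A small sketch of the idea in plain Python; this is not exllamav2's implementation, and `dry_penalty` and its defaults are hypothetical:

```python
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    """Illustrative DRY repetition penalty, not exllamav2's code.

    If appending `candidate` would extend a token run that already occurred
    earlier in `context`, return multiplier * base**(match_len - allowed_length);
    runs shorter than `allowed_length` go unpenalized."""
    best = 0
    for start in range(len(context)):
        if context[start] != candidate:
            continue
        # count how many tokens before this earlier occurrence of `candidate`
        # match the current end of the context
        n = 0
        while n < start and context[start - 1 - n] == context[len(context) - 1 - n]:
            n += 1
        best = max(best, n)
    if best < allowed_length:
        return 0.0
    return multiplier * base ** (best - allowed_length)
```

For example, with context `[1, 2, 3, 1, 2]`, the candidate `3` would repeat the earlier run `1, 2, 3`, so it receives a penalty, while a token that extends no prior sequence receives none.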

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.2.0...v0.2.1

0.2.0

Small release to fix various issues in 0.1.9

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.9...v0.2.0

0.1.9

- Add experimental tensor-parallel mode; currently supports Llama (1, 2, and 3), Qwen2, and Mistral models
- Use CUDA graphs to reduce overhead and CPU bottlenecking
- Various other optimizations
- Some bugfixes

**Full Changelog**: https://github.com/turboderp/exllamav2/compare/v0.1.8...v0.1.9
