Exllamav2

Latest version: v0.1.6

0.0.12

Lots of fixes and tweaks. Main feature updates:

Model support:

- Basic LoRA support for MoE models
- Support for Orion models (also groundwork for other layernorm models)
- Support for loading/converting from Axolotl checkpoints

Generation/sampling:

- Fused kernels enabled for num_experts = 4
- Option to return probs from streaming generator
- Add top-A sampling
- Add frequency/presence penalties
- CFG support in streaming generator
- Disable flash-attn for non-causal attention (fixes left-padding until FA2 implements custom bias)
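
As a rough illustration of three of the sampling features listed above, the sketch below implements top-A filtering, frequency/presence penalties and CFG mixing in plain Python. It follows the commonly used definitions (top-A drops tokens with probability below `top_a * max_prob**2`; penalties are subtracted from the logits per repeated token; CFG extrapolates from the unconditional logits toward the conditional ones). The function names and parameters are hypothetical and are not taken from exllamav2, whose internal implementation may differ, for example in whether CFG is applied to raw logits or log-probabilities.

```python
import math
from collections import Counter


def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_penalties(logits, generated_ids, freq_penalty=0.0, pres_penalty=0.0):
    """Hypothetical helper: penalize tokens that were already generated.

    The frequency penalty scales with how often a token appeared;
    the presence penalty is applied once to any token that appeared at all.
    """
    counts = Counter(generated_ids)
    out = list(logits)
    for token_id, count in counts.items():
        out[token_id] -= freq_penalty * count + pres_penalty
    return out


def top_a_filter(probs, top_a):
    """Hypothetical helper: drop tokens with p < top_a * max(p)**2, renormalize."""
    threshold = top_a * max(probs) ** 2
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)
    return [p / total for p in kept]


def cfg_mix(cond_logits, uncond_logits, scale):
    """Hypothetical helper: classifier-free guidance on two logit vectors."""
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Toy vocabulary of four tokens; tokens 0, 0 and 1 were generated so far.
logits = [2.0, 1.0, 0.5, -1.0]
penalized = apply_penalties(
    logits, generated_ids=[0, 0, 1], freq_penalty=0.5, pres_penalty=0.2
)
filtered = top_a_filter(softmax(penalized), top_a=0.5)
```

Because the top-A threshold depends on the square of the highest probability, the filter is strict when the model is confident and permissive when the distribution is flat.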

Testing/evaluation:

- HumanEval test
- Script to compare two models layer by layer (e.g. quantized vs. original model)
- "Standard" perplexity (ppl) test that attempts to mimic text-generation-webui
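
For readers unfamiliar with the metric behind the ppl test, perplexity is the exponential of the average negative log-likelihood the model assigns to each token of a reference text. The sketch below shows only that formula; the `perplexity` function is hypothetical, and details such as context length, stride and dataset used by the actual test script are not covered here.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities of a reference text."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# A model that assigns probability 0.25 to every token is as uncertain
# as a uniform choice among four options.
uniform_four = perplexity([math.log(0.25)] * 8)
```

Lower values are better, and a quantized model can be compared against the original by evaluating both on the same text.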

Conversion:

- VRAM optimizations
- Optimized quantization kernels

IO:

- Cache safetensors context managers for faster loading
- Optional direct I/O loader (for very fast storage arrays)

0.0.11

0.0.10

0.0.9

0.0.8

0.0.7
