Auto-round

Latest version: v0.11


0.1

**Overview**

AutoRound introduces a weight-only quantization algorithm designed specifically for low-bit LLM inference. At W4G128 (4-bit weights, group size 128), it achieves near-lossless compression for a range of popular models, including gemma-7B, Mistral-7b, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi2, LLAMA2, and more. AutoRound outperforms established methods in the majority of scenarios at W4G128, W4G-1, W3G128, and W2G128.
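To make the W{bits}G{group size} notation concrete, here is a minimal sketch of group-wise weight-only quantization using plain round-to-nearest. It is not AutoRound's algorithm and does not use the auto-round API; the function names (`quantize_group`, `dequantize_group`, `quantize_row`) are hypothetical, and treating a group size of -1 as "one group per row" is an assumption based on how that value is commonly used in quantization configs.

```python
import random


def quantize_group(weights, bits):
    """Asymmetric round-to-nearest quantization of one group of floats.

    Returns integer codes plus the scale and minimum needed to restore them.
    """
    qmax = (1 << bits) - 1            # e.g. 15 for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax if hi > lo else 1.0   # avoid dividing by zero
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [lo + c * scale for c in codes]


def quantize_row(row, bits=4, group_size=128):
    """Quantize and restore one weight row, group by group.

    Assumption: group_size == -1 means the whole row shares one scale.
    """
    if not row:
        return []
    size = len(row) if group_size == -1 else group_size
    restored = []
    for start in range(0, len(row), size):
        codes, scale, lo = quantize_group(row[start:start + size], bits)
        restored.extend(dequantize_group(codes, scale, lo))
    return restored


if __name__ == "__main__":
    random.seed(0)
    row = [random.gauss(0.0, 1.0) for _ in range(256)]  # stand-in weight row
    for label, bits, group in [("W4G128", 4, 128), ("W4G-1", 4, -1),
                               ("W3G128", 3, 128), ("W2G128", 2, 128)]:
        restored = quantize_row(row, bits, group)
        worst = max(abs(a - b) for a, b in zip(row, restored))
        print(f"{label}: max abs error = {worst:.4f}")
```

Each group stores its own scale and minimum, so smaller groups track the local weight range more closely at the cost of extra metadata. Fewer bits leave fewer levels per group, which is why the lower-bit settings are harder to keep accurate with simple rounding alone.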

**Key Features**

- **Wide Model Support**: AutoRound caters to a diverse range of model families. About 20 model families have been verified.
- **Export Flexibility**: Effortlessly export quantized models to ITREX[1] and AutoGPTQ[2] formats for seamless deployment on Intel CPU and Nvidia GPU platforms respectively.
- **Device Compatibility**: Compatible with tuning devices including Intel CPUs, Intel Gaudi2, and Nvidia GPUs.
- **Dataset Flexibility**: AutoRound supports calibration with Pile10k and MBPP datasets, with easy extensibility to incorporate additional datasets.

**Examples**

- Explore language modeling and code generation examples to unlock the full potential of AutoRound.

**Additional Benefits**

- **Pre-Quantized Models**: Access a variety of pre-quantized models on Hugging Face for immediate integration into your projects, with more models under review and coming soon.
- **Comprehensive Accuracy Data**: Extensive accuracy data is provided to simplify deployment decisions.

Known issues:

- baichuan-inc/Baichuan2-13B-Chat has some issues; support is planned soon.

References:

[1] https://github.com/intel/intel-extension-for-transformers

[2] https://github.com/AutoGPTQ/AutoGPTQ
