**Overview**
AutoRound introduces an innovative weight-only quantization algorithm designed for low-bit LLM inference. At W4G128 it achieves near-lossless compression for a range of popular models, including gemma-7B, Mistral-7B, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi-2, LLaMA2, and more. (WxGy denotes x-bit weights with a quantization group size of y; G-1 denotes per-channel quantization.) AutoRound consistently outperforms established methods in the majority of scenarios at W4G128, W4G-1, W3G128, and W2G128.
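As a quick illustration, a W4G128 run with the Python API might look like the following minimal sketch; the model name and output directory are placeholders, and the argument names should be verified against the installed release:

```python
# Minimal W4G128 quantization sketch. Assumes the auto-round package is
# installed; the model name and output path are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"  # any supported causal LM
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# W4G128: 4-bit weights with a quantization group size of 128.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./Mistral-7B-v0.1-w4g128")
```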
**Key Features**
- **Wide Model Support**: AutoRound supports a diverse range of model families; about 20 have been verified so far.
- **Export Flexibility**: Export quantized models to the ITREX[1] and AutoGPTQ[2] formats for deployment on Intel CPUs and Nvidia GPUs, respectively (see the sketch after this list).
- **Device Compatibility**: Tuning is supported on Intel CPUs, Intel Gaudi2, and Nvidia GPUs.
- **Dataset Flexibility**: Calibration with the Pile10k and MBPP datasets is supported, and additional datasets are easy to incorporate.
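A sketch of combining these options is below; the `dataset` argument and the `format` value are assumptions based on the project documentation and may differ between releases, and the model and output path are placeholders:

```python
# Sketch: calibrate on MBPP and export in the AutoGPTQ format for Nvidia
# GPUs. The `dataset` argument and `format` value are assumptions; verify
# them against your installed version.
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Swap the calibration set, e.g. MBPP for code-oriented workloads.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, dataset="mbpp")
autoround.quantize()

# For Intel CPU deployment, an ITREX export (e.g. format="itrex") would be
# used instead.
autoround.save_quantized("./opt-125m-w4g128-gptq", format="auto_gptq")
```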
**Examples**
- Explore the language modeling and code generation examples to see AutoRound in practice.
**Additional Benefits**
- **Pre-Quantized Models**: A variety of pre-quantized models is available on Hugging Face for immediate use in your projects, with more under review and coming soon.
- **Comprehensive Accuracy Data**: Extensive accuracy data is provided to simplify deployment decisions.
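Pre-quantized checkpoints in the AutoGPTQ format can typically be loaded straight through `transformers` (with `optimum` and `auto-gptq` installed); the repository id below is a placeholder, not a published model:

```python
# Sketch of loading a GPTQ-format checkpoint; the repo id is hypothetical.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "some-org/Model-7B-w4g128-autogptq"  # placeholder repo id
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(repo_id)

prompt = "There is a girl who likes adventure,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```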
**Known Issues**
- baichuan-inc/Baichuan2-13B-Chat currently has issues; support is planned soon.
**References**
[1] https://github.com/intel/intel-extension-for-transformers
[2] https://github.com/AutoGPTQ/AutoGPTQ