Auto-round

0.4.1

Highlights:
- Fixed an infinite loop issue during vLLM calibration.
- Corrected the default value of the `sym` argument in the API configuration.

What's Changed
* fix typo by wenhuach21 in https://github.com/intel/auto-round/pull/342
* vllm/llama-vision llava calibration infinite loop fix by WeiweiZhang1 in https://github.com/intel/auto-round/pull/343
* [HPU]Enhance `numba` check by yiliu30 in https://github.com/intel/auto-round/pull/345
* [VLM]fix bs and grad reset by n1ck-guo in https://github.com/intel/auto-round/pull/344
* [HPU]Enhance installation check by yiliu30 in https://github.com/intel/auto-round/pull/346
* [Critical Bug]API use sym as default by wenhuach21 in https://github.com/intel/auto-round/pull/349
* triton backend requires < 3.0 by wenhuach21 in https://github.com/intel/auto-round/pull/348


**Full Changelog**: https://github.com/intel/auto-round/compare/v0.4...v0.4.1

0.4

Highlights
- [Experimental Feature] We provide API support for VLM models (see the sketch below).
- [Kernel] We add IPEX support for Intel CPU.
- [Bug fix] We fix a tuning bug for the glm4 model.
- [Enhancement] Better align `gradient_accumulate_steps` behavior for varied-length input.
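
A minimal sketch of the experimental VLM flow through the Python API. The `AutoRoundMLLM` entry point, its `processor` argument, and the Qwen2-VL checkpoint are assumptions made for illustration, not a confirmed recipe for this release:

```python
from transformers import AutoProcessor, AutoTokenizer, Qwen2VLForConditionalGeneration
from auto_round import AutoRoundMLLM  # assumed experimental multimodal entry point

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # illustrative model choice
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# Tune the weights to W4G128; when no dataset is given, the multimodal
# calibration data falls back to the package default.
autoround = AutoRoundMLLM(model, tokenizer, processor=processor, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./Qwen2-VL-2B-Instruct-w4g128", format="auto_round")
```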

What's Changed
* refine AutoRound format and support marlin repacking by wenhuach21 in https://github.com/intel/auto-round/pull/280
* update readme for v0.3.1 release by wenhuach21 in https://github.com/intel/auto-round/pull/283
* update readme for cpu inference by wenhuach21 in https://github.com/intel/auto-round/pull/284
* avoid deterministic algorithm warning in inference by wenhuach21 in https://github.com/intel/auto-round/pull/285
* fix mx_fp issues by wenhuach21 in https://github.com/intel/auto-round/pull/286
* update torch ao integration information by wenhuach21 in https://github.com/intel/auto-round/pull/287
* Refine code by wenhuach21 in https://github.com/intel/auto-round/pull/291
* Add ipex support for intel cpu by wenhuach21 in https://github.com/intel/auto-round/pull/292
* fix ipex tqdm mismatch issue by wenhuach21 in https://github.com/intel/auto-round/pull/293
* fix bug of backend by wenhuach21 in https://github.com/intel/auto-round/pull/294
* [Experimental Feature]support for common hf multimodel by n1ck-guo in https://github.com/intel/auto-round/pull/276
* use torch.compile by default for PyTorch versions 2.6 and above by wenhuach21 in https://github.com/intel/auto-round/pull/295
* refine forward hook by WeiweiZhang1 in https://github.com/intel/auto-round/pull/290
* eval for MLLMs by n1ck-guo in https://github.com/intel/auto-round/pull/296
* mllm eval bug fix by n1ck-guo in https://github.com/intel/auto-round/pull/297
* Port Numba-based packing from INC by yiliu30 in https://github.com/intel/auto-round/pull/301
* refine model config file for mixed precision quantization by wenhuach21 in https://github.com/intel/auto-round/pull/300
* fix glm4-9b batch dim issue by wenhuach21 in https://github.com/intel/auto-round/pull/304
* better align gradient_accumulate_steps for varied length input by wenhuach21 in https://github.com/intel/auto-round/pull/309
* Enable torch.compile on HPU by yiliu30 in https://github.com/intel/auto-round/pull/307
* Update autogptq exporting by wenhuach21 in https://github.com/intel/auto-round/pull/310
* fix typo by wenhuach21 in https://github.com/intel/auto-round/pull/311
* qwen2 vision quantization bugfix by WeiweiZhang1 in https://github.com/intel/auto-round/pull/313
* multiple gpu evaluation/calibration refine by wenhuach21 in https://github.com/intel/auto-round/pull/312
* HPU only release binary by yiliu30 in https://github.com/intel/auto-round/pull/302
* patch 1 for mllm by n1ck-guo in https://github.com/intel/auto-round/pull/298
* add torch compile arg by wenhuach21 in https://github.com/intel/auto-round/pull/314
* fix merge error by n1ck-guo in https://github.com/intel/auto-round/pull/316
* Update the check for HPU by yiliu30 in https://github.com/intel/auto-round/pull/318
* fix eval device issue by wenhuach21 in https://github.com/intel/auto-round/pull/319
* fix multiple device bug by wenhuach21 in https://github.com/intel/auto-round/pull/321
* add warning for no gptq exllamav2 kernel by wenhuach21 in https://github.com/intel/auto-round/pull/324
* add pile calib, rename quant_block_list to to_quant_block_names by WeiweiZhang1 in https://github.com/intel/auto-round/pull/322
* fix autogptq version error by wenhuach21 in https://github.com/intel/auto-round/pull/325
* new mllm eval by n1ck-guo in https://github.com/intel/auto-round/pull/317
* Add cpu only version by XuehaoSun in https://github.com/intel/auto-round/pull/315
* set default mllm dataset by n1ck-guo in https://github.com/intel/auto-round/pull/327
* fix fp_layers issue and force to FP16 on cuda for autoround format inference by wenhuach21 in https://github.com/intel/auto-round/pull/326
* fix the bug of test model support for test-only by n1ck-guo in https://github.com/intel/auto-round/pull/328
* Increase unit test timeout to 120 minutes by XuehaoSun in https://github.com/intel/auto-round/pull/330
* fix mllm dataset config bug and add gptq cuda backend by wenhuach21 in https://github.com/intel/auto-round/pull/329
* add tips and tricks for llm&mllm quantization by wenhuach21 in https://github.com/intel/auto-round/pull/333
* fix eval_bs in fake format and reset auto-gptq exporting max_shard_size by wenhuach21 in https://github.com/intel/auto-round/pull/332
* fix model_dtype issue and reformat mllm code by wenhuach21 in https://github.com/intel/auto-round/pull/335
* Exclude markdown files from unit test pipelines by XuehaoSun in https://github.com/intel/auto-round/pull/337
* refine mllm docs by WeiweiZhang1 in https://github.com/intel/auto-round/pull/336
* cogvlm doc by n1ck-guo in https://github.com/intel/auto-round/pull/339
* add qwen2.5 recipe and refine readme by WeiweiZhang1 in https://github.com/intel/auto-round/pull/338
* add cogvlm recipe and refine readme by WeiweiZhang1 in https://github.com/intel/auto-round/pull/340
* refine mllm API and add help info by n1ck-guo in https://github.com/intel/auto-round/pull/334


**Full Changelog**: https://github.com/intel/auto-round/compare/v0.3.1...v0.4

0.3.1

Release Highlights:

New Features:

**Full-Range Symmetric Quantization**: We’ve introduced full-range symmetric quantization, which often matches or even exceeds the performance of asymmetric quantization, especially at lower bit widths such as 2-bit.

**Command-Line Support**: You can now quantize models from the command line with `auto-round --model xxx --format xxx`.

**Default Export Format Change**: The default export format has been updated to `auto_round` instead of `auto_gptq`.

**Multi-thread Packing**: Up to 2X speedup in the packing phase.
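
Putting the new-feature items together, a minimal sketch through the Python API; the model name is a placeholder, and the constructor arguments (`bits`, `group_size`, `sym`) are assumed to match the project README:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# sym=True selects the new full-range symmetric scheme (shown explicitly here).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()

# The default export format is now auto_round rather than auto_gptq.
autoround.save_quantized("./opt-125m-w4g128", format="auto_round")
```

The equivalent command-line invocation would follow the pattern above, e.g. `auto-round --model facebook/opt-125m --format auto_round`.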

Bug Fixes:

**Missing Cached Position Embeddings**: Fixed an issue with missing cached position embeddings in Transformers version 4.45.2.

**Mutable Default Values**: Addressed problems related to mutable default values.

**3-Bit Packing**: Fixed a 3-bit packing bug in the AutoGPTQ format.

What's Changed
* Add setseed in autoround by WeiweiZhang1 in https://github.com/intel/auto-round/pull/201
* support autoawq format by yintong-lu in https://github.com/intel/auto-round/pull/115
* Remove UT coverage check by XuehaoSun in https://github.com/intel/auto-round/pull/202
* set autoround format as default to unify CPU/HPU/CUDA by wenhuach21 in https://github.com/intel/auto-round/pull/205
* add local file of pile-10k by WeiweiZhang1 in https://github.com/intel/auto-round/pull/198
* modify setup.py by n1ck-guo in https://github.com/intel/auto-round/pull/206
* limit the scale minimum value not to 0 by WeiweiZhang1 in https://github.com/intel/auto-round/pull/211
* fix example dataset regression by WeiweiZhang1 in https://github.com/intel/auto-round/pull/212
* remove local pile file by WeiweiZhang1 in https://github.com/intel/auto-round/pull/213
* update xpu format exporting by WeiweiZhang1 in https://github.com/intel/auto-round/pull/214
* fix a bug in autoround format inference by wenhuach21 in https://github.com/intel/auto-round/pull/215
* avoid underflow and overflow for exllamav2 by wenhuach21 in https://github.com/intel/auto-round/pull/218
* add qwen int4 model, refine example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/217
* [Experimental Feature]fast tuning norm/bias at 2 bits by wenhuach21 in https://github.com/intel/auto-round/pull/208
* update readme by wenhuach21 in https://github.com/intel/auto-round/pull/220
* refine eval_042 to enable parallelize evaluation by WeiweiZhang1 in https://github.com/intel/auto-round/pull/221
* Enable phi3v tuning by WeiweiZhang1 in https://github.com/intel/auto-round/pull/197
* Bump setuptools from 69.5.1 to 70.0.0 in /examples/multimodal-modeling/Phi-3-vision by dependabot in https://github.com/intel/auto-round/pull/223
* refine example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/224
* change the scale thresh generally by WeiweiZhang1 in https://github.com/intel/auto-round/pull/229
* add quantized models by 3rd party by WeiweiZhang1 in https://github.com/intel/auto-round/pull/230
* add meta3.1-70B-instruct model, refine docs by WeiweiZhang1 in https://github.com/intel/auto-round/pull/231
* fix model link by WeiweiZhang1 in https://github.com/intel/auto-round/pull/232
* refine docs, add accuracy data, add recipe and eval scripts by WeiweiZhang1 in https://github.com/intel/auto-round/pull/226
* add brief formats introduction by wenhuach21 in https://github.com/intel/auto-round/pull/236
* update readme and add itrex in the requirements.txt by wenhuach21 in https://github.com/intel/auto-round/pull/238
* add tritonv2, improve packing and pbar by wenhuach21 in https://github.com/intel/auto-round/pull/239
* refine the code and the speedup is notable by wenhuach21 in https://github.com/intel/auto-round/pull/240
* move some settings from example to main by wenhuach21 in https://github.com/intel/auto-round/pull/241
* add runnable script for autoround by n1ck-guo in https://github.com/intel/auto-round/pull/225
* update readme by n1ck-guo in https://github.com/intel/auto-round/pull/242
* Add MANIFEST.in file to include requirements.txt by XuehaoSun in https://github.com/intel/auto-round/pull/243
* fix example bug by n1ck-guo in https://github.com/intel/auto-round/pull/245
* enable llava int4 inference with autoround format by WeiweiZhang1 in https://github.com/intel/auto-round/pull/237
* remove autoawq requirement at packing stage by n1ck-guo in https://github.com/intel/auto-round/pull/249
* remove unused log by n1ck-guo in https://github.com/intel/auto-round/pull/252
* support INC API by WeiweiZhang1 in https://github.com/intel/auto-round/pull/255
* avoid potential bug for auto-gptq 0.8 by wenhuach21 in https://github.com/intel/auto-round/pull/250
* fix example by n1ck-guo in https://github.com/intel/auto-round/pull/256
* fix preci by n1ck-guo in https://github.com/intel/auto-round/pull/258
* enable_qwen2-vl_quantization by WeiweiZhang1 in https://github.com/intel/auto-round/pull/248
* update eval and fix example by n1ck-guo in https://github.com/intel/auto-round/pull/260
* refine autoawq exporting code by wenhuach21 in https://github.com/intel/auto-round/pull/261
* better support quant_lm_head for larger models by wenhuach21 in https://github.com/intel/auto-round/pull/263
* Fix 3bit packing for auto-gptq format by wenhuach21 in https://github.com/intel/auto-round/pull/264
* Add a warning for improper export formats. by wenhuach21 in https://github.com/intel/auto-round/pull/265
* Update readme for VLM support and integration by wenhuach21 in https://github.com/intel/auto-round/pull/266
* remove g_idx in gptq format by wenhuach21 in https://github.com/intel/auto-round/pull/267
* keep the dtype after qdq by wenhuach21 in https://github.com/intel/auto-round/pull/268
* enable llama3.2-vision model quantization by WeiweiZhang1 in https://github.com/intel/auto-round/pull/269
* fix mutable default value by wenhuach21 in https://github.com/intel/auto-round/pull/272
* change to even rounding for mantissa of mx_fp by wenhuach21 in https://github.com/intel/auto-round/pull/277
* adamround bugfix, refine import by WeiweiZhang1 in https://github.com/intel/auto-round/pull/275
* [Important Change]set full range sym as the default by wenhuach21 in https://github.com/intel/auto-round/pull/278
* refine eval by wenhuach21 in https://github.com/intel/auto-round/pull/282
* qwen2_bugfix, add adamround vision UT by WeiweiZhang1 in https://github.com/intel/auto-round/pull/281

New Contributors
* dependabot made their first contribution in https://github.com/intel/auto-round/pull/223

**Full Changelog**: https://github.com/intel/auto-round/compare/v0.3...v0.3.1

0.3

**Highlights:**

- **Broader Device Support**:
- Expanded support for CPU, HPU, and CUDA inference in the AutoRound format, resolving the 2-bit accuracy issue.
- **New Recipes and Model Releases**:
- Published numerous recipes on the [Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), showcasing impressive results on LLaMa 3.1 and other leading models.
- **Experimental Features**:
- Introduced several experimental features, including activation quantization and `mx_fp`, with promising results when combined with AutoRound.
- **Multimodal Model Support**:
- Extended capabilities for tuning and inference across several multimodal models.

**Lowlights:**

- Implemented support for `low_cpu_mem_usage`, the `auto_awq` export format, calibration dataset concatenation, and calibration datasets with chat templates (see the sketch below).
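
A rough sketch of the lowlight items above; the `low_cpu_mem_usage` argument name and the `auto_awq` format string are assumptions based on later documentation, not guarantees for this release:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# low_cpu_mem_usage trades tuning speed for a smaller host-memory footprint
# (argument name assumed from later documentation).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, low_cpu_mem_usage=True)
autoround.quantize()

# Export to the newly supported auto_awq format.
autoround.save_quantized("./opt-125m-w4g128-awq", format="auto_awq")
```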

0.2

**Overview**

We supported the Intel XPU format and implemented lm-head quantization and inference, reducing the model size from 5.4GB to 4.7GB for LLAMA3 at W4G128. Additionally, we support both local and mixed online datasets for calibration. By optimizing memory usage and tuning cost, calibration now takes approximately 20 minutes for 7B models and 2.5 hours for 70B models with 512 samples when `disable_low_gpu_mem_usage` is set.
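
A minimal sketch of a 512-sample calibration run matching the timings above; the `nsamples`, `dataset`, and `low_gpu_mem_usage` argument names are assumptions based on later releases (the release text refers to a `disable_low_gpu_mem_usage` switch), and the local dataset path is hypothetical:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder 7B model
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 512 calibration samples; keeping low_gpu_mem_usage off trades GPU memory for
# speed, which is what the ~20-minute figure for 7B models assumes.
autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    nsamples=512,
    low_gpu_mem_usage=False,
    dataset="./my_calib_data.json",  # hypothetical local calibration file
)
autoround.quantize()
autoround.save_quantized("./llama2-7b-w4g128")
```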

**Others**:

More accuracy data as presented in the [paper](https://arxiv.org/pdf/2309.05516) and on the [low_bit_open_llm_leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard)

More technical details as presented in the [paper](https://arxiv.org/pdf/2309.05516)

Known issues:

In some scenarios there is a large discrepancy between the GPTQ model and the QDQ model for asymmetric quantization; we are working on it.

0.1

**Overview**

AutoRound introduces an innovative weight-only quantization algorithm designed specifically for low-bit LLM inference, approaching near-lossless compression for a range of popular models including gemma-7B, Mistral-7B, Mixtral-8x7B-v0.1, Mixtral-8x7B-Instruct-v0.1, Phi2, LLAMA2, and more at W4G128. AutoRound consistently outperforms established methods across the majority of scenarios at W4G128, W4G-1, W3G128, and W2G128.

**Key Features**

- **Wide Model Support**: AutoRound caters to a diverse range of model families. About 20 model families have been verified.
- **Export Flexibility**: Effortlessly export quantized models to ITREX[1] and AutoGPTQ[2] formats for seamless deployment on Intel CPU and Nvidia GPU platforms respectively.
- **Device Compatibility**: Compatible with tuning devices including Intel CPUs, Intel Gaudi2, and Nvidia GPUs.
- **Dataset Flexibility**: AutoRound supports calibration with Pile10k and MBPP datasets, with easy extensibility to incorporate additional datasets.
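
For instance, switching the calibration data is a one-argument change; the `dataset` argument name and the `"mbpp"` identifier follow later releases and are assumptions for this early version:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "mistralai/Mistral-7B-v0.1"  # one of the verified model families
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Calibrate on MBPP instead of the default pile-10k subset
# (dataset identifier and argument name assumed from later releases).
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, dataset="mbpp")
autoround.quantize()
```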

**Examples**

- Explore language modeling and code generation examples to unlock the full potential of AutoRound.

**Additional Benefits**

- Pre-quantized Models: Access a variety of pre-quantized models on Hugging Face for immediate integration into your projects, with more models under review and coming soon.
- Comprehensive Accuracy Data: Simplified user deployment with extensive accuracy data provided.

Known issues:

- baichuan-inc/Baichuan2-13B-Chat has some issues; we will support it soon.

Reference:

[1] https://github.com/intel/intel-extension-for-transformers

[2] https://github.com/AutoGPTQ/AutoGPTQ
