Auto-round

0.4.3

Highlights:
- fix incorrect device setting in autoround format inference by WeiweiZhang1 in https://github.com/intel/auto-round/pull/383
- remove the dependency on AutoGPTQ for CPU by XuehaoSun in https://github.com/intel/auto-round/pull/380
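With the AutoGPTQ dependency removed, CPU inference in the AutoRound format needs only the `auto_round` package itself. A minimal sketch, assuming the `AutoRoundConfig` registration import shown in the project README; the checkpoint path is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRoundConfig  # noqa: F401  (importing registers the AutoRound backend with Transformers)

# Illustrative path; point this at any AutoRound-format checkpoint.
model_path = "./opt-125m-int4-autoround"
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(model_path)

inputs = tokenizer("There is a girl who likes adventure,", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```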


What's Changed
* support_llava_hf_vlm_example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/381
* fix block_name_to_quantize by WeiweiZhang1 in https://github.com/intel/auto-round/pull/382
* fix incorrect device setting in autoround format inference by WeiweiZhang1 in https://github.com/intel/auto-round/pull/383
* refine homepage, update model links by WeiweiZhang1 in https://github.com/intel/auto-round/pull/385
* update eval basic usage by n1ck-guo in https://github.com/intel/auto-round/pull/384
* refine error msg and dump more log in the tuning by wenhuach21 in https://github.com/intel/auto-round/pull/386
* remove the dependency on AutoGPTQ for CPU and bump to V0.4.3 by XuehaoSun in https://github.com/intel/auto-round/pull/380


**Full Changelog**: https://github.com/intel/auto-round/compare/v0.4.2...v0.4.3

0.4.2

Highlights:
- fix autoawq exporting issue
- remove bias exporting if possible in autogptq format

What's Changed
* bump version into v0.4.1 by XuehaoSun in https://github.com/intel/auto-round/pull/350
* Update docker user and remove baseline UT by XuehaoSun in https://github.com/intel/auto-round/pull/347
* delete llm example and refine readme by wenhuach21 in https://github.com/intel/auto-round/pull/354
* Simulated W4Afp8 Quantization by wenhuach21 in https://github.com/intel/auto-round/pull/331
* add QWQ-32B, VLM, Qwen2.5, Llama3.1 int4 models by wenhuach21 in https://github.com/intel/auto-round/pull/356
* fix awq exporting by wenhuach21 in https://github.com/intel/auto-round/pull/358
* Tensor reshape bugfix by WeiweiZhang1 in https://github.com/intel/auto-round/pull/364
* fix awq backend and fp_layers issue by wenhuach21 in https://github.com/intel/auto-round/pull/363
* fix awq exporting bugs by wenhuach21 in https://github.com/intel/auto-round/pull/365
* fix bug of only_text_test check due to inference issue on cpu by n1ck-guo in https://github.com/intel/auto-round/pull/362
* add gpu test by wenhuach21 in https://github.com/intel/auto-round/pull/367
* using multicard when device set to "auto" by n1ck-guo in https://github.com/intel/auto-round/pull/368
* quant_block_names enhancement by WeiweiZhang1 in https://github.com/intel/auto-round/pull/369
* [HPU] Add lazy mode back by yiliu30 in https://github.com/intel/auto-round/pull/371
* remove bias exporting if possible in autogptq format by wenhuach21 in https://github.com/intel/auto-round/pull/375
* save processor automatically by n1ck-guo in https://github.com/intel/auto-round/pull/372
* Add gpu ut by wenhuach21 in https://github.com/intel/auto-round/pull/370
* fix gpu ut by n1ck-guo in https://github.com/intel/auto-round/pull/376
* fix typos by wenhuach21 in https://github.com/intel/auto-round/pull/377


**Full Changelog**: https://github.com/intel/auto-round/compare/v0.4.1...v0.4.2

0.4.1

Highlights:
- Fixed vllm calibration infinite loop issue
- Corrected the default value for the `sym` argument in the API configuration.
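Because the default flipped in this release, code that relied on the old implicit value can silently change behavior; passing `sym` explicitly avoids that. A minimal sketch (model name illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # illustrative
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Pin `sym` explicitly so results do not depend on a version-specific default.
autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
autoround.quantize()
```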

What's Changed
* fix typo by wenhuach21 in https://github.com/intel/auto-round/pull/342
* vllm/llama-vision llava calibration infinite loop fix by WeiweiZhang1 in https://github.com/intel/auto-round/pull/343
* [HPU]Enhance `numba` check by yiliu30 in https://github.com/intel/auto-round/pull/345
* [VLM]fix bs and grad reset by n1ck-guo in https://github.com/intel/auto-round/pull/344
* [HPU]Enhance installation check by yiliu30 in https://github.com/intel/auto-round/pull/346
* [Critical Bug]API use sym as default by wenhuach21 in https://github.com/intel/auto-round/pull/349
* triton backend requires <3.0 by wenhuach21 in https://github.com/intel/auto-round/pull/348


**Full Changelog**: https://github.com/intel/auto-round/compare/v0.4...v0.4.1

0.4

Highlights:
- [Experimental Feature] We provide API support for VLM models (see the sketch after this list)
- [Kernel] We add IPEX support for Intel CPU
- [Bug fix] We fix a tuning bug for the glm4 model
- [Enhancement] Better align `gradient_accumulate_steps` behavior for varied-length inputs
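As a taste of the experimental VLM path, here is a sketch assuming the `AutoRoundMLLM` entry point and processor wiring shown in the project examples; the model ID is illustrative and the exact argument names may differ by version:

```python
from transformers import AutoProcessor, AutoTokenizer, Qwen2VLForConditionalGeneration
from auto_round import AutoRoundMLLM  # experimental multimodal entry point

model_name = "Qwen/Qwen2-VL-2B-Instruct"  # illustrative checkpoint
model = Qwen2VLForConditionalGeneration.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
processor = AutoProcessor.from_pretrained(model_name)

# Tune with the default multimodal calibration data, then export in AutoRound format.
autoround = AutoRoundMLLM(model, tokenizer, processor, bits=4, group_size=128)
autoround.quantize()
autoround.save_quantized("./Qwen2-VL-2B-int4", format="auto_round")
```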

What's Changed
* refine AutoRound format and support marlin repacking by wenhuach21 in https://github.com/intel/auto-round/pull/280
* update readme for v0.3.1 release by wenhuach21 in https://github.com/intel/auto-round/pull/283
* update readme for cpu inference by wenhuach21 in https://github.com/intel/auto-round/pull/284
* avoid deterministic algorithm warning in inference by wenhuach21 in https://github.com/intel/auto-round/pull/285
* fix mx_fp issues by wenhuach21 in https://github.com/intel/auto-round/pull/286
* update torch ao integration information by wenhuach21 in https://github.com/intel/auto-round/pull/287
* Refine code by wenhuach21 in https://github.com/intel/auto-round/pull/291
* Add ipex support for intel cpu by wenhuach21 in https://github.com/intel/auto-round/pull/292
* fix ipex tqdm mismatch issue by wenhuach21 in https://github.com/intel/auto-round/pull/293
* fix bug of backend by wenhuach21 in https://github.com/intel/auto-round/pull/294
* [Experimental Feature]support for common hf multimodel by n1ck-guo in https://github.com/intel/auto-round/pull/276
* use torch.compile by default for PyTorch versions 2.6 and above by wenhuach21 in https://github.com/intel/auto-round/pull/295
* refine forward hook by WeiweiZhang1 in https://github.com/intel/auto-round/pull/290
* eval for MLLMs by n1ck-guo in https://github.com/intel/auto-round/pull/296
* mllm eval bug fix by n1ck-guo in https://github.com/intel/auto-round/pull/297
* Port Numba-based packing from INC by yiliu30 in https://github.com/intel/auto-round/pull/301
* refine model config file for mixed precision quantization by wenhuach21 in https://github.com/intel/auto-round/pull/300
* fix glm4-9b batch dim issue by wenhuach21 in https://github.com/intel/auto-round/pull/304
* better align gradient_accumulate_steps for varied length input by wenhuach21 in https://github.com/intel/auto-round/pull/309
* Enable torch.compile on HPU by yiliu30 in https://github.com/intel/auto-round/pull/307
* Update autogptq exporting by wenhuach21 in https://github.com/intel/auto-round/pull/310
* fix typo by wenhuach21 in https://github.com/intel/auto-round/pull/311
* qwen2 vision quantization bugfix by WeiweiZhang1 in https://github.com/intel/auto-round/pull/313
* multiple gpu evaluation/calibration refine by wenhuach21 in https://github.com/intel/auto-round/pull/312
* HPU only release binary by yiliu30 in https://github.com/intel/auto-round/pull/302
* patch 1 for mllm by n1ck-guo in https://github.com/intel/auto-round/pull/298
* add torch compile arg by wenhuach21 in https://github.com/intel/auto-round/pull/314
* fix merge error by n1ck-guo in https://github.com/intel/auto-round/pull/316
* Update the check for HPU by yiliu30 in https://github.com/intel/auto-round/pull/318
* fix eval device issue by wenhuach21 in https://github.com/intel/auto-round/pull/319
* fix multiple device bug by wenhuach21 in https://github.com/intel/auto-round/pull/321
* add warning for no gptq exllamav2 kernel by wenhuach21 in https://github.com/intel/auto-round/pull/324
* add pile calib, rename quant_block_list to to_quant_block_names by WeiweiZhang1 in https://github.com/intel/auto-round/pull/322
* fix autogptq version error by wenhuach21 in https://github.com/intel/auto-round/pull/325
* new mllm eval by n1ck-guo in https://github.com/intel/auto-round/pull/317
* Add cpu only version by XuehaoSun in https://github.com/intel/auto-round/pull/315
* set default mllm dataset by n1ck-guo in https://github.com/intel/auto-round/pull/327
* fix fp_layers issue and force to FP16 on cuda for autoround format inference by wenhuach21 in https://github.com/intel/auto-round/pull/326
* fix the bug of test model support for test-only by n1ck-guo in https://github.com/intel/auto-round/pull/328
* Increase unit test timeout to 120 minutes by XuehaoSun in https://github.com/intel/auto-round/pull/330
* fix mllm dataset config bug and add gptq cuda backend by wenhuach21 in https://github.com/intel/auto-round/pull/329
* add tips and tricks for llm&mllm quantization by wenhuach21 in https://github.com/intel/auto-round/pull/333
* fix eval_bs in fake format and reset auto-gptq exporting max_shard_size by wenhuach21 in https://github.com/intel/auto-round/pull/332
* fix model_dtype issue and reformat mllm code by wenhuach21 in https://github.com/intel/auto-round/pull/335
* Exclude markdown files from unit test pipelines by XuehaoSun in https://github.com/intel/auto-round/pull/337
* refine mllm docs by WeiweiZhang1 in https://github.com/intel/auto-round/pull/336
* cogvlm doc by n1ck-guo in https://github.com/intel/auto-round/pull/339
* add qwen2.5 recipe and refine readme by WeiweiZhang1 in https://github.com/intel/auto-round/pull/338
* add cogvlm recipe and refine readme by WeiweiZhang1 in https://github.com/intel/auto-round/pull/340
* refine mllm API and add help info by n1ck-guo in https://github.com/intel/auto-round/pull/334


**Full Changelog**: https://github.com/intel/auto-round/compare/v0.3.1...v0.4

0.3.1

Release Highlights:

New Features:

**Full-Range Symmetric Quantization**: We’ve introduced full-range symmetric quantization, which often matches or even exceeds the performance of asymmetric quantization, especially at lower bit widths, such as 2.
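To make "full-range" concrete: the scale is computed with the full negative level 2^(bits-1) as the denominator, so the largest-magnitude weight maps exactly onto -2^(bits-1), rather than the restricted amax / (2^(bits-1) - 1). A simplified quantize-dequantize sketch of this scheme (AutoRound's learned rounding is not shown):

```python
import torch

def fullrange_sym_qdq(w: torch.Tensor, bits: int = 2) -> torch.Tensor:
    """Quantize-dequantize with full-range symmetric quantization (simplified)."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    # Full-range: divide by 2**(bits-1) instead of the restricted 2**(bits-1) - 1.
    scale = w.abs().max() / (2 ** (bits - 1))
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    return q * scale

w = torch.randn(64, 64)
print("mean abs error at 2 bits:", (w - fullrange_sym_qdq(w, bits=2)).abs().mean().item())
```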

**Command-Line Support**: You can now quantize models from the command line, e.g. `auto-round --model xxx --format xxx`.

**Default Export Format Change**: The default export format has been updated to `auto_round` instead of `auto_gptq`.
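In API terms, the command above corresponds roughly to the following sketch (model name illustrative); pass `format="auto_gptq"` if you need the previous default:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # illustrative
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(model, tokenizer, bits=4, group_size=128)
autoround.quantize()
# "auto_round" is now the default export format; format="auto_gptq" restores the old one.
autoround.save_quantized("./opt-125m-int4", format="auto_round")
```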

**Multi-threaded Packing**: Up to a 2X speedup in the packing phase.

Bug Fixes:

**Missing Cached Position Embeddings**: Fixed an issue with missing cached position embeddings in Transformers version 4.45.2.

**Mutable Default Values**: Addressed problems related to mutable default argument values.

**3-bit Packing**: Fixed a 3-bit packing bug in the AutoGPTQ format.

What's Changed
* Add setseed in autoround by WeiweiZhang1 in https://github.com/intel/auto-round/pull/201
* support autoawq format by yintong-lu in https://github.com/intel/auto-round/pull/115
* Remove UT coverage check by XuehaoSun in https://github.com/intel/auto-round/pull/202
* set autoround format as default to unify CPU/HPU/CUDA by wenhuach21 in https://github.com/intel/auto-round/pull/205
* add local file of pile-10k by WeiweiZhang1 in https://github.com/intel/auto-round/pull/198
* modify setup.py by n1ck-guo in https://github.com/intel/auto-round/pull/206
* limit the scale minimum value not to 0 by WeiweiZhang1 in https://github.com/intel/auto-round/pull/211
* fix example dataset regression by WeiweiZhang1 in https://github.com/intel/auto-round/pull/212
* remove local pile file by WeiweiZhang1 in https://github.com/intel/auto-round/pull/213
* update xpu format exporting by WeiweiZhang1 in https://github.com/intel/auto-round/pull/214
* fix a bug in autoround format inference by wenhuach21 in https://github.com/intel/auto-round/pull/215
* avoid underflow and overflow for exllamav2 by wenhuach21 in https://github.com/intel/auto-round/pull/218
* add qwen int4 model, refine example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/217
* [Experimental Feature]fast tuning norm/bias at 2 bits by wenhuach21 in https://github.com/intel/auto-round/pull/208
* update readme by wenhuach21 in https://github.com/intel/auto-round/pull/220
* refine eval_042 to enable parallelize evaluation by WeiweiZhang1 in https://github.com/intel/auto-round/pull/221
* Enable phi3v tuning by WeiweiZhang1 in https://github.com/intel/auto-round/pull/197
* Bump setuptools from 69.5.1 to 70.0.0 in /examples/multimodal-modeling/Phi-3-vision by dependabot in https://github.com/intel/auto-round/pull/223
* refine example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/224
* change the scale thresh generally by WeiweiZhang1 in https://github.com/intel/auto-round/pull/229
* add quantized models by 3rd party by WeiweiZhang1 in https://github.com/intel/auto-round/pull/230
* add meta3.1-70B-instruct model, refine docs by WeiweiZhang1 in https://github.com/intel/auto-round/pull/231
* fix model link by WeiweiZhang1 in https://github.com/intel/auto-round/pull/232
* refine docs, add accuracy data, add recipe and eval scripts by WeiweiZhang1 in https://github.com/intel/auto-round/pull/226
* add brief formats introduction by wenhuach21 in https://github.com/intel/auto-round/pull/236
* update readme and add itrex in the requirements.txt by wenhuach21 in https://github.com/intel/auto-round/pull/238
* add tritonv2, improve packing and pbar by wenhuach21 in https://github.com/intel/auto-round/pull/239
* refine the code and the speedup is notable by wenhuach21 in https://github.com/intel/auto-round/pull/240
* move some settings from example to main by wenhuach21 in https://github.com/intel/auto-round/pull/241
* add runable script for autoround by n1ck-guo in https://github.com/intel/auto-round/pull/225
* update readme by n1ck-guo in https://github.com/intel/auto-round/pull/242
* Add MANIFEST.in file to include requirements.txt by XuehaoSun in https://github.com/intel/auto-round/pull/243
* fix example bug by n1ck-guo in https://github.com/intel/auto-round/pull/245
* enable llava int4 inference with autoround format by WeiweiZhang1 in https://github.com/intel/auto-round/pull/237
* remove autoawq requirement at packing stage by n1ck-guo in https://github.com/intel/auto-round/pull/249
* remove unused log by n1ck-guo in https://github.com/intel/auto-round/pull/252
* support INC API by WeiweiZhang1 in https://github.com/intel/auto-round/pull/255
* avoid potential bug for auto-gptq 0.8 by wenhuach21 in https://github.com/intel/auto-round/pull/250
* fix example by n1ck-guo in https://github.com/intel/auto-round/pull/256
* fix preci by n1ck-guo in https://github.com/intel/auto-round/pull/258
* enable_qwen2-vl_quantization by WeiweiZhang1 in https://github.com/intel/auto-round/pull/248
* update eval and fix example by n1ck-guo in https://github.com/intel/auto-round/pull/260
* refine autoawq exporting code by wenhuach21 in https://github.com/intel/auto-round/pull/261
* better support quant_lm_head for larger models by wenhuach21 in https://github.com/intel/auto-round/pull/263
* Fix 3bit packing for auto-gptq format by wenhuach21 in https://github.com/intel/auto-round/pull/264
* Add a warning for improper export formats. by wenhuach21 in https://github.com/intel/auto-round/pull/265
* Update readme for VLM support and integration by wenhuach21 in https://github.com/intel/auto-round/pull/266
* remove g_idx in gptq format by wenhuach21 in https://github.com/intel/auto-round/pull/267
* keep the dtype after qdq by wenhuach21 in https://github.com/intel/auto-round/pull/268
* enable llama3.2-vision model quantization by WeiweiZhang1 in https://github.com/intel/auto-round/pull/269
* fix mutable default value by wenhuach21 in https://github.com/intel/auto-round/pull/272
* change to even rounding for mantissa of mx_fp by wenhuach21 in https://github.com/intel/auto-round/pull/277
* adamround bugfix, refine import by WeiweiZhang1 in https://github.com/intel/auto-round/pull/275
* [Important Change]set full range sym as the default by wenhuach21 in https://github.com/intel/auto-round/pull/278
* refine eval by wenhuach21 in https://github.com/intel/auto-round/pull/282
* qwen2_bugfix, add adamround vision UT by WeiweiZhang1 in https://github.com/intel/auto-round/pull/281

New Contributors
* dependabot made their first contribution in https://github.com/intel/auto-round/pull/223

**Full Changelog**: https://github.com/intel/auto-round/compare/v0.3...v0.3.1

0.3

**Highlights:**

- **Broader Device Support**: Expanded support for CPU, HPU, and CUDA inference in the AutoRound format, resolving the 2-bit accuracy issue.
- **New Recipes and Model Releases**: Published numerous recipes on the [Low Bit Open LLM Leaderboard](https://huggingface.co/spaces/Intel/low_bit_open_llm_leaderboard), showcasing impressive results on LLaMa 3.1 and other leading models.
- **Experimental Features**: Introduced activation quantization and `mx_fp`, with promising results when combined with AutoRound.
- **Multimodal Model Support**: Extended tuning and inference support across several multimodal models.

**Lowlights:**

- Implemented support for `low_cpu_mem_usage`, `auto_awq` format, calibration dataset concatenation, and calibration datasets with chat templates.
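For instance, the calibration and memory options can be combined roughly as below; a sketch assuming `dataset` and `low_cpu_mem_usage` are `AutoRound` constructor arguments as documented (model name illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from auto_round import AutoRound

model_name = "facebook/opt-125m"  # illustrative
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

autoround = AutoRound(
    model,
    tokenizer,
    bits=4,
    group_size=128,
    dataset="NeelNanda/pile-10k",  # calibration dataset; assumed argument name
    low_cpu_mem_usage=True,        # assumed flag, per this release's notes
)
autoround.quantize()
```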
