Release Highlights:
New Features:
* **Full-Range Symmetric Quantization**: Introduced full-range symmetric quantization, which often matches or even exceeds the accuracy of asymmetric quantization, especially at lower bit widths such as 2-bit.
* **Command-Line Support**: Models can now be quantized with the command `auto-round --model xxx --format xxx`.
* **Default Export Format Change**: The default export format is now `auto_round` instead of `auto_gptq`.
* **Multi-threaded Packing**: The packing phase is now multi-threaded, for up to a 2X speedup.
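To illustrate the idea behind full-range symmetric quantization: a "full-range" scheme divides the scale by 2^(bits-1), so the most negative grid point (e.g. -2 at 2 bits) is usable, whereas a restricted-range scheme divides by 2^(bits-1) - 1. The NumPy sketch below is illustrative only; the exact scale computation and rounding AutoRound uses may differ.

```python
import numpy as np

def quantize_sym(w: np.ndarray, bits: int, full_range: bool = True) -> np.ndarray:
    """Symmetric quantize-dequantize of a weight tensor (illustrative sketch).

    full_range=True scales by absmax / 2**(bits-1), so the integer grid
    spans the full signed range, e.g. {-2, -1, 0, 1} at 2 bits.
    full_range=False scales by absmax / (2**(bits-1) - 1), leaving the
    most negative code unused.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -(2 ** (bits - 1))
    denom = 2 ** (bits - 1) if full_range else qmax
    scale = np.abs(w).max() / denom
    # Round to the integer grid, clip to the representable range, dequantize.
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale

w = np.array([-1.0, -0.5, 0.0, 0.4, 0.9])
print(quantize_sym(w, bits=2, full_range=True))   # [-1.  -0.5  0.   0.5  0.5]
print(quantize_sym(w, bits=2, full_range=False))  # [-1.   0.   0.   0.   1. ]
```

At 2 bits the restricted-range grid collapses to {-1, 0, 1}, which is why full-range symmetric quantization tends to help most at very low bit widths.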
Bug Fixes:
* **Missing Cached Position Embeddings**: Fixed an issue with missing cached position embeddings in Transformers 4.45.2.
* **Mutable Default Values**: Addressed problems caused by mutable default argument values.
* **3-Bit Packing**: Fixed a 3-bit packing bug in the AutoGPTQ format.
What's Changed
* Add setseed in autoround by WeiweiZhang1 in https://github.com/intel/auto-round/pull/201
* support autoawq format by yintong-lu in https://github.com/intel/auto-round/pull/115
* Remove UT coverage check by XuehaoSun in https://github.com/intel/auto-round/pull/202
* set autoround format as default to unify CPU/HPU/CUDA by wenhuach21 in https://github.com/intel/auto-round/pull/205
* add local file of pile-10k by WeiweiZhang1 in https://github.com/intel/auto-round/pull/198
* modify setup.py by n1ck-guo in https://github.com/intel/auto-round/pull/206
* limit the scale minimum value not to 0 by WeiweiZhang1 in https://github.com/intel/auto-round/pull/211
* fix example dataset regression by WeiweiZhang1 in https://github.com/intel/auto-round/pull/212
* remove local pile file by WeiweiZhang1 in https://github.com/intel/auto-round/pull/213
* update xpu format exporting by WeiweiZhang1 in https://github.com/intel/auto-round/pull/214
* fix a bug in autoround format inference by wenhuach21 in https://github.com/intel/auto-round/pull/215
* avoid underflow and overflow for exllamav2 by wenhuach21 in https://github.com/intel/auto-round/pull/218
* add qwen int4 model, refine example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/217
* [Experimental Feature]fast tuning norm/bias at 2 bits by wenhuach21 in https://github.com/intel/auto-round/pull/208
* update readme by wenhuach21 in https://github.com/intel/auto-round/pull/220
* refine eval_042 to enable parallelize evaluation by WeiweiZhang1 in https://github.com/intel/auto-round/pull/221
* Enable phi3v tuning by WeiweiZhang1 in https://github.com/intel/auto-round/pull/197
* Bump setuptools from 69.5.1 to 70.0.0 in /examples/multimodal-modeling/Phi-3-vision by dependabot in https://github.com/intel/auto-round/pull/223
* refine example by WeiweiZhang1 in https://github.com/intel/auto-round/pull/224
* change the scale thresh generally by WeiweiZhang1 in https://github.com/intel/auto-round/pull/229
* add quantized models by 3rd party by WeiweiZhang1 in https://github.com/intel/auto-round/pull/230
* add meta3.1-70B-instruct model, refine docs by WeiweiZhang1 in https://github.com/intel/auto-round/pull/231
* fix model link by WeiweiZhang1 in https://github.com/intel/auto-round/pull/232
* refine docs, add accuracy data, add receip and eval scripts by WeiweiZhang1 in https://github.com/intel/auto-round/pull/226
* add brief formats introduction by wenhuach21 in https://github.com/intel/auto-round/pull/236
* update readme and add itrex in the requirements.txt by wenhuach21 in https://github.com/intel/auto-round/pull/238
* add tritonv2, improve packing and pbar by wenhuach21 in https://github.com/intel/auto-round/pull/239
* refine the code and the speedup is notable by wenhuach21 in https://github.com/intel/auto-round/pull/240
* move some settings from example to main by wenhuach21 in https://github.com/intel/auto-round/pull/241
* add runable script for autoround by n1ck-guo in https://github.com/intel/auto-round/pull/225
* update readme by n1ck-guo in https://github.com/intel/auto-round/pull/242
* Add MANIFEST.in file to include requirements.txt by XuehaoSun in https://github.com/intel/auto-round/pull/243
* fix example bug by n1ck-guo in https://github.com/intel/auto-round/pull/245
* enable llava int4 inference with autoround format by WeiweiZhang1 in https://github.com/intel/auto-round/pull/237
* remove autoawq requirement at packing stage by n1ck-guo in https://github.com/intel/auto-round/pull/249
* remove unused log by n1ck-guo in https://github.com/intel/auto-round/pull/252
* support INC API by WeiweiZhang1 in https://github.com/intel/auto-round/pull/255
* avoid potential bug for auto-gptq 0.8 by wenhuach21 in https://github.com/intel/auto-round/pull/250
* fix example by n1ck-guo in https://github.com/intel/auto-round/pull/256
* fix preci by n1ck-guo in https://github.com/intel/auto-round/pull/258
* enable_qwen2-vl_quantization by WeiweiZhang1 in https://github.com/intel/auto-round/pull/248
* update eval and fix example by n1ck-guo in https://github.com/intel/auto-round/pull/260
* refine autoawq exporting code by wenhuach21 in https://github.com/intel/auto-round/pull/261
* better support quant_lm_head for larger models by wenhuach21 in https://github.com/intel/auto-round/pull/263
* Fix 3bit packing for auto-gptq format by wenhuach21 in https://github.com/intel/auto-round/pull/264
* Add a warning for improper export formats. by wenhuach21 in https://github.com/intel/auto-round/pull/265
* Update readme for VLM support and integration by wenhuach21 in https://github.com/intel/auto-round/pull/266
* remove g_idx in gptq format by wenhuach21 in https://github.com/intel/auto-round/pull/267
* keep the dtype after qdq by wenhuach21 in https://github.com/intel/auto-round/pull/268
* enable llama3.2-vision model quantization by WeiweiZhang1 in https://github.com/intel/auto-round/pull/269
* fix mutable default value by wenhuach21 in https://github.com/intel/auto-round/pull/272
* change to even rounding for mantissa of mx_fp by wenhuach21 in https://github.com/intel/auto-round/pull/277
* adamround bugfix, refine import by WeiweiZhang1 in https://github.com/intel/auto-round/pull/275
* [Important Change]set full range sym as the default by wenhuach21 in https://github.com/intel/auto-round/pull/278
* refine eval by wenhuach21 in https://github.com/intel/auto-round/pull/282
* qwen2_bugfix, add adamround vision UT by WeiweiZhang1 in https://github.com/intel/auto-round/pull/281
New Contributors
* dependabot made their first contribution in https://github.com/intel/auto-round/pull/223
**Full Changelog**: https://github.com/intel/auto-round/compare/v0.3...v0.3.1