GPTQModel

Latest version: v2.2.0

2.2.0

What's Changed

✨ New Qwen 2.5 VL model support. Preliminary Qwen 3 model support.
✨ New samples log column during quantization to track module activation in MoE models.
✨ Loss log column now color-coded to highlight modules that are friendly or resistant to quantization.
✨ Per-step progress stats during quantization are now streamed to the log file.
✨ Auto `bfloat16` dtype loading for models based on the model config.
✨ Fixed kernel compile for PyTorch/ROCm.
✨ Slightly faster quantization and auto-resolution of some low-level OOM issues on smaller-VRAM GPUs.
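
The auto dtype selection above keys off the model's own config. A minimal sketch of the idea (hypothetical helper, not the library's actual internals): honor an explicit `torch_dtype` in a Hugging Face-style config, otherwise default to `bfloat16`.

```python
def resolve_load_dtype(model_config: dict) -> str:
    """Pick a load dtype from a Hugging Face-style model config dict.

    Illustrative sketch of auto bfloat16 loading: use the config's
    torch_dtype when present, fall back to bfloat16 otherwise.
    """
    dtype = model_config.get("torch_dtype")
    if dtype in ("bfloat16", "float16", "float32"):
        return dtype
    return "bfloat16"  # assumed default when the config is silent
```

For example, `resolve_load_dtype({"torch_dtype": "float16"})` keeps the config's choice, while an empty config resolves to `"bfloat16"`.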

* Enable ipex tests for CPU/XPU by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/1460
* test kernel accuracies with more shapes on cuda by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1461
* Fix rocm flags by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1467
* use table like logging format by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1471
* stream process log entries to persistent file by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1472
* fix some models need trust-remote-code arg by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1474
* Fix wq dtype by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1475
* add colors to quant loss column by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1477
* add prelim qwen3 support by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1478
* Update eora.py for further optimization by nbasyl in https://github.com/ModelCloud/GPTQModel/pull/1488
* faster cholesky inverse and avoid oom when possible by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1494
* [MODEL] supports qwen2_5_vl by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1493


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v2.1.0...v2.2.0

2.1.0

What's Changed

✨ New QQQ quantization method and inference support!
✨ New Google `Gemma 3` day-zero model support.
✨ New Alibaba `Ovis 2` VL model support.
✨ New AMD `Instella` day-zero model support.
✨ New `GSM8K Platinum` and `MMLU-Pro` benchmarking support.
✨ PEFT LoRA training with GPTQModel is now 30%+ faster on all GPU and IPEX devices.
✨ Auto detect MoE modules not activated during quantization due to insufficient calibration data.
✨ `ROCm` setup.py compat fixes.
✨ Optimum and PEFT compat fixes.
✨ Fixed PEFT bfloat16 training.
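
The MoE detection above amounts to counting forward passes per module during calibration: an expert that is never routed to has nothing to quantize against. A pure-Python sketch of that idea (class and method names are illustrative, not GPTQModel's internals):

```python
from collections import Counter

class FwdCounter:
    """Count how often each named module is hit during calibration forwards."""

    def __init__(self):
        self.counts = Counter()

    def hook(self, name):
        # Returns a callable suitable for registering as a forward hook.
        def _hook(*args, **kwargs):
            self.counts[name] += 1
        return _hook

    def inactive(self, module_names):
        # Experts never activated by the calibration data should be flagged
        # (or skipped) rather than quantized with zero samples.
        return [n for n in module_names if self.counts[n] == 0]

counter = FwdCounter()
experts = ["expert.0", "expert.1", "expert.2"]
# Simulate calibration forwards: the router only hits experts 0 and 2.
for name in ["expert.0", "expert.2", "expert.0"]:
    counter.hook(name)()
```

Here `counter.inactive(experts)` would flag `expert.1`, signaling that the calibration set is too small or too narrow to exercise every expert.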

* auto enable flash_attn only when flash-attn was installed by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1372
* Fix rocm compat by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1373
* fix unnecessary mkdir by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1374
* add test_kernel_output_xpu.py by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1382
* clean test_kernel_output_xpu.py by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1383
* remove xpu support of triton kernel by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1384
* [MODEL] Add instella support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1385
* Fix optimum/peft trainer integration by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1381
* rename peft test file by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1387
* [CI] fix wandb was not installed & update test_olora_finetuning_xpu.py by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1388
* Add lm-eval `GSM8k Platinum` by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1394
* Remove cuda kernel by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1396
* fix exllama kernels not compiled by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1397
* update tests by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1398
* make the kernel output validation more robust by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1399
* speed up ci by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1400
* add fwd counter by yuchiwang in https://github.com/ModelCloud/GPTQModel/pull/1389
* allow triton and ipex to inherit torch kernel and use torch for train… by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1401
* fix skip moe modules when fwd count is 0 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1404
* fix ipex linear post init for finetune by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/1406
* fix optimum compat by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1408
* [Feature] Add mmlupro API by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1405
* add training callback by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1409
* Fix bf16 training by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1410
* fix bf16 forward for triton by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1411
* Add QQQ by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1402
* make IPEX or any kernel that uses Torch for Training to auto switch v… by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1412
* [CI] xpu inference test by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1380
* [FIX] qqq with eora by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1415
* [FIX] device error by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1417
* make quant linear expose internal buffers by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1418
* Fix bfloat16 kernels by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1420
* fix qqq bfloat16 forward by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1423
* Fix ci10 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1424
* fix marlin bf16 compat by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1427
* [CI] no need reinstall requirements by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1426
* [FIX] dynamic save error by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1428
* [FIX] super().post_init() calling order by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1431
* fix bitblas choose IPEX in cuda env by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1432
* Fix exllama is not packable by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1433
* disable exllama for training by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1435
* remove TritonV2QuantLinear for xpu test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1436
* [MODEL] add gemma3 support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1434
* fix the error when downloading models using modelscope by mushenL in https://github.com/ModelCloud/GPTQModel/pull/1437
* Add QQQ Rotation by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1425
* fix no __init__.py by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1438
* Fix hardmard import by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1441
* Eora final by nbasyl in https://github.com/ModelCloud/GPTQModel/pull/1440
* triton is not validated for ipex by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1445
* Fix exllama adapter by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1446
* fix rocm compile by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1447
* [FIX] Correctly obtain the submodule's device by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1448
* fix rocm not compatible with exllama v2 and eora kernel by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1449
* revert overflow code by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1450
* add kernel dtype support and add full float15 vs bfloat16 kernel testing by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1452
* [MODEL] add Ovis2 support and bug fix by Fusionplay in https://github.com/ModelCloud/GPTQModel/pull/1454
* add unit test for ovis2 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1456

New Contributors
* yuchiwang made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/1389
* mushenL made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/1437
* nbasyl made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/1440
* Fusionplay made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/1454

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v2.0.0...v2.1.0

2.0.0

What's Changed

🎉 GPTQ quantization internals are now broken into multiple stages (processes) for feature expansion.
🎉 Synced Marlin kernel inference quality fix from upstream. Added `MARLIN_FP16`, a lower-quality but faster backend.
🎉 ModelScope support added.
🎉 Logging and CLI progress bar output has been revamped with a sticky bottom progress bar.
🎉 Added CI tests to track regression in kernel inference quality and sweep all bits/group_sizes.
🎉 Delegated logging/progress bar to the [LogBar](https://github.com/modelcloud/logbar) package.
🐛 Fixed ROCm version auto detection in setup install.
🐛 Fixed generation_config.json save and load.
🐛 Fixed Transformers v4.49.0 compat. Fixed compat for models without a BOS token.
🐛 Fixed group_size=-1 and bits=3 packing regression.
🐛 Fixed Qwen 2.5 MoE regressions.
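
The `MARLIN_FP16` speed/quality tradeoff comes from accumulating partial sums in half precision, where each add is rounded to binary16. A stdlib-only sketch of the resulting drift, using `struct`'s `'e'` format to emulate an fp16 accumulator (this illustrates the numeric effect only, not the Marlin kernel itself):

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float through IEEE-754 binary16."""
    return struct.unpack("e", struct.pack("e", x))[0]

def accumulate(values, fp16_accumulator=False):
    total = 0.0
    for v in values:
        total += v
        if fp16_accumulator:
            # Emulate a half-precision accumulator: round after every add.
            total = to_fp16(total)
    return total

vals = [0.1] * 10_000
exact = accumulate(vals)                         # ~1000.0 in float64
lossy = accumulate(vals, fp16_accumulator=True)  # stalls once 0.1 < half an ulp
```

Once the running sum grows large enough that fp16's spacing exceeds twice the addend, every further add rounds back to the same value, so `lossy` ends up far below the true sum. This is why FP32 accumulation is the accuracy default and `MARLIN_FP16` is opt-in.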

* fix 3 bit packing regression, fixed 1278 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1280
* Fix supported models list (syntax error) by Forenche in https://github.com/ModelCloud/GPTQModel/pull/1281
* feat: load model from modelscope by suluyana in https://github.com/ModelCloud/GPTQModel/pull/1283
* merge eval & utils.lm_eval by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1282
* fix modelscope import & tests by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1285
* allow passing model instance to evalplus & update tokenizer loading logics by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1284
* fix lm-eval & vllm check tokenizer type by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1287
* Fix `generation_config.json` not auto-saved by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1292
* [SAVE] Save config files with empty state dict by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1293
* [SAVE] Save processor related config files by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1295
* fix wrong order of config save causing sharded tensors to be removed by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1297
* [FIX] not pack when group_size=-1 by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1298
* cleanup marlin paths: marlin does conversion on `post_init` by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1310
* bump tokenicer to v0.0.3 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1308
* clean is_marlin_format for tests by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1311
* [CI] fix sglang test name & add status logs & remove exllama packing test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1312
* skip v1 to v2 conversion for sym=True only kernels by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1314
* bump tokenicer to 0.0.4 & remove FORMAT_FIELD_COMPAT_MARLIN by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1315
* revert is_marlin_format check by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1316
* Improve Marlin accuracy (default) but add `MARLIN_FP16` backend for faster with less-accuracy by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1317
* marlin fp32 mode should also be enabled if kernel was selected due to… by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1318
* refractor logger by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1319
* fix typo by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1320
* refractor logger and have progress bar sticky to bottom of cli by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1322
* [CI] fix tokenicer upgraded transformers & install bitblas for test_save_quanted_model by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1321
* [CI] allow to select compiler server & move model test to correct dir by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1323
* fix bitblas loading regression by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1324
* marlin fp16 warning missed check by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1325
* fix custom logger overriding system level logger by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1327
* fix progress bar for packing by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1326
* More log fixes by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1328
* fix no backend when creating a quant linear by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1329
* use relative path instead of importing gptqmodel by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1331
* no need patch vllm now by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1332
* [CI] fix CI url by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1333
* fix oom by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1335
* add default value for backend, fix optimum doesn't pass it by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1334
* refractor pb and pb usage by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1341
* fix generator has no length info by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1342
* replace utils.Progressbar with logbar by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1343
* [CI] update UI by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1344
* fix logbar api usage by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1345
* fix v2 to v1 missed logic bypass by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1347
* [CI] fix xpu env has no logbar by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1346
* [CI] update runner ip env & fix show-statistics didn't run by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1348
* fix time was not imported by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1349
* update device-smi depend to v0.4.0 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1351
* [CI] install requirements.txt for m4 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1352
* Exllama V1 is Packable by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1356
* [FIX] test_packable.py by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1357
* [setup] use torch.version.hip for rocm version check by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1360
* save/load peft lora by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1358
* update device-smi to 0.4.1 for rocm fix by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1362
* strip model path by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1363
* [CI] exllama v1 kernel now eligible for quant stage by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1364
* Fix transformers modeling code passing `input.shape[0] == 0` to nn.module by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1365
* simplify log var by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1368
* fix import by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1369
* update by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1370

New Contributors
* Forenche made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/1281
* suluyana made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/1283

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.9.0...v2.0.0

1.9.0

What's Changed

⚡ Offloaded tokenizer fixes to the [Toke(n)icer](https://github.com/modelcloud/tokenicer) package.
⚡ Optimized `lm_head` quant time and VRAM usage.
⚡ Optimized `DeepSeek v3/R1` model quant VRAM usage.
⚡ 3x speed-up for the Torch kernel when using PyTorch >= 2.5.0 with `model.compile()`.
⚡ New `calibration_dataset_concat_size` option to enable calibration data concat mode, mimicking the original GPTQ data packing strategy, which may improve quant speed and accuracy for datasets like wikitext2.
🐛 Fixed Optimum compat and `XPU`/`IPEX` auto kernel selection regression in v1.8.1.
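
The concat mode mirrors the original GPTQ preprocessing: all calibration samples are joined into one token stream, then re-split into equal fixed-size rows, with no per-sample padding. A minimal sketch of that packing strategy (illustrative only; `calibration_dataset_concat_size` performs this inside GPTQModel):

```python
def concat_calibration(samples, concat_size):
    """Join tokenized samples into one stream, then cut equal-size blocks.

    Mimics the original GPTQ data packing strategy: every calibration row
    is exactly `concat_size` tokens; a trailing partial block is dropped.
    """
    stream = [tok for sample in samples for tok in sample]
    return [stream[i:i + concat_size]
            for i in range(0, len(stream) - concat_size + 1, concat_size)]
```

For example, `concat_calibration([[1, 2, 3], [4, 5], [6, 7, 8, 9]], 4)` yields two rows, `[1, 2, 3, 4]` and `[5, 6, 7, 8]`, with the leftover token discarded.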


* Fix init arg order and `optimum` compat by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1240
* [FIX][Optimize] lm_head quantize by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1239
* [Model] [DeepSpeek] un-merge `gate_proj` and `up_proj` by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1241
* Use Toke(n)icer by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1242
* https://github.com/ModelCloud/GPTQModel/pull/1244
* Add Tokenicer Test by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1245
* prepare for 1.8.2 release by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1243
* simplify calls to tokenicer by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1246
* Update requirements.txt by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1248
* fix trust_remote was lost by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1249
* fix trust_remote was lost by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1250
* prepare for 1.8.5 release by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1251
* fix unit tests & tweak logic for selecting backends by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1253
* install tokenicer form git & do ruff by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1254
* fix k,v is not a dict by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1255
* fix not enough values to unpack (expected 2, got 1) by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1256
* fix sglang test requires numpy<2.0 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1258
* fix ipex backend by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/1259
* ipex should be packable, reverted pr 1259 importer.py changes by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1260
* remove sentencepiece by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1261
* speed up torch dequantize by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1262
* Add `calibration_dataset_concat_size` option/mode by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1257
* add transformers test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1264
* Add kernel torch.compile hook by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1265
* [FIX]fix vl model prepare_dataset by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1266


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.8.1...v1.9.0

1.8.1

What's Changed

⚡ `DeepSeek v3/R1` model support.
⚡ New flexible weight `packing`: quantized weights can now be packed to `[int32, int16, int8]` dtypes. The Triton and Torch kernels support the full range of the new `QuantizeConfig.pack_dtype`.
⚡ Over 50% speedup for `vl` model quantization (Qwen 2.5-VL + Ovis).
⚡ New `auto_gc: bool` control in `quantize()` which can reduce quantization time for small models with no chance of OOM.
⚡ New `GPTQModel.push_to_hub()` API for easy quant model upload to an HF repo.
⚡ New `buffered_fwd: bool` control in `model.quantize()`.
🐛 Fixed `bits=3` packing and `group_size=-1` regression in v1.7.4.
🐛 Fixed Google Colab install requiring two install passes.
🐛 Fixed Python 3.10 compatibility.
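
The flexible `pack_dtype` comes down to how many low-bit weights fit in one storage word: 8 per int32 at 4 bits, 4 per int16, 2 per int8. A stdlib-only sketch of the underlying bit-packing, with hypothetical helper names (GPTQModel's real packer operates on tensors, not Python lists):

```python
def pack(qweights, bits=4, word_bits=32):
    """Pack quantized integer weights into wider storage words."""
    per_word = word_bits // bits          # e.g. 8 nibbles per int32 word
    assert len(qweights) % per_word == 0, "pad to a whole word first"
    words = []
    for i in range(0, len(qweights), per_word):
        word = 0
        for j, q in enumerate(qweights[i:i + per_word]):
            # Place each low-bit value in its own bit field of the word.
            word |= (q & ((1 << bits) - 1)) << (j * bits)
        words.append(word)
    return words

def unpack(words, bits=4, word_bits=32):
    """Inverse of pack(): recover the original low-bit weights."""
    per_word = word_bits // bits
    mask = (1 << bits) - 1
    return [(w >> (j * bits)) & mask for w in words for j in range(per_word)]
```

Round-tripping `pack` then `unpack` recovers the original weights regardless of the word width, which is what lets kernels choose the storage dtype independently of the quantization bit width.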

* Flexible Pack DType by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1158
* cuda needs to declare pack dtypes by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1169
* fix pass pack dtype by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1172
* Pass dtype by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1173
* move in/out features and grop_size init to base by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1174
* move self.maxq to base class by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1175
* consolidate pack() into packer cls by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1176
* Add `pack_dtype` to dynamic config and fix validate by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1178
* Refract 4 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1180
* Refractor and simplify multi-kernel selection/init by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1183
* Update/Refractor Bitblas/Marlin/Cuda by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1184
* push bitblas logic down by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1185
* Revert Bitblas to 0.0.1-dev13 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1186
* Do not export config.key if value is None by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1187
* Fix examples/perplexity by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1191
* [MODEL] add deepseek v3 support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1127
* Push register buffer down to base class and rename all in/out features by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1193
* Fix 1196 hf_transfer not accepting `max_memory` arg by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1197
* reduce peak memory and reduce quant time by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1198
* skip zero math by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1199
* fix test_packing_speed by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1202
* Update test_quant_time.py by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1203
* experimental `buffered_fwd` quantize control by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1205
* Fix dynamic regression on quant save by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1208
* Python 3.10 type-hint compt bug by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1213
* Fix colab install by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1215
* add `GPTQModel.push_to_hub()` support by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1216
* default to 8GB shard-size for model save by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1217
* Auto gc toggle by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1219
* fix 3bit packing and inference by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1218
* fix merge error by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1234
* fix var name by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1235
* fix visual llm slow forward by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1232

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.7.4...v1.8.1

1.8.0

What's Changed

⚡ `DeepSeek v3/R1` model support.
⚡ New flexible weight `packing`: quantized weights can now be packed to `[int32, int16, int8]` dtypes. The Triton and Torch kernels support the full range of the new `QuantizeConfig.pack_dtype`.
⚡ New `auto_gc: bool` control in `quantize()` which can reduce quantization time for small models with no chance of OOM.
⚡ New `GPTQModel.push_to_hub()` API for easy quant model upload to an HF repo.
⚡ New `buffered_fwd: bool` control in `model.quantize()`.
🐛 Fixed `bits=3` packing regression in v1.7.4.
🐛 Fixed Google Colab install requiring two install passes.
🐛 Fixed Python 3.10 compatibility.

* start 1.8.0-dev cycle by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1168
* Flexible Pack DType by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1158
* cuda needs to declare pack dtypes by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1169
* fix pass pack dtype by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1172
* Pass dtype by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1173
* move in/out features and grop_size init to base by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1174
* move self.maxq to base class by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1175
* consolidate pack() into packer cls by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1176
* Add `pack_dtype` to dynamic config and fix validate by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1178
* format by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1179
* Refract 4 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1180
* Refractor and simplify multi-kernel selection/init by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1183
* Update/Refractor Bitblas/Marlin/Cuda by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1184
* push bitblas logic down by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1185
* Revert Bitblas to 0.0.1-dev13 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1186
* Do not export config.key if value is None by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1187
* Fix examples/perplexity by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1191
* [MODEL] add deepseek v3 support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1127
* Push register buffer down to base class and rename all in/out features by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1193
* Fix 1196 hf_transfer not accepting `max_memory` arg by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1197
* reduce peak memory and reduce quant time by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1198
* skip zero math by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1199
* fix test_packing_speed by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1202
* Update test_quant_time.py by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1203
* experimental `buffered_fwd` quantize control by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1205
* Fix dynamic regression on quant save by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1208
* Python 3.10 type-hint compt bug by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1213
* Fix colab install by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1215
* add `GPTQModel.push_to_hub()` support by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1216
* default to 8GB shard-size for model save by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1217
* Auto gc toggle by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1219
* fix 3bit packing and inference by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1218


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.7.4...v1.8.0
