What's Changed
⚡ Offloaded tokenizer fixes to the [Toke(n)icer](https://github.com/modelcloud/tokenicer) package.
⚡ Optimized `lm_head` quant time and VRAM usage.
⚡ Optimized `DeepSeek v3/R1` model quant VRAM usage.
⚡ 3x speed-up for the Torch kernel when using PyTorch >= 2.5.0 with `model.compile()`.
⚡ New `calibration_dataset_concat_size` option enables a calibration data concat mode that mimics the original GPTQ data packing strategy, which may improve quant speed and accuracy for datasets like wikitext2.
🐛 Fixed Optimum compat and `XPU`/`IPEX` auto kernel selection regression in v1.8.1.
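
To illustrate the idea behind the concat mode, here is a minimal sketch of data packing in plain Python: tokenized samples are joined into one token stream and cut into fixed-size rows. This is not GPTQModel's implementation. The function name `concat_and_chunk`, the optional separator, and the choice to drop the trailing partial row are assumptions made for this example.

```python
from typing import Iterable, List, Sequence


def concat_and_chunk(
    tokenized_samples: Iterable[Sequence[int]],
    concat_size: int,
    separator_ids: Sequence[int] = (),
) -> List[List[int]]:
    """Hypothetical helper: pack token id sequences into rows of concat_size tokens."""
    if concat_size <= 0:
        raise ValueError("concat_size must be a positive integer")

    # Join all samples into a single token stream, optionally separated.
    stream: List[int] = []
    for ids in tokenized_samples:
        if stream and separator_ids:
            stream.extend(separator_ids)
        stream.extend(ids)

    # Assumption: drop the trailing partial row so every row has equal length.
    usable = len(stream) - len(stream) % concat_size
    return [stream[i:i + concat_size] for i in range(0, usable, concat_size)]


# Four short samples (10 tokens total) packed into rows of 4 tokens each.
samples = [[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]]
rows = concat_and_chunk(samples, concat_size=4)
rows_with_sep = concat_and_chunk(samples, concat_size=4, separator_ids=[0])
```

Packing short samples this way yields uniform-length calibration rows with no padding tokens, which is one plausible reason the mode may help on datasets such as wikitext2 that contain many short lines.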
* Fix init arg order and `optimum` compat by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1240
* [FIX][Optimize] lm_head quantize by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1239
* [Model] [DeepSpeek] un-merge `gate_proj` and `up_proj` by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1241
* Use Toke(n)icer by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1242
https://github.com/ModelCloud/GPTQModel/pull/1244
* Add Tokenicer Test by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1245
* prepare for 1.8.2 release by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1243
* simplify calls to tokenicer by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1246
* Update requirements.txt by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1248
* fix trust_remote was lost by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1249
* fix trust_remote was lost by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1250
* prepare for 1.8.5 release by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1251
* fix unit tests & tweak logic for selecting backends by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1253
* install tokenicer form git & do ruff by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1254
* fix k,v is not a dict by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1255
* fix not enough values to unpack (expected 2, got 1) by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1256
* fix sglang test requires numpy<2.0 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1258
* fix ipex backend by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/1259
* ipex should be packable, reverted pr 1259 importer.py changes by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1260
* remove sentencepiece by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1261
* speed up torch dequantize by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1262
* Add `calibration_dataset_concat_size` option/mode by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1257
* add transformers test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1264
* Add kernel torch.compile hook by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1265
* [FIX]fix vl model prepare_dataset by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1266
**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.8.1...v1.9.0