What's Changed
⚡ `EvalPlus` harness integration merged upstream. We now support both the `lm-eval` and `EvalPlus` benchmark harnesses (usage sketch below the PR list).
⚡ Added a pure-PyTorch `Torch` kernel.
⚡ Refactored the `Cuda` kernel into the `DynamicCuda` kernel.
⚡ `Triton` kernel now auto-pads features/group_size for maximum model support.
⚡ `Dynamic` quantization now supports both positive (`+:`, the default) and negative (`-:`) matching; negative matches skip the matched modules entirely during quantization (see the example after this list).
⚡ Added auto-kernel fallback for unsupported kernel/module pairs (see the loading sketch below).
🐛 Fixed auto-`Marlin` kernel selection.
🗑 Deprecated saving weights in the `Marlin` format: the `Marlin` kernel auto-converts the `gptq` format to `Marlin` at runtime, and the `gptq` format retains maximum kernel flexibility, including `Marlin` kernel support.
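A minimal sketch of the new `Dynamic` matching, assuming the `QuantizeConfig.dynamic` dict API; the module regexes and overrides below are illustrative only:

```python
# Sketch of `dynamic` positive/negative matching (regexes are illustrative).
from gptqmodel import GPTQModel, QuantizeConfig

quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    dynamic={
        # `+:` positive match (the default when no prefix is given):
        # matched modules quantize with these overrides instead of the base config
        r"+:.*\.mlp\..*": {"bits": 8, "group_size": 64},
        # `-:` negative match: matched modules are skipped entirely
        r"-:.*\.lm_head.*": {},
    },
)
# pass quant_config to the usual quantize entry point
```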
Lots of internal refactoring and cleanup in preparation for the transformers/optimum/peft upstream PR merge.
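Kernel selection in one line, for reference (model id is a placeholder):

```python
# Sketch of backend/kernel selection. With BACKEND.AUTO (the default),
# GPTQModel picks the best kernel and, new in this release, falls back
# automatically when a kernel/module pair is unsupported.
from gptqmodel import BACKEND, GPTQModel

model = GPTQModel.load(                 # `from_quantized` on older releases
    "ModelCloud/some-gptq-4bit-model",  # placeholder model id
    backend=BACKEND.AUTO,               # or BACKEND.TRITON, BACKEND.MARLIN, ...
)
```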
* Remove Marlin old kernel and Marlin format saving. Marlin[new] is still supported via inference. by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/714
* Remove marlin(old) kernel codes & do ruff by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/719
* [FIX] gptq v2 load by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/724
* Add hf_convert_gptq_v1_to_v2_format, hf_convert_gptq_v2_to_v1_format,… by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/727
* if use the ipex quant linear, no need to convert by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/730
* hf_select_quant_linear add device_map by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/732
* Add TorchQuantLinear by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/735
* Add QUANT_TYPE in qlinear by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/736
* Replace error with warning for Intel CPU check by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/737
* Add BACKEND.AUTO_CPU by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/739
* Fix ipex linear check by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/741
* Fix select quant linear by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/742
* Now meta.quantizer value can be an array by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/744
* Receive checkpoint_format argument by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/747
* Modify hf convert gptq v2 to v1 format by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/749
* update score max negative delta by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/748
* [CI] max parallel jobs 10 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/751
* hymba got high score by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/752
* hf_select_quant_linear() always set pack=True by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/754
* Refactor CudaQuantLinear to DynamicCudaQuantLinear by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/759
* Remove filename prefix on qlinear dir by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/760
* Replace Nvidia-smi with devicesmi by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/761
* Fix XPU training by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/763
* Fix auto marlin kernel selection by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/765
* Add BaseQuantLinear SUPPORTS_TRAINING declaration by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/766
* Add Eval() api to support LM-Eval or EvalPlus benchmark harnesses by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/750
* Fix validate_device by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/769
* Force BaseQuantLinear properties to be explicitly declared by all QuantLinears by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/767
* Convert str backend to enum backend by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/772
* Remove nested list in dict by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/774
* Fix training qlinear by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/777
* Check kernel by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/764
* BACKEND.AUTO if backend is None by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/781
* Fix lm_head quantize test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/784
* Fix exllama doesn't support 8 bit by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/790
* Use set() to avoid calling torch twice by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/791
* Fix ipex cpu backend import error and fix too much logs by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/793
* Eval API opt by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/794
* Fixed ipex linear param check and logging once by jiqing-feng in https://github.com/ModelCloud/GPTQModel/pull/795
* Check device before sync by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/796
* Only AUTO will try other quant linears by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/797
* Add SUPPORTS_AUTO_PADDING property to QuantLinear by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/799
* Dynamic now support skipping modules/layers by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/804
* Fix module was skipped but still be looped by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/806
* Make Triton kernel auto-pad on features/group_size by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/808
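A hypothetical sketch of the `Eval()` API from PRs #750/#794 above; names and arguments are inferred from the PR titles, so the exact v1.4.0 signature may differ:

```python
# Hypothetical usage of the new Eval() API; argument names are assumptions.
from gptqmodel import GPTQModel

results = GPTQModel.eval(
    "ModelCloud/some-gptq-4bit-model",  # placeholder quantized checkpoint
    framework="lm-eval",                # or "evalplus"
    tasks=["arc_challenge"],            # harness task name(s)
)
print(results)
```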
**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.3.1...v1.4.0