What's Changed
Added Llama-3.1 support, added Gemma2 27B quantized inference support via vLLM, added automatic `pad_token` normalization, and fixed AutoRound quant compatibility with vLLM/SGLang.
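For Gemma2 27B (and other GPTQ checkpoints) served through the vLLM path, loading looks roughly like the sketch below. This is a minimal sketch assuming this release's `GPTQModel.from_quantized` entry point and `BACKEND` enum; the model id is hypothetical, so substitute your own quantized checkpoint.

```python
from gptqmodel import GPTQModel, BACKEND

# Hypothetical model id for illustration; any GPTQ-quantized
# Gemma2 27B or Llama-3.1 checkpoint should load the same way.
model_id = "ModelCloud/gemma-2-27b-gptq-4bit"

# Route inference through the vLLM backend. pad_token
# normalization is now applied automatically on load.
model = GPTQModel.from_quantized(model_id, backend=BACKEND.VLLM)
```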
* [CI] by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/238, https://github.com/ModelCloud/GPTQModel/pull/236, https://github.com/ModelCloud/GPTQModel/pull/237, https://github.com/ModelCloud/GPTQModel/pull/241, https://github.com/ModelCloud/GPTQModel/pull/242, https://github.com/ModelCloud/GPTQModel/pull/243, https://github.com/ModelCloud/GPTQModel/pull/246, https://github.com/ModelCloud/GPTQModel/pull/247, https://github.com/ModelCloud/GPTQModel/pull/250
* [FIX] explicitly call torch.no_grad() by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/239
* Bitblas update by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/249
* [FIX] calibration average calculation when the calibration dataset arg is passed as tensors by Qubitium, LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/254, https://github.com/ModelCloud/GPTQModel/pull/258
* [MODEL] Gemma2 27B can now load with vLLM by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/257
* [OPTIMIZE] to optimize vLLM inference, set an environment variable 'VLLM_ATTENTI… (see sketch below) by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/260
* [FIX] hard-set batch_size to 1 for transformers 4.43.0 due to a compat regression by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/279
* [FIX] vLLM Llama 3.1 support by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/280
* Use better default values for quantization config (see sketch below) by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/281
* [REFACTOR] Clean up backend and model_type usage by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/276
* [FIX] allow auto_round lm_head quantization by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/282
* [FIX] [MODEL] Llama-3.1-8B-Instruct's eos_token_id is a list by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/284
* [FIX] add release_vllm_model and import destroy_model_parallel inside it by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/288
* [FIX] AutoRound quant compatibility with vLLM/SGLang by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/287
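The truncated [OPTIMIZE] title above appears to refer to vLLM's documented `VLLM_ATTENTION_BACKEND` environment variable; that is an assumption on our part, since the PR title is cut off. If so, it must be set before vLLM initializes:

```python
import os

# Assumption: the truncated PR title refers to vLLM's documented
# VLLM_ATTENTION_BACKEND variable. Set it before any vLLM model is
# created; accepted values include "FLASH_ATTN", "FLASHINFER", and
# "XFORMERS", depending on your vLLM build.
os.environ["VLLM_ATTENTION_BACKEND"] = "FLASHINFER"

from gptqmodel import GPTQModel, BACKEND

model = GPTQModel.from_quantized("your-quantized-model", backend=BACKEND.VLLM)
```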
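For the improved quantization-config defaults (#281), a minimal quantization sketch, assuming the `QuantizeConfig` class and `from_pretrained`/`quantize` flow from the repo README. `bits` and `group_size` are shown explicitly for illustration, but with this release the defaults should be reasonable if you omit them:

```python
from gptqmodel import GPTQModel, QuantizeConfig

# Explicit values shown for illustration; this release ships better
# defaults, so QuantizeConfig() alone is a reasonable starting point.
quant_config = QuantizeConfig(bits=4, group_size=128)

# A tiny calibration set for illustration only; real calibration
# needs representative data (tensor inputs also work, per #254/#258).
calibration_dataset = [
    "gptqmodel is an easy-to-use llm quantization toolkit.",
    "The quick brown fox jumps over the lazy dog.",
]

model = GPTQModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct", quant_config)
model.quantize(calibration_dataset)
model.save_quantized("Meta-Llama-3.1-8B-Instruct-gptq-4bit")
```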
**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.8...v0.9.9