GPTQModel

Latest version: v1.4.1


1.0.5

What's Changed
Added partial quantization support for the Llama 3.2 Vision model. v1.0.5 allows quantization of the text layers (the layers responsible for text generation) only; vision-layer support will be added shortly. A Llama 3.2 11B Vision Instruct model quantizes to ~50% of its original size in 4-bit mode. Once vision-layer support is added, the size will drop to the expected ~1/4.
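As a rough back-of-envelope check of the size claims above (a sketch with illustrative assumptions: FP16 base weights at 2 bytes/param, 4-bit weights at 0.5 bytes/param, and text layers holding roughly two-thirds of the parameters — none of these splits are measured figures from the release):

```python
# Rough size estimate for partial (text-only) vs. full 4-bit quantization.
# Assumptions (illustrative, not measured): FP16 base weights (2 bytes/param),
# 4-bit quantized weights (0.5 bytes/param).
FP16_BYTES = 2.0
INT4_BYTES = 0.5

def quantized_fraction(text_params: float, vision_params: float,
                       quantize_vision: bool) -> float:
    """Return quantized model size as a fraction of the full FP16 size."""
    total_fp16 = (text_params + vision_params) * FP16_BYTES
    text_bytes = text_params * INT4_BYTES
    vision_bytes = vision_params * (INT4_BYTES if quantize_vision else FP16_BYTES)
    return (text_bytes + vision_bytes) / total_fp16

# Assumed split: text layers ~2/3 and vision layers ~1/3 of the 11B params.
text, vision = 11e9 * 2 / 3, 11e9 / 3
print(quantized_fraction(text, vision, quantize_vision=False))  # text-only -> ~0.5
print(quantized_fraction(text, vision, quantize_vision=True))   # full      -> ~0.25
```

Under these assumed proportions, text-only quantization yields the ~50% figure quoted above, and quantizing everything yields the expected ~1/4.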

* [MODEL] Add Llama 3.2 Vision (mllama) support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/401


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.0.4...v1.0.5

1.0.4

What's Changed

Liger Kernel support added for a ~50% VRAM reduction during the quantization stage for some models. Added a toggle to disable parallel packing to avoid OOM on larger models. Transformers dependency updated to 4.45.0 for Llama 3.2 support.

* [FEATURE] add a parallel_packing toggle by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/393
* [FEATURE] add liger_kernel support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/394


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.0.3...v1.0.4

1.0.3

What's Changed

* [MODEL] Add minicpm3 by LDLINGLINGLING in https://github.com/ModelCloud/GPTQModel/pull/385
* [FIX] fix minicpm3 support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/387
* [MODEL] Added GRIN-MoE support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/388

New Contributors
* LDLINGLINGLING made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/385
* mrT23 made their first contribution in https://github.com/ModelCloud/GPTQModel/pull/386

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.0.2...v1.0.3

1.0.2

What's Changed

Upgraded the AutoRound package to v0.3.0. Pre-built wheel (WHL) and PyPI source releases are now available. Install by downloading our pre-built wheel or with `pip install gptqmodel --no-build-isolation`.

* [CORE] Autoround v0.3 by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/368
* [CI] Lots of CI fixups by CSY-ModelCloud

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.0.0...v1.0.2

1.0.0

What's Changed

40% faster multi-threaded `packing`, a new `lm_eval` API, and fixed Python 3.9 compatibility.

* Add `lm_eval` api by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/338
* Multi-threaded `packing` in quantization by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/354
* [CI] Add TGI unit test by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/348
* [CI] Updates by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/347, https://github.com/ModelCloud/GPTQModel/pull/352, https://github.com/ModelCloud/GPTQModel/pull/353, https://github.com/ModelCloud/GPTQModel/pull/355, https://github.com/ModelCloud/GPTQModel/pull/357
* Fix python 3.9 compat by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/358
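The multi-threaded `packing` step above can be illustrated with a minimal sketch (this is not GPTQModel's actual implementation): packing eight 4-bit quantized values into each 32-bit word, with independent rows dispatched to a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def pack_row(vals):
    """Pack eight 4-bit values (0..15) into each 32-bit word, low nibble first."""
    assert len(vals) % 8 == 0
    words = []
    for i in range(0, len(vals), 8):
        word = 0
        for j, v in enumerate(vals[i:i + 8]):
            word |= (v & 0xF) << (4 * j)
        words.append(word)
    return words

def pack_matrix(rows, workers=4):
    # Rows are independent, so they can be packed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(pack_row, rows))

packed = pack_matrix([[1, 2, 3, 4, 5, 6, 7, 8]])  # -> [[0x87654321]]
```

Note that in pure Python the GIL limits the speedup; the sketch only illustrates the row-parallel structure, while real packing gains come from native tensor kernels that release the GIL.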


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.11...v1.0.0

0.9.11

What's Changed

Added LG EXAONE 3.0 model support. New dynamic per-layer/module flexible quantization, where each layer/module may use different bits/params. Added proper sharding support to `BACKEND.BITBLAS`. Auto-heal quantization errors caused by small damp values.
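A sketch of how a dynamic per-layer/module config might be resolved. The regex-keyed override dict and the `resolve` helper below are illustrative assumptions, not the exact GPTQModel schema:

```python
import re

# Hypothetical dynamic config: the first matching regex overrides the defaults.
DEFAULTS = {"bits": 4, "group_size": 128}
DYNAMIC = {
    r"model\.layers\.0\..*": {"bits": 8},                   # keep layer 0 at 8-bit
    r".*\.mlp\.down_proj": {"bits": 4, "group_size": 64},   # finer groups for down_proj
}

def resolve(module_name, defaults=DEFAULTS, dynamic=DYNAMIC):
    """Return quantization params for a module, applying the first matching override."""
    params = dict(defaults)
    for pattern, override in dynamic.items():
        if re.fullmatch(pattern, module_name):
            params.update(override)
            break
    return params

print(resolve("model.layers.0.self_attn.q_proj"))  # {'bits': 8, 'group_size': 128}
print(resolve("model.layers.5.mlp.down_proj"))     # {'bits': 4, 'group_size': 64}
print(resolve("lm_head"))                          # defaults: {'bits': 4, 'group_size': 128}
```

First-match-wins keeps the rules predictable when multiple patterns could apply to the same module.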

* [CORE] add support for pack and shard to bitblas by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/316
* Add `dynamic` bits by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/311, https://github.com/ModelCloud/GPTQModel/pull/319, https://github.com/ModelCloud/GPTQModel/pull/321, https://github.com/ModelCloud/GPTQModel/pull/323, https://github.com/ModelCloud/GPTQModel/pull/327
* [MISC] Adjust the validate order of QuantLinear when BACKEND is AUTO by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/318
* add save_quantized log model total size by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/320
* Auto damp recovery by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/326
* [FIX] add missing original_infeatures by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/337
* Update Transformers to 4.44.0 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/336
* [MODEL] add exaone model support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/340
* [CI] Upload wheel to local server by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/339
* [MISC] Fix assert by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/342


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.10...v0.9.11
