GPTQModel


0.9.9

What's Changed

Added Llama-3.1 support, Gemma 2 27B quantized-inference support via vLLM, and automatic `pad_token` normalization; fixed auto-round quantization compatibility with vLLM/SGLang.
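
For orientation, a minimal quantization sketch for this release line follows. It is not an official example: the model id, calibration text, and the exact `from_pretrained`/`quantize`/`save_quantized` signatures are assumptions based on the 0.9.x API shape.

```python
# Minimal sketch, assuming the GPTQModel 0.9.x API shape; not an official example.
from transformers import AutoTokenizer
from gptqmodel import GPTQModel, QuantizeConfig

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumption: any Llama-3.1 checkpoint

# Per these notes, pad_token is now normalized automatically, so the old manual
# `tokenizer.pad_token = tokenizer.eos_token` workaround should no longer be needed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
calibration = [tokenizer("GPTQModel is an LLM quantization toolkit.")]

model = GPTQModel.from_pretrained(model_id, QuantizeConfig(bits=4, group_size=128))
model.quantize(calibration)
model.save_quantized("Llama-3.1-8B-Instruct-gptq-4bit")
```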


* [CI] by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/238, https://github.com/ModelCloud/GPTQModel/pull/236, https://github.com/ModelCloud/GPTQModel/pull/237, https://github.com/ModelCloud/GPTQModel/pull/241, https://github.com/ModelCloud/GPTQModel/pull/242, https://github.com/ModelCloud/GPTQModel/pull/243, https://github.com/ModelCloud/GPTQModel/pull/246, https://github.com/ModelCloud/GPTQModel/pull/247, https://github.com/ModelCloud/GPTQModel/pull/250
* [FIX] explicitly call torch.no_grad() by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/239
* Bitblas update by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/249
* [FIX] calib avg for calib dataset arg passed as tensors by Qubitium, LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/254, https://github.com/ModelCloud/GPTQModel/pull/258
* [MODEL] gemma2 27b can load with vLLM now by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/257
* [OPTIMIZE] To optimize vLLM inference, set the environment variable 'VLLM_ATTENTI… by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/260
* [FIX] hard-set batch_size to 1 for transformers 4.43.0 due to a compat regression by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/279
* [FIX] vLLM Llama 3.1 support by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/280
* Use better default values for quantization config by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/281
* [REFACTOR] Cleanup backend and model_type usage by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/276
* [FIX] allow auto_round lm_head quantization by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/282
* [FIX] [MODEL] Llama-3.1-8B-Instruct's eos_token_id is a list by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/284
* [FIX] add release_vllm_model, and import destroy_model_parallel in release_vllm_model by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/288
* [FIX] autoround quants compat with vllm/sglang by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/287

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.8...v0.9.9

0.9.8

What's Changed

1. Marlin end-to-end in/out feature padding for max model support
2. Run quantized models (`FORMAT.GPTQ`) directly using the fast vLLM backend!
3. Run quantized models (`FORMAT.GPTQ`) directly using the fast SGLang backend! (See the loading sketch after this list.)
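
A loading sketch under stated assumptions: the `Backend.VLLM` / `Backend.SGLANG` enum values follow from the backends added in PRs #190/#191, the top-level `Backend` export is inferred from the enum referenced elsewhere in these notes, and the model id is a placeholder.

```python
# Minimal sketch, assuming the vLLM/SGLang Backend options added in this release.
from gptqmodel import GPTQModel, Backend

model = GPTQModel.from_quantized(
    "your-org/your-gptq-4bit-model",  # placeholder: any FORMAT.GPTQ checkpoint
    device="cuda:0",
    backend=Backend.VLLM,  # or Backend.SGLANG for the SGLang backend
)

# With the vLLM backend, generate() proxies to vLLM; the return type follows vLLM.
print(model.generate(prompts="The capital of France is"))
```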

* πŸš€ πŸš€ [CORE] Marlin end-to-end in/out feature padding by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/183 https://github.com/ModelCloud/GPTQModel/pull/192
* πŸš€ πŸš€ [CORE] Add vLLM Backend for FORMAT.GPTQ by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/190
* πŸš€ πŸš€ [CORE] Add SGLang Backend by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/191
* πŸš€ [CORE] Use Triton v2 to pack gptq/gptqv2 formats by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/202
* ✨ [CLEANUP] remove triton warmup by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/200
* πŸ‘Ύ [FIX] 8bit choosing wrong packer by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/199
* ✨ [CI] [CLEANUP] Improve Unit Tests by CSY, PSY, and ZYC
* ✨ [DOC] Consolidate Examples by ZYC in https://github.com/ModelCloud/GPTQModel/pull/225


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.7...v0.9.8

0.9.7

What's Changed
* πŸš€ [MODEL] InternLM 2.5 support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/182


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.6...v0.9.7

0.9.6

What's Changed

[Intel/AutoRound](https://github.com/intel/auto-round) `QUANT_METHOD` support added for potentially higher-quality quantization, with `lm_head` module quantization support for even more VRAM reduction; exports to `FORMAT.GPTQ` for maximum inference compatibility.
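
A sketch of what this could look like in code. The `AutoRoundQuantizeConfig` import path and its `lm_head` flag are assumptions inferred from the AutoRound quantizer option (PR #166), not verified against this exact release.

```python
# Minimal sketch, not an official example; config class/field names are assumptions.
from transformers import AutoTokenizer
from gptqmodel import GPTQModel
from gptqmodel.quantization.config import AutoRoundQuantizeConfig  # assumed import path

model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
calibration = [tokenizer("GPTQModel is an LLM quantization toolkit.")]

quant_config = AutoRoundQuantizeConfig(
    bits=4,
    group_size=128,
    lm_head=True,  # assumption: also quantize lm_head for extra VRAM savings
)

model = GPTQModel.from_pretrained(model_id, quant_config)
model.quantize(calibration)
model.save_quantized("opt-125m-autoround-4bit")  # exported as FORMAT.GPTQ per these notes
```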

* πŸš€ [CORE] Add AutoRound as Quantizer option by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/166
* πŸ‘Ύ [FIX] [CI] Update test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/177
* πŸ‘Ύ Cleanup Triton by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/178


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.5...v0.9.6

0.9.5

What's Changed

Another large update with added support for Intel QBits quantization/inference on CPU. CUDA kernels have been fully deprecated in favor of the better-performing Exllama (v1/v2), Marlin, and Triton kernels.
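
A CPU-loading sketch: `Backend.QBITS` is an assumed enum value, inferred from the `Backend` enum referenced elsewhere in these notes (e.g. the removed `Backend.CUDA`), and the model id is a placeholder.

```python
# Minimal sketch of CPU inference via the Intel QBits kernels (PR #137).
# Backend.QBITS is an assumption; the model id is a placeholder.
from gptqmodel import GPTQModel, Backend

model = GPTQModel.from_quantized(
    "your-org/your-gptq-4bit-model",  # placeholder: any 2/3/4/8-bit GPTQ checkpoint
    device="cpu",                     # QBits targets CPU execution
    backend=Backend.QBITS,
)
```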

* πŸš€πŸš€ [KERNEL] Added Intel QBits support with [2, 3, 4, 8] bits quantization/inference on CPU by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/137
* ✨ [CORE] BaseQuantLinear add SUPPORTED_DEVICES by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/174
* ✨ [DEPRECATION] Remove Backend.CUDA and Backend.CUDA_OLD by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/165
* πŸ‘Ύ [CI] FIX test perplexity by ZYC-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/160


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.4...v0.9.5

0.9.4

What's Changed
* πŸš€ [FEATURE] Added Transformers Integration via monkeypatch by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/147
* πŸ‘Ύ [FIX] Typo causing Gemma 2 errors by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/158


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.3...v0.9.4
