GPTQModel

Latest version: v1.4.1


0.9.10

What's Changed

Ported the vllm/nm gptq_marlin inference kernel with expanded bits (8-bit), group_size (64, 32), and desc_act support for all GPTQ models with `format = FORMAT.GPTQ`. Auto-calculate auto-round nsamples/seqlen parameters based on the calibration dataset. Fixed `save_quantized()` when called on pre-quantized models with unsupported backends. HF Transformers dependency updated to ensure Llama 3.1 fixes are correctly applied to both the quantization and inference stages.
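
As a rough sketch of how these options fit together (the `GPTQModel`, `QuantizeConfig`, `FORMAT`, and `Backend` names reflect the GPTQModel API of this era; the model id and calibration data are placeholders, so treat this as an assumption-laden example rather than the exact API):

```python
# Sketch: quantize with the expanded options, then reload via the ported marlin kernel.
# Class/enum names and the model id are assumptions; check this release's docs/tests.
from gptqmodel import GPTQModel, QuantizeConfig, Backend
from gptqmodel.quantization import FORMAT

calibration_dataset = ["GPTQModel is a model quantization toolkit."]  # toy placeholder data

quant_config = QuantizeConfig(
    bits=8,              # marlin kernel now covers 8-bit as well as 4-bit
    group_size=64,       # expanded group_size support (64, 32)
    desc_act=True,       # desc_act checkpoints are supported
    format=FORMAT.GPTQ,  # the format the new kernel targets
)

model = GPTQModel.from_pretrained("meta-llama/Meta-Llama-3.1-8B", quant_config)
model.quantize(calibration_dataset)
model.save_quantized("llama-3.1-8b-gptq-8bit")

# Reload the quantized checkpoint on the marlin inference kernel.
model = GPTQModel.from_quantized("llama-3.1-8b-gptq-8bit", backend=Backend.MARLIN)
```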

* [CORE] add marlin inference kernel by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/310
* [CI] Increase timeout to 40m by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/295, https://github.com/ModelCloud/GPTQModel/pull/299
* [FIX] save_quantized() by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/296
* [FIX] autoround nsample/seqlen to be actual size of calibration_dataset. by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/297, LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/298
* Update HF transformers to 4.43.3 by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/305
* [CI] remove test_marlin_hf_cache_serialization() by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/314

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.9...v0.9.10

0.9.9

What's Changed

Added Llama 3.1 support, Gemma 2 27B quantized-inference support via vLLM, and automatic pad_token normalization; fixed auto-round quant compatibility with vLLM/SGLang.


* [CI] by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/238, https://github.com/ModelCloud/GPTQModel/pull/236, https://github.com/ModelCloud/GPTQModel/pull/237, https://github.com/ModelCloud/GPTQModel/pull/241, https://github.com/ModelCloud/GPTQModel/pull/242, https://github.com/ModelCloud/GPTQModel/pull/243, https://github.com/ModelCloud/GPTQModel/pull/246, https://github.com/ModelCloud/GPTQModel/pull/247, https://github.com/ModelCloud/GPTQModel/pull/250
* [FIX] explicitly call torch.no_grad() by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/239
* Bitblas update by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/249
* [FIX] calib avg for calib dataset arg passed as tensors by Qubitium, LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/254, https://github.com/ModelCloud/GPTQModel/pull/258
* [MODEL] gemma2 27b can load with vLLM now by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/257
* [OPTIMIZE] to optimize vllm inference, set an environment variable 'VLLM_ATTENTI… by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/260
* [FIX] hard set batch_size to 1 for 4.43.0 transformer due to compat/regression by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/279
* FIX vllm llama 3.1 support by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/280
* Use better defaults values for quantization config by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/281
* [REFACTOR] Cleanup backend and model_type usage by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/276
* [FIX] allow auto_round lm_head quantization by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/282
* [FIX] [MODEL] Llama-3.1-8B-Instruct's eos_token_id is a list by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/284
* [FIX] add release_vllm_model, and import destroy_model_parallel in release_vllm_model by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/288
* [FIX] autoround quants compat with vllm/sglang by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/287

**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.8...v0.9.9

0.9.8

What's Changed

1. Marlin end-to-end in/out feature padding for max model support
2. Run quantized models (`FORMAT.GPTQ`) directly using fast vLLM backend!
3. Run quantized models (`FORMAT.GPTQ`) directly using fast SGLang backend!
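
A hedged sketch of what points 2 and 3 enable, loading an existing `FORMAT.GPTQ` checkpoint straight onto the new backends (the `Backend` enum members, the `generate()` call, and the checkpoint id are assumptions for illustration, not a verified API):

```python
# Sketch: serve a pre-quantized FORMAT.GPTQ checkpoint through the vLLM backend.
# Backend.VLLM / Backend.SGLANG are assumptions taken from this release's description.
from gptqmodel import GPTQModel, Backend

model = GPTQModel.from_quantized(
    "TheBloke/Llama-2-7B-GPTQ",  # example FORMAT.GPTQ checkpoint
    backend=Backend.VLLM,        # or Backend.SGLANG for the SGLang backend
)
print(model.generate("The capital of France is"))
```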

* πŸš€ πŸš€ [CORE] Marlin end-to-end in/out feature padding by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/183 https://github.com/ModelCloud/GPTQModel/pull/192
* πŸš€ πŸš€ [CORE] Add vLLM Backend for FORMAT.GPTQ by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/190
* πŸš€ πŸš€ [CORE] Add SGLang Backend by PZS-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/191
* πŸš€ [CORE] Use Triton v2 to pack gptq/gptqv2 formats by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/202
* ✨ [CLEANUP] remove triton warmup by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/200
* πŸ‘Ύ [FIX] 8bit choosing wrong packer by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/199
* ✨ [CI] [CLEANUP] Improve Unit Tests by CSY, PSY, and ZYC
* ✨ [DOC] Consolidate Examples by ZYC in https://github.com/ModelCloud/GPTQModel/pull/225


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.7...v0.9.8

0.9.7

What's Changed
* πŸš€ [MODEL] InternLM 2.5 support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/182


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.6...v0.9.7

0.9.6

What's Changed

[Intel/AutoRound](https://github.com/intel/auto-round) QUANT_METHOD support added for potentially higher-quality quantization, with `lm_head` module quantization support for even more VRAM reduction; format exports to `FORMAT.GPTQ` for maximum inference compatibility.
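
A rough, assumption-laden sketch of selecting AutoRound as the quantizer (the `AutoRoundQuantizeConfig` class and its `lm_head` field are guesses at the names introduced in #166; the model id and calibration data are placeholders):

```python
# Sketch: AutoRound quantization with lm_head quantization, exported as FORMAT.GPTQ.
from gptqmodel import GPTQModel
from gptqmodel.quantization import FORMAT
from gptqmodel.quantization.config import AutoRoundQuantizeConfig  # class name is an assumption

calibration_dataset = ["A few short calibration sentences go here."]  # toy placeholder

quant_config = AutoRoundQuantizeConfig(
    bits=4,
    group_size=128,
    lm_head=True,        # also quantize lm_head for extra VRAM savings (assumed field name)
    format=FORMAT.GPTQ,  # export to FORMAT.GPTQ for maximum inference compatibility
)

model = GPTQModel.from_pretrained("Qwen/Qwen2-0.5B-Instruct", quant_config)
model.quantize(calibration_dataset)
model.save_quantized("qwen2-0.5b-autoround-gptq")
```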

* πŸš€ [CORE] Add AutoRound as Quantizer option by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/166
* πŸ‘Ύ [FIX] [CI] Update test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/177
* πŸ‘Ύ Cleanup Triton by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/178


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.5...v0.9.6

0.9.5

What's Changed

Another large update, with added support for Intel QBits quantization/inference on CPU. CUDA kernels have been fully deprecated in favor of the better-performing Exllama (v1/v2), Marlin, and Triton kernels.
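
For the QBits path, loading on CPU might look roughly like the following (the `Backend.QBITS` member, the `device` argument, and the checkpoint id are assumptions for illustration):

```python
# Sketch: quantized inference on CPU via the new Intel QBits kernel.
from gptqmodel import GPTQModel, Backend

model = GPTQModel.from_quantized(
    "ModelCloud/Llama-3-8B-gptq-4bit",  # hypothetical pre-quantized checkpoint
    device="cpu",                       # QBits targets CPU inference
    backend=Backend.QBITS,              # supports 2/3/4/8-bit quantized weights
)
```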

* πŸš€πŸš€ [KERNEL] Added Intel QBits support with [2, 3, 4, 8] bits quantization/inference on CPU by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/137
* ✨ [CORE] BaseQuantLinear add SUPPORTED_DEVICES by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/174
* ✨ [DEPRECATION] Remove Backend.CUDA and Backend.CUDA_OLD by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/165
* πŸ‘Ύ [CI] FIX test perplexity by ZYC-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/160


**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v0.9.4...v0.9.5
