What's Changed
⚡ Added optimized quantization support for the multi-modal (image-to-text) models Qwen 2-VL and Ovis 1.6-VL. Previous image-to-text model quantizations did not use image calibration data, which led to suboptimal post-quantization results. Version 1.5.0 is the first release to provide a stable path for multi-modal quantization: only the text layers are quantized.
🐛 Fixed Qwen 2-VL quantization VRAM usage and the post-quantization copy of relevant config files.
🐛 Fixed install/compilation in environments with an incorrect `TORCH_CUDA_ARCH_LIST` set (NVIDIA Docker images).
🐛 Added a warning about a bad torch[cuda] install on Windows.
* Fix backend not ipex by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/930
* Fix broken ipex check by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/933
* Fix dynamic_cuda validation by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/936
* Fix bdist_wheel does not exist on old setuptools by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/939
* Add cuda warning on windows by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/942
* Add torch inference benchmark by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/940
* Add `modality` to `BaseModel` by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/937
* [FIX] qwen_vl_utils should be locally import by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/946
* Filter torch cuda arch < 6.0 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/955
* [FIX] wrong filepath was used when model_id_or_path was hugging model id by ZX-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/956
* Fix import error was not caught by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/961
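
The `TORCH_CUDA_ARCH_LIST` fix and the "Filter torch cuda arch < 6.0" entry above both concern cleaning up the architecture list before compilation. The following is a minimal sketch of that idea, not the actual GPTQModel build code: the function name `filter_cuda_arch_list` and the constant `MIN_ARCH` are hypothetical, and the 6.0 threshold is taken only from the PR title. It assumes entries are separated by spaces or semicolons and may carry a `+PTX` suffix.

```python
import re

# Assumption: minimum compute capability, taken from the PR title "Filter torch cuda arch < 6.0".
MIN_ARCH = (6, 0)


def filter_cuda_arch_list(arch_list: str, min_arch=MIN_ARCH) -> str:
    """Drop numeric CUDA archs below min_arch from a TORCH_CUDA_ARCH_LIST-style string.

    Hypothetical helper for illustration only.
    """
    kept = []
    # Entries may be separated by whitespace or semicolons.
    for entry in re.split(r"[;\s]+", arch_list.strip()):
        if not entry:
            continue
        match = re.fullmatch(r"(\d+)\.(\d+)a?(\+PTX)?", entry)
        if match is None:
            # Non-numeric entries (for example named architectures) are passed through untouched.
            kept.append(entry)
            continue
        major, minor = int(match.group(1)), int(match.group(2))
        if (major, minor) >= min_arch:
            kept.append(entry)
    return ";".join(kept)
```

A build script could apply such a helper to `os.environ["TORCH_CUDA_ARCH_LIST"]` before invoking the compiler, so that an image-provided list containing old architectures does not break the build.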
**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.4.5...v1.5.0