What's Changed
Added partial quantization support for the Llama 3.2 Vision model. v1.0.5 allows quantization of the text layers (the layers responsible for text generation) only; vision-layer support will be added shortly. A Llama 3.2 11B Vision Instruct model will quantize to ~50% of its original size in 4-bit mode. Once vision-layer support is added, the size will shrink to the expected ~1/4.
* [MODEL] Add Llama 3.2 Vision (mllama) support by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/401
**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.0.4...v1.0.5