What's Changed
⚡ Effective BPW (bits per weight) is now logged during load().
⚡ Reduce loading time on Intel Arc A770/B580 XPU by 3.3x.
⚡ Reduce memory usage in MLX conversion.
🐛 Fix Marlin kernel auto-select not checking CUDA compute capability (Marlin requires CC >= 8).
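
For readers unfamiliar with the term, effective BPW is typically higher than the nominal bit width because grouped quantization also stores per-group metadata. The sketch below is a rough estimate only: it assumes one 16-bit scale and one packed zero point per group, and the function name `effective_bpw` is hypothetical. It is not necessarily the formula GPTQModel uses for the logged value.

```python
def effective_bpw(bits: int, group_size: int, scale_bits: int = 16) -> float:
    """Estimate bits per weight including per-group overhead.

    Assumption (not taken from the release notes): each group of
    `group_size` weights stores one scale of `scale_bits` bits and one
    zero point of `bits` bits.
    """
    if bits <= 0 or group_size <= 0:
        raise ValueError("bits and group_size must be positive")
    overhead_per_weight = (scale_bits + bits) / group_size
    return bits + overhead_per_weight


if __name__ == "__main__":
    # 4-bit weights with group size 128 under the assumptions above.
    print(effective_bpw(bits=4, group_size=128))
```

Under these assumptions, smaller group sizes raise the effective BPW because the same metadata is spread over fewer weights.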
* remove catching module error by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1088
* [FIX] monkey patch GPTQShuffle.convert_idx to use fixed convert_idx by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1090
* [FIX] monkey patch only once by LRL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1091
* check CC >= 8 for marlin, fixed 1092 by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1093
* check compute capability for marlin in validate_device() by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1095
* torch get device with index of CUDA_VISIBLE_DEVICES, not value of it by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1096
* fix local model path & marlin test by CSY-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1097
* mod bits info by CL-ModelCloud in https://github.com/ModelCloud/GPTQModel/pull/1100
* Reduce memory usage in mlx conversion by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1099
* cleanup mlx code by Qubitium in https://github.com/ModelCloud/GPTQModel/pull/1101
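
The two device-related fixes in the list above (Marlin's CC >= 8 check and indexing by position in CUDA_VISIBLE_DEVICES) can be illustrated with a minimal sketch. The helper names `logical_to_physical` and `marlin_supported` are hypothetical and are not GPTQModel's API; in real code the capability tuple would usually come from `torch.cuda.get_device_capability(index)`.

```python
from typing import Dict, Tuple

MARLIN_MIN_MAJOR = 8  # threshold from the "check CC >= 8 for marlin" fix


def logical_to_physical(visible_devices: str) -> Dict[int, str]:
    """Map logical device indices to the entries of CUDA_VISIBLE_DEVICES.

    With CUDA_VISIBLE_DEVICES="2,3", the process sees two devices that
    are addressed as index 0 and 1, not as 2 and 3.
    """
    entries = [v.strip() for v in visible_devices.split(",") if v.strip()]
    return dict(enumerate(entries))


def marlin_supported(capability: Tuple[int, int]) -> bool:
    """Return True when the (major, minor) compute capability is >= 8.x."""
    major, _minor = capability
    return major >= MARLIN_MIN_MAJOR


if __name__ == "__main__":
    mapping = logical_to_physical("2,3")
    print(mapping)                    # logical index -> physical id
    print(marlin_supported((8, 6)))   # Ampere-class capability
    print(marlin_supported((7, 5)))   # Turing-class capability
```

The point of the mapping is that a device lookup should use the logical index (the dictionary key), since the value from the environment variable is not a valid index inside the process.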
**Full Changelog**: https://github.com/ModelCloud/GPTQModel/compare/v1.7.0...v1.7.2