Overview
- CUDA kernels improvement: support models whose hidden_size is only divisible by 32 or 64, instead of requiring divisibility by 256.
- Peft integration: support training and inference using LoRA, AdaLoRA, AdaptionPrompt, etc.
- New models: BaiChuan, InternLM.
- Other updates: see the 'Full Change Log' section below for details.
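The relaxed divisibility constraint above can be illustrated with a small standalone check. The function below is a hypothetical sketch, not part of the AutoGPTQ API: it merely shows which kernel dimensions evenly divide a given hidden_size under the old rule (256 only) versus the new one (32/64 also accepted).

```python
def supported_kernel_dims(hidden_size: int) -> list[int]:
    """Illustrative only (not an AutoGPTQ function): list which
    kernel dims divide hidden_size. Previously only models with
    hidden_size % 256 == 0 could use the CUDA kernels; 64 and 32
    are now also accepted."""
    return [dim for dim in (256, 64, 32) if hidden_size % dim == 0]

# A hidden_size of 4544 is not divisible by 256, so it was previously
# unsupported; it now qualifies via the 64/32 kernels.
print(supported_kernel_dims(4544))  # -> [64, 32]
print(supported_kernel_dims(4096))  # -> [256, 64, 32]
```

The example hidden_size values are illustrative; any model dimension divisible by 32 now qualifies.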
Full Change Log
What's Changed
* Pytorch qlinear by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/116
* Specify UTF-8 encoding for README.md in setup.py by EliEron in https://github.com/PanQiWei/AutoGPTQ/pull/132
* Support cuda 64dim by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/126
* Support 32dim by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/125
* Peft integration by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/102
* Support setting inject_fused_attention and inject_fused_mlp to False by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/134
* Add transpose operator when replace Conv1d with qlinear_cuda_old by geekinglcq in https://github.com/PanQiWei/AutoGPTQ/pull/140
* Add support for BaiChuan model by LaaZa in https://github.com/PanQiWei/AutoGPTQ/pull/164
* Fix error message by AngainorDev in https://github.com/PanQiWei/AutoGPTQ/pull/141
* Add support for InternLM by cczhong11 in https://github.com/PanQiWei/AutoGPTQ/pull/189
* Fix stale documentation by MarisaKirisame in https://github.com/PanQiWei/AutoGPTQ/pull/158
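The Conv1d transpose change in PR #140 reflects a layout mismatch: transformers-style Conv1D layers store their weight as (in_features, out_features), while Linear-style quantized layers expect (out_features, in_features), so the weight must be transposed during replacement. A minimal pure-Python sketch of that transpose (names are illustrative, not AutoGPTQ internals):

```python
def transpose(weight):
    """Swap a row-major (in_features, out_features) matrix to
    (out_features, in_features), mirroring the transpose needed
    when a Conv1D layer is replaced by a quantized linear layer."""
    return [list(col) for col in zip(*weight)]

# 2 input features, 3 output features -> transposed to 3x2.
conv1d_weight = [[1, 2, 3],
                 [4, 5, 6]]
print(transpose(conv1d_weight))  # -> [[1, 4], [2, 5], [3, 6]]
```

In the real code the operation is a tensor transpose rather than a list comprehension, but the shape convention is the same.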
New Contributors
* EliEron made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/132
* geekinglcq made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/140
* AngainorDev made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/141
* cczhong11 made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/189
* MarisaKirisame made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/158
**Full Changelog**: https://github.com/PanQiWei/AutoGPTQ/compare/v0.2.1...v0.3.0