*Happy International Children's Day! 🎈 In the age of LLMs and at the dawn of AGI, may we always stay as curious as children, with vigorous energy and the courage to explore a bright future.*
## Features Summary
A bunch of new features have been added in this version (two usage sketches follow the list):
- Optimized modules for faster inference: fused attention for `llama` and `gptj`, and fused MLP for `llama`
- Full CPU offloading
- Multi-GPU inference with the triton backend
- Three new models supported: `codegen`, `gpt_bigcode` and `falcon`
- Download/upload quantized models from/to the HF Hub
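Here is a minimal sketch of how the new inference options fit together. It assumes a quantized model already exists on the HF Hub and that `from_quantized` exposes the keyword names shown (`device_map`, `use_triton`, `inject_fused_attention`, `inject_fused_mlp`); the repo id is a placeholder, so check the README of the version you install before relying on it:

```python
# Hedged sketch: load a quantized llama model from the HF Hub with the new
# speed-ups enabled. The keywords marked "assumed" and the repo id are
# assumptions, not confirmed against this exact release.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo_id = "someone/llama-7b-4bit-128g"  # placeholder Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    repo_id,
    device_map="auto",            # let accelerate place layers on GPUs and/or CPU
    use_triton=False,             # set True to use the triton backend instead of CUDA
    inject_fused_attention=True,  # fused attention for llama/gptj (assumed keyword)
    inject_fused_mlp=True,        # fused MLP for llama (assumed keyword)
)

prompt = "auto_gptq is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")  # inputs on the first GPU
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

As I read the linked PRs, the `device_map` / `max_memory` arguments (backed by accelerate) are what drive both multi-GPU placement and CPU offloading, e.g. a custom `device_map` that maps some layers to `"cpu"`.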
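And a similarly hedged sketch of the new Hub round-trip: `push_to_hub` comes from PR #91, but treat its keyword arguments below as assumptions and both repo ids as placeholders:

```python
# Hedged sketch: quantize a small model and upload it to the HF Hub.
# push_to_hub was added in PR #91; its keyword arguments here are assumptions
# and both repo ids are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_id = "facebook/opt-125m"                  # small model, quick to test
quantized_repo = "your-username/opt-125m-4bit-128g"  # placeholder target repo

tokenizer = AutoTokenizer.from_pretrained(pretrained_id, use_fast=True)
examples = [
    tokenizer("auto_gptq is an easy-to-use model quantization library.")
]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_id, quantize_config)
model.quantize(examples)

save_dir = "opt-125m-4bit-128g"
model.save_quantized(save_dir)
model.push_to_hub(quantized_repo, save_dir=save_dir)  # save_dir kwarg is an assumption
```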
## Change Log
Below is the detailed change log:
* Fix CUDA bug by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/44
* Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/58
* Setup conda by Sciumo in https://github.com/PanQiWei/AutoGPTQ/pull/59
* fix incorrect pack while using cuda, desc_act and grouping by lszxb in https://github.com/PanQiWei/AutoGPTQ/pull/62
* Faster llama by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/43
* Gptj fused attention by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/76
* Look for .pt files by oobabooga in https://github.com/PanQiWei/AutoGPTQ/pull/79
* Support users customize `device_map` by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/80
* Update example script to include desc_act by Ph0rk0z in https://github.com/PanQiWei/AutoGPTQ/pull/82
* Forward position args to allow `model(tokens)` syntax by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/84
* Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations. by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/93
* fix ImportError when triton is not installed by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/92
* Fix CUDA out of memory error in qlinear_old.py by LexSong in https://github.com/PanQiWei/AutoGPTQ/pull/66
* Improve CPU offload by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/100
* triton float32 support by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/104
* Add support for CodeGen/2 by LaaZa in https://github.com/PanQiWei/AutoGPTQ/pull/65
* Add support for GPTBigCode(starcoder) by LaaZa in https://github.com/PanQiWei/AutoGPTQ/pull/63
* Minor syntax fix for auto.py by billcai in https://github.com/PanQiWei/AutoGPTQ/pull/112
* Falcon support by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/111
* Add support for HF Hub download, and `push_to_hub` by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/91
* Add build wheels workflow by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/120
## New Contributors
The following are new contributors and their first PRs. Thank you very much for your love of `auto_gptq` and your contributions! ❤️
* Sciumo made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/59
* lszxb made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/62
* oobabooga made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/79
* Ph0rk0z made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/82
* LexSong made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/66
* LaaZa made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/65
* billcai made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/112
**Full Changelog**: https://github.com/PanQiWei/AutoGPTQ/compare/v0.1.0...v0.2.0