*Happy International Children's Day! 🎈 In the age of LLMs and at the dawn of AGI, may we always stay as curious as children, with vigorous energy and the courage to explore a bright future.*
## Features Summary
A bunch of new features have been added in this version (two usage sketches follow the list):
- Optimized modules for faster inference: fused attention for `llama` and `gptj`, and fused MLP for `llama`
- Full CPU offloading
- Multi-GPU inference with the triton backend
- Three new models supported: `codegen`, `gpt_bigcode` and `falcon`
- Download/upload quantized models from/to the HF Hub
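Here is a minimal sketch of how the new inference options fit together. It assumes a quantized model already exists on the HF Hub and that `from_quantized` exposes the keyword names shown (`device_map`, `use_triton`, `inject_fused_attention`, `inject_fused_mlp`); the repo id is a placeholder, so check the README of the version you install before relying on it:

```python
# Hedged sketch: load a quantized llama model from the HF Hub with the new
# speed-ups enabled. The keywords marked "assumed" and the repo id are
# assumptions, not confirmed against this exact release.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

repo_id = "someone/llama-7b-4bit-128g"  # placeholder Hub repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    repo_id,
    device_map="auto",            # let accelerate place layers on GPUs and/or CPU
    use_triton=False,             # set True to use the triton backend instead of CUDA
    inject_fused_attention=True,  # fused attention for llama/gptj (assumed keyword)
    inject_fused_mlp=True,        # fused MLP for llama (assumed keyword)
)

prompt = "auto_gptq is"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")  # inputs on the first GPU
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```

As I read the linked PRs, the `device_map` / `max_memory` arguments (backed by accelerate) are what drive both multi-GPU placement and CPU offloading, e.g. a custom `device_map` that maps some layers to `"cpu"`.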
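And a similarly hedged sketch of the new Hub round-trip: `push_to_hub` comes from PR #91, but treat its keyword arguments below as assumptions and both repo ids as placeholders:

```python
# Hedged sketch: quantize a small model and upload it to the HF Hub.
# push_to_hub was added in PR #91; its keyword arguments here are assumptions
# and both repo ids are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_id = "facebook/opt-125m"                  # small model, quick to test
quantized_repo = "your-username/opt-125m-4bit-128g"  # placeholder target repo

tokenizer = AutoTokenizer.from_pretrained(pretrained_id, use_fast=True)
examples = [
    tokenizer("auto_gptq is an easy-to-use model quantization library.")
]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_id, quantize_config)
model.quantize(examples)

save_dir = "opt-125m-4bit-128g"
model.save_quantized(save_dir)
model.push_to_hub(quantized_repo, save_dir=save_dir)  # save_dir kwarg is an assumption
```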
## Change Log
Below is the detailed change log:
* Fix CUDA bug by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/44
* Fix bug caused by 'groupsize' vs 'group_size' and change all code to use 'group_size' consistently by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/58
* Setup conda by Sciumo in https://github.com/PanQiWei/AutoGPTQ/pull/59
* fix incorrect pack while using cuda, desc_act and grouping by lszxb in https://github.com/PanQiWei/AutoGPTQ/pull/62
* Faster llama by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/43
* Gptj fused attention by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/76
* Look for .pt files by oobabooga in https://github.com/PanQiWei/AutoGPTQ/pull/79
* Support users customize `device_map` by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/80
* Update example script to include desc_act by Ph0rk0z in https://github.com/PanQiWei/AutoGPTQ/pull/82
* Forward position args to allow `model(tokens)` syntax by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/84
* Rename 'quant_cuda' to 'autogptq_cuda' to avoid conflicts with existing GPTQ-for-LLaMa installations. by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/93
* fix ImportError when triton is not installed by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/92
* Fix CUDA out of memory error in qlinear_old.py by LexSong in https://github.com/PanQiWei/AutoGPTQ/pull/66
* Improve CPU offload by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/100
* triton float32 support by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/104
* Add support for CodeGen/2 by LaaZa in https://github.com/PanQiWei/AutoGPTQ/pull/65
* Add support for GPTBigCode(starcoder) by LaaZa in https://github.com/PanQiWei/AutoGPTQ/pull/63
* Minor syntax fix for auto.py by billcai in https://github.com/PanQiWei/AutoGPTQ/pull/112
* Falcon support by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/111
* Add support for HF Hub download, and `push_to_hub` by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/91
* Add build wheels workflow by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/120
## New Contributors
The following are new contributors and their first PRs. Thank you very much for your love of `auto_gptq` and your contributions! ❤️
* Sciumo made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/59
* lszxb made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/62
* oobabooga made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/79
* Ph0rk0z made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/82
* LexSong made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/66
* LaaZa made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/65
* billcai made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/112
**Full Changelog**: https://github.com/PanQiWei/AutoGPTQ/compare/v0.1.0...v0.2.0