Overview
- CUDA kernels improvement: support models whose hidden_size is only divisible by 32 or 64, instead of requiring divisibility by 256.
- Peft integration: support training and inference using LoRA, AdaLoRA, AdaptionPrompt, etc.
- New models: BaiChuan, InternLM.
- Other updates: see the 'Full Change Log' section below for details.
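The relaxed divisibility constraint above can be illustrated with a small standalone check. The function below is a hypothetical sketch, not part of the AutoGPTQ API: it merely shows which kernel dimensions evenly divide a given hidden_size under the old rule (256 only) versus the new one (32/64 also accepted).

```python
def supported_kernel_dims(hidden_size: int) -> list[int]:
    """Illustrative only (not an AutoGPTQ function): list which
    kernel dims divide hidden_size. Previously only models with
    hidden_size % 256 == 0 could use the CUDA kernels; 64 and 32
    are now also accepted."""
    return [dim for dim in (256, 64, 32) if hidden_size % dim == 0]

# A hidden_size of 4544 is not divisible by 256, so it was previously
# unsupported; it now qualifies via the 64/32 kernels.
print(supported_kernel_dims(4544))  # -> [64, 32]
print(supported_kernel_dims(4096))  # -> [256, 64, 32]
```

The example hidden_size values are illustrative; any model dimension divisible by 32 now qualifies.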
Full Change Log
What's Changed
* Pytorch qlinear by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/116
* Specify UTF-8 encoding for README.md in setup.py by EliEron in https://github.com/PanQiWei/AutoGPTQ/pull/132
* Support cuda 64dim by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/126
* Support 32dim by qwopqwop200 in https://github.com/PanQiWei/AutoGPTQ/pull/125
* Peft integration by PanQiWei in https://github.com/PanQiWei/AutoGPTQ/pull/102
* Support setting inject_fused_attention and inject_fused_mlp to False by TheBloke in https://github.com/PanQiWei/AutoGPTQ/pull/134
* Add transpose operator when replace Conv1d with qlinear_cuda_old by geekinglcq in https://github.com/PanQiWei/AutoGPTQ/pull/140
* Add support for BaiChuan model by LaaZa in https://github.com/PanQiWei/AutoGPTQ/pull/164
* Fix error message by AngainorDev in https://github.com/PanQiWei/AutoGPTQ/pull/141
* Add support for InternLM by cczhong11 in https://github.com/PanQiWei/AutoGPTQ/pull/189
* Fix stale documentation by MarisaKirisame in https://github.com/PanQiWei/AutoGPTQ/pull/158
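The Conv1d transpose change in PR #140 reflects a layout mismatch: transformers-style Conv1D layers store their weight as (in_features, out_features), while Linear-style quantized layers expect (out_features, in_features), so the weight must be transposed during replacement. A minimal pure-Python sketch of that transpose (names are illustrative, not AutoGPTQ internals):

```python
def transpose(weight):
    """Swap a row-major (in_features, out_features) matrix to
    (out_features, in_features), mirroring the transpose needed
    when a Conv1D layer is replaced by a quantized linear layer."""
    return [list(col) for col in zip(*weight)]

# 2 input features, 3 output features -> transposed to 3x2.
conv1d_weight = [[1, 2, 3],
                 [4, 5, 6]]
print(transpose(conv1d_weight))  # -> [[1, 4], [2, 5], [3, 6]]
```

In the real code the operation is a tensor transpose rather than a list comprehension, but the shape convention is the same.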
New Contributors
* EliEron made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/132
* geekinglcq made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/140
* AngainorDev made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/141
* cczhong11 made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/189
* MarisaKirisame made their first contribution in https://github.com/PanQiWei/AutoGPTQ/pull/158
**Full Changelog**: https://github.com/PanQiWei/AutoGPTQ/compare/v0.2.1...v0.3.0