## Key Features and Improvements
- **GPTQ Quantized-weight Sequential Updating** ([177](https://github.com/vllm-project/llm-compressor/pull/177)): Introduced sequential updating for GPTQ quantization, in which each layer is compressed using the outputs of already-quantized preceding layers, improving compression accuracy and memory efficiency.
- **Auto-Infer Mappings for SmoothQuantModifier** ([119](https://github.com/vllm-project/llm-compressor/pull/119)): Automatically infers `mappings` based on model architecture, making SmoothQuant easier to apply across various models.
- **Improved Sparse Compression Usability** ([191](https://github.com/vllm-project/llm-compressor/pull/191)): Added automatic inference of `targets` and `ignore` lists for sparse compression, allowing for more flexible model configurations with less manual setup.
- **Generic Wrapper for Any Hugging Face Model** ([185](https://github.com/vllm-project/llm-compressor/pull/185)): Added the `wrap_hf_model_class` utility, enabling better support and integration for Hugging Face models that are not based on `AutoModelForCausalLM`, e.g. vision-language models.
- **Observer Restructure** ([837](https://github.com/vllm-project/llm-compressor/pull/837)): Introduced calibration and frozen steps within `QuantizationModifier`, moving Observers from compressed-tensors to llm-compressor.
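The SmoothQuant auto-mapping and GPTQ sequential updating above can be combined in a single recipe. A minimal sketch in llm-compressor's YAML recipe format (the stage/modifier layout is standard, but field values such as `smoothing_strength`, `scheme`, and `sequential_update` are illustrative and should be checked against the current docs):

```yaml
quant_stage:
  quant_modifiers:
    # mappings are omitted and auto-inferred from the model architecture (PR 119)
    SmoothQuantModifier:
      smoothing_strength: 0.8
    # sequential updating quantizes layer-by-layer against already-quantized
    # predecessors (PR 177)
    GPTQModifier:
      sequential_update: true
      scheme: W8A8
      targets: Linear
      ignore: ["lm_head"]
```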
## Bug Fixes
- **Fix Tied Tensors Bug** ([659](https://github.com/vllm-project/llm-compressor/pull/659))
- **Observer Initialization in GPTQ Wrapper** ([883](https://github.com/vllm-project/llm-compressor/pull/883))
- **Sparsity Reload Testing** ([882](https://github.com/vllm-project/llm-compressor/pull/882))
## Documentation
- **Updated SmoothQuant Tutorial** ([115](https://github.com/vllm-project/llm-compressor/pull/115)): Expanded SmoothQuant documentation to include detailed mappings for easier implementation.
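When auto-inference is not desired, `mappings` can still be supplied explicitly: each entry pairs the layers to be smoothed with the preceding layer whose scale absorbs the smoothing factor. A hedged sketch for a Llama-style decoder (the regex targets are illustrative, not a definitive default):

```yaml
SmoothQuantModifier:
  smoothing_strength: 0.8
  mappings:
    # smooth attention projections against the preceding input layernorm
    - - ["re:.*q_proj", "re:.*k_proj", "re:.*v_proj"]
      - "re:.*input_layernorm"
    # smooth MLP gate/up projections against the post-attention layernorm
    - - ["re:.*gate_proj", "re:.*up_proj"]
      - "re:.*post_attention_layernorm"
```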
## What's Changed
* Fix compresed typo by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/188
* GPTQ Quantized-weight Sequential Updating by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/177
* Add: targets and ignore inference for sparse compression by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/191
* switch tests from weekly to nightly by dhuangnm in https://github.com/vllm-project/llm-compressor/pull/658
* Compression wrapper abstract methods by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/170
* Explicitly set sequential_update in examples by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/187
* Increase Sparsity Threshold for compressors by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/679
* Add a generic `wrap_hf_model_class` utility to support VLMs by mgoin in https://github.com/vllm-project/llm-compressor/pull/185
* Add tests for examples by dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/149
* Rename to quantization config by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/730
* Implement Missing Modifier Methods by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/166
* Fix 2/4 GPTQ Model Tests by dsikka in https://github.com/vllm-project/llm-compressor/pull/769
* SmoothQuant mappings tutorial by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/115
* Fix import of `ModelCompressor` by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/776
* update test by dsikka in https://github.com/vllm-project/llm-compressor/pull/773
* [Bugfix] Fix saving offloaded state dict by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/172
* Auto-Infer `mappings` Argument for `SmoothQuantModifier` Based on Model Architecture by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/119
* Update workflows/actions by dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/774
* [Bugfix] Prepare KD Models when Saving by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/174
* Set Sparse compression to save_compressed by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/821
* Install compressed-tensors after llm-compressor by dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/825
* Fix test typo by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/828
* Add `AutoModelForCausalLM` example by dsikka in https://github.com/vllm-project/llm-compressor/pull/698
* [Bugfix] Workaround tied tensors bug by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/659
* Only untie word embeddings by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/839
* Check for config hidden size by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/840
* Use float32 for Hessian dtype by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/847
* GPTQ: Depreciate non-sequential update option by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/762
* Typehint nits by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/826
* [ DOC ] Remove version restrictions in W8A8 exmaple by miaojinc in https://github.com/vllm-project/llm-compressor/pull/849
* Fix inconsistence in example config of 2:4 sparse quantization by yzlnew in https://github.com/vllm-project/llm-compressor/pull/80
* Fix forward function pass call by dsikka in https://github.com/vllm-project/llm-compressor/pull/845
* [Bugfix] Use weight parameter of linear layer by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/836
* [Bugfix] Rename files to remove colons by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/846
* cover all 3.9-3.12 in commit testing by dhuangnm in https://github.com/vllm-project/llm-compressor/pull/864
* Add marlin-24 recipe/configs for e2e testing by dsikka in https://github.com/vllm-project/llm-compressor/pull/866
* [Bugfix] onload during sparsity calculation by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/862
* Fix HFTrainer overloads by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/869
* Support Model Offloading Tied Tensors Patch by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/872
* Add advice about dealing with non-invertable hessians by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/875
* seed commit workflow by andy-neuma in https://github.com/vllm-project/llm-compressor/pull/877
* [Observer Restructure]: Add Observers; Add `calibration` and `frozen` steps to `QuantizationModifier` by dsikka in https://github.com/vllm-project/llm-compressor/pull/837
* Bugfix observer initialization in `gptq_wrapper` by rahul-tuli in https://github.com/vllm-project/llm-compressor/pull/883
* BugFix: Fix Sparsity Reload Testing by dsikka in https://github.com/vllm-project/llm-compressor/pull/882
* Use custom unique test names for e2e tests by dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/892
* Revert "Use custom unique test names for e2e tests (892)" by dsikka in https://github.com/vllm-project/llm-compressor/pull/893
* Move config["testconfig_path"] assignment by dbarbuzzi in https://github.com/vllm-project/llm-compressor/pull/895
* Cap accelerate version to avoid bug by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/897
* Fix observing offloaded weight by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/896
* Update image in README.md by mgoin in https://github.com/vllm-project/llm-compressor/pull/861
* update accelerate version by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/899
* [GPTQ] Iterative Parameter Updating by kylesayrs in https://github.com/vllm-project/llm-compressor/pull/863
* Small fixes for release by dsikka in https://github.com/vllm-project/llm-compressor/pull/901
* use smaller portion of dataset by dsikka in https://github.com/vllm-project/llm-compressor/pull/902
* Update example to not fail hessian inversion by dsikka in https://github.com/vllm-project/llm-compressor/pull/904
* Bump version to 0.3.0 by dsikka in https://github.com/vllm-project/llm-compressor/pull/907
## New Contributors
* miaojinc made their first contribution in https://github.com/vllm-project/llm-compressor/pull/849
* yzlnew made their first contribution in https://github.com/vllm-project/llm-compressor/pull/80
* andy-neuma made their first contribution in https://github.com/vllm-project/llm-compressor/pull/877
**Full Changelog**: https://github.com/vllm-project/llm-compressor/compare/0.2.0...0.3.0