## QLoRA Support
QLoRA uses 4-bit quantization to compress a pretrained language model. The LM parameters are then frozen, and a relatively small number of trainable parameters are added to the model in the form of Low-Rank Adapters. During finetuning, QLoRA backpropagates gradients through the frozen 4-bit quantized pretrained language model into the Low-Rank Adapters, so the LoRA layers are the only parameters updated during training. For more details, read the blog post [Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA](https://huggingface.co/blog/4bit-transformers-bitsandbytes).
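In practice, this looks roughly like the sketch below; the base model name and LoRA hyperparameters are illustrative assumptions, not values taken from this release.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the base model to 4-bit NF4 on load (bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",  # illustrative base model
    quantization_config=bnb_config,
    device_map="auto",
)

# Freeze the quantized weights and prepare the model for k-bit training.
model = prepare_model_for_kbit_training(model)

# Attach trainable Low-Rank Adapters; only these receive gradient updates.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```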
* 4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) by TimDettmers in https://github.com/huggingface/peft/pull/476
* [`core`] Protect 4bit import by younesbelkada in https://github.com/huggingface/peft/pull/480
* [`core`] Raise warning on using `prepare_model_for_int8_training` by younesbelkada in https://github.com/huggingface/peft/pull/483
## New PEFT method: IA3 from the T-Few paper
To make fine-tuning more efficient, IA3 (Infused Adapter by Inhibiting and Amplifying Inner Activations) rescales inner activations with learned vectors. These learned vectors are injected into the attention and feedforward modules in a typical transformer-based architecture. These learned vectors are the only trainable parameters during fine-tuning, and thus the original weights remain frozen. Dealing with learned vectors (as opposed to learned low-rank updates to a weight matrix like LoRA) keeps the number of trainable parameters much smaller. For more details, read the paper [Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning](https://arxiv.org/abs/2205.05638)
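A minimal sketch of what this looks like with the new `IA3Config`, assuming a T5 base model; the targeted module names depend on the architecture and are specific to T5 here.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import IA3Config, get_peft_model

model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Inject learned rescaling vectors into the attention (k, v) and
# feedforward (wo) projections.
ia3_config = IA3Config(
    task_type="SEQ_2_SEQ_LM",
    target_modules=["k", "v", "wo"],
    feedforward_modules=["wo"],
)
model = get_peft_model(model, ia3_config)
model.print_trainable_parameters()  # the learned vectors are the only trainable parameters
```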
* Add functionality to support IA3 by SumanthRH in https://github.com/huggingface/peft/pull/578
## Support for new tasks: QA and Feature Extraction
Addition of `PeftModelForQuestionAnswering` and `PeftModelForFeatureExtraction` classes to support QA and Feature Extraction tasks, respectively. This enables exciting new use-cases with PEFT, e.g., [LoRA for semantic similarity tasks](https://huggingface.co/docs/peft/task_guides/semantic-similarity-lora).
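As a sketch, selecting one of the new task types routes the wrapped model through the corresponding class; the base model and LoRA settings below are illustrative.

```python
from transformers import AutoModel
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModel.from_pretrained("bert-base-uncased")

# TaskType.FEATURE_EXTRACTION wraps the model in PeftModelForFeatureExtraction;
# TaskType.QUESTION_ANS would use PeftModelForQuestionAnswering instead.
config = LoraConfig(task_type=TaskType.FEATURE_EXTRACTION, r=8, lora_alpha=16)
peft_model = get_peft_model(base_model, config)
```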
* feat: Add PeftModelForQuestionAnswering by sjrl in https://github.com/huggingface/peft/pull/473
* add support for Feature Extraction using PEFT by pacman100 in https://github.com/huggingface/peft/pull/647
## `AutoPeftModelForxxx` for a better and simplified UX
Introduces a new paradigm, `AutoPeftModelForxxx`, intended for users who want to quickly load and run PEFT models:
```python
from peft import AutoPeftModelForCausalLM

peft_model = AutoPeftModelForCausalLM.from_pretrained("ybelkada/opt-350m-lora")
```
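A hedged follow-up showing how the loaded model can then be used for generation; the tokenizer checkpoint here is an assumption about the adapter's base model.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")  # assumed base model
inputs = tokenizer("Hello, my name is", return_tensors="pt")
outputs = peft_model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```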
* Introducing `AutoPeftModelForxxx` by younesbelkada in https://github.com/huggingface/peft/pull/694
## LoRA for custom models
Not a transformers model? No problem, we have got you covered: PEFT now enables the use of LoRA with custom models.
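Here is a minimal sketch with a plain `torch.nn.Module`; the module structure and targeted layer names are illustrative.

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 2))

    def forward(self, x):
        return self.layers(x)

# Target the Linear submodules by their qualified names.
config = LoraConfig(target_modules=["layers.0", "layers.2"])
peft_model = get_peft_model(MLP(), config)
peft_model.print_trainable_parameters()
```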
* FEAT: Make LoRA work with custom models by BenjaminBossan in https://github.com/huggingface/peft/pull/676
## New LoRA utilities
Improvements to the `add_weighted_adapter` method now support SVD for combining multiple LoRA adapters into a new one.
New utilities such as `unload` and `delete_adapter` give users much finer control over how they manage adapters.
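A hedged sketch of these utilities, assuming `model` is a `PeftModel` with two LoRA adapters already loaded; the adapter names and weights are illustrative.

```python
# Combine two existing LoRA adapters into a new one via SVD.
model.add_weighted_adapter(
    adapters=["adapter_a", "adapter_b"],
    weights=[0.7, 0.3],
    adapter_name="combined",
    combination_type="svd",
)
model.set_adapter("combined")

model.delete_adapter("adapter_b")  # remove an adapter that is no longer needed
base_model = model.unload()        # recover the base model without merging LoRA weights
```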
* [Core] Enhancements and refactoring of LoRA method by pacman100 in https://github.com/huggingface/peft/pull/695
## PEFT and Stable Diffusion
PEFT is very extensible and easy to use for DreamBooth fine-tuning of Stable Diffusion. The community has added conversion scripts so that PEFT models can be used with the Civitai/webui format and vice versa.
* LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT by kovalexal in https://github.com/huggingface/peft/pull/461
* Added Civitai LoRAs conversion to PEFT, PEFT LoRAs conversion to webui by kovalexal in https://github.com/huggingface/peft/pull/596
* [Bugfix] Fixed LoRA conv2d merge by kovalexal in https://github.com/huggingface/peft/pull/637
* Fixed LoraConfig alpha modification on add_weighted_adapter by kovalexal in https://github.com/huggingface/peft/pull/654
## What's Changed