Highlights
![image](https://github.com/huggingface/peft/assets/49240599/8274f36f-246f-4509-a6e4-804aba574566)
Support for QLoRA with DeepSpeed ZeRO3 and FSDP
We made several changes so that QLoRA works with DeepSpeed ZeRO3 and Fully Sharded Data Parallel (FSDP). For instance, this lets you fine-tune a 70B Llama model on two GPUs with 24GB of memory each. Besides the latest version of PEFT, this requires `bitsandbytes>=0.43.0`, `accelerate>=0.28.0`, `transformers>4.38.2`, `trl>0.7.11`. Check out our docs on [DeepSpeed](https://huggingface.co/docs/peft/v0.10.0/en/accelerate/deepspeed) and [FSDP](https://huggingface.co/docs/peft/v0.10.0/en/accelerate/fsdp) with PEFT, as well as this [blogpost](https://www.answer.ai/posts/2024-03-06-fsdp-qlora.html) from answer.ai, for more details.
Layer replication
First-time contributor siddartha-RE added support for layer replication with LoRA. This lets you duplicate layers of a model and apply LoRA adapters to them. Since the base weights are shared, this costs very little extra memory but can improve model performance. Find out more in [our docs](https://huggingface.co/docs/peft/v0.10.0/en/developer_guides/lora#memory-efficient-layer-replication-with-lora).
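To make the idea concrete, here is a minimal sketch of how a list of layer ranges expands into a new layer stack. It assumes replication is described by half-open `(start, end)` ranges over the original layers, as in the linked docs. The helper `expand_layer_replication` is hypothetical and only illustrates the index mapping; it is not PEFT's implementation.

```python
from typing import List, Sequence, Tuple


def expand_layer_replication(ranges: Sequence[Tuple[int, int]]) -> List[int]:
    """Return, for each position in the new stack, the index of the original layer it reuses."""
    layer_map: List[int] = []
    for start, end in ranges:
        # Assumption: each (start, end) pair is a half-open range over the original layers.
        layer_map.extend(range(start, end))
    return layer_map


# A 5-layer model expanded to 7 layers: original layers 2 and 3 each appear twice.
print(expand_layer_replication([(0, 4), (2, 5)]))  # [0, 1, 2, 3, 2, 3, 4]
```

Repeated positions share their base weights with the original layer, so the extra memory comes mainly from the LoRA adapters attached to the duplicates.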
Improving DoRA
Last release, we added the option to enable [DoRA](https://arxiv.org/abs/2402.09353) in PEFT by simply adding `use_dora=True` to your `LoraConfig`. However, this only worked for non-quantized linear layers. With this PEFT release, we now also support `Conv2d` layers, as well as linear layers quantized with bitsandbytes.
Mixed LoRA adapter batches
If you have a PEFT model with multiple LoRA adapters attached to it, it's now possible to apply different adapters (or no adapter at all) to different samples in the same batch. To do this, pass a list of adapter names, one per sample, as an additional argument. For example, if you have a batch of three samples:
```python
output = model(**inputs, adapter_names=["adapter1", "adapter2", "__base__"])
```
Here, `"adapter1"` and `"adapter2"` must match the names of your LoRA adapters, and `"__base__"` is a special name that refers to the base model without any adapter. Find more details in [our docs](https://huggingface.co/docs/peft/v0.10.0/en/developer_guides/lora#inference-with-different-lora-adapters-in-the-same-batch).
Without this feature, running inference with different LoRA adapters meant either using single samples or grouping samples into batches by adapter and switching between adapters with `set_adapter`, which is inefficient and inconvenient. We therefore recommend the new, faster method for this scenario.
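One way to picture what a per-sample adapter list means is to partition the batch by adapter name, so that each adapter only processes its own rows. The sketch below shows that bookkeeping in plain Python. The helper `group_by_adapter` is hypothetical and is meant as an illustration of the concept, not as PEFT's actual code.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_adapter(adapter_names: List[str]) -> Dict[str, List[int]]:
    """Map each adapter name to the batch indices of the samples that should use it."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, name in enumerate(adapter_names):
        groups[name].append(index)
    return dict(groups)


# Four samples: two use "adapter1", one uses "adapter2", one uses the base model.
groups = group_by_adapter(["adapter1", "adapter2", "__base__", "adapter1"])
print(groups)  # {'adapter1': [0, 3], 'adapter2': [1], '__base__': [2]}
```

With such a mapping, a single forward pass can route each group of rows through the matching adapter, which avoids the manual batching and `set_adapter` calls described above.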
New LoftQ initialization function
We added an alternative way to initialize LoRA weights for a quantized model using the LoftQ method, which can be more convenient than the existing one. The existing approach requires you to go through multiple steps, as shown [here](https://github.com/huggingface/peft/blob/8e979fc73248ccb4c5b5a99c415f3e14a37daae6/examples/loftq_finetuning/README.md). It also requires keeping a separate copy of the quantized weights, as those are not identical to the quantized weights from the default model.
Using the new `replace_lora_weights_loftq` function, it's now possible to apply LoftQ initialization in a single step and without the need for extra copies of the weights. Check out [the docs](https://huggingface.co/docs/peft/v0.10.0/en/developer_guides/lora#a-more-convienient-way) and this [example notebook](https://github.com/huggingface/peft/blob/main/examples/loftq_finetuning/LoftQ_weight_replacement.ipynb) to see how it works. Currently, this method only supports 4-bit quantization with bitsandbytes, and the model has to be stored in the safetensors format.
Deprecations
The function `prepare_model_for_int8_training` has been deprecated for quite some time and is now removed completely. Use `prepare_model_for_kbit_training` instead.
What's Changed
Besides these highlights, we added many small improvements and fixed a couple of bugs. All these changes are listed below. As always, we thank all the awesome contributors who helped us improve PEFT.
* Bump version to 0.9.1.dev0 by BenjaminBossan in https://github.com/huggingface/peft/pull/1517
* Fix for "leaf Variable that requires grad" Error in In-Place Operation by DopeorNope-Lee in https://github.com/huggingface/peft/pull/1372
* FIX [`CI` / `Docker`] Follow up from 1481 by younesbelkada in https://github.com/huggingface/peft/pull/1487
* CI: temporary disable workflow by younesbelkada in https://github.com/huggingface/peft/pull/1534
* FIX [`Docs`/ `bnb` / `DeepSpeed`] Add clarification on bnb + PEFT + DS compatibilities by younesbelkada in https://github.com/huggingface/peft/pull/1529
* Expose bias attribute on tuner layers by BenjaminBossan in https://github.com/huggingface/peft/pull/1530
* docs: highlight difference between `num_parameters()` and `get_nb_trainable_parameters()` in PEFT by kmehant in https://github.com/huggingface/peft/pull/1531
* fix: fail when required args not passed when `prompt_tuning_init==TEXT` by kmehant in https://github.com/huggingface/peft/pull/1519
* Fixed minor grammatical and code bugs by gremlin97 in https://github.com/huggingface/peft/pull/1542
* Optimize `levenshtein_distance` algorithm in `peft_lora_seq2seq_accelera…` by SUNGOD3 in https://github.com/huggingface/peft/pull/1527
* Update `prompt_based_methods.md` by insist93 in https://github.com/huggingface/peft/pull/1548
* FIX Allow AdaLoRA rank to be 0 by BenjaminBossan in https://github.com/huggingface/peft/pull/1540
* FIX: Make adaptation prompt CI happy for transformers 4.39.0 by younesbelkada in https://github.com/huggingface/peft/pull/1551
* MNT: Use `BitsAndBytesConfig` as `load_in_*` is deprecated by BenjaminBossan in https://github.com/huggingface/peft/pull/1552
* Add Support for Mistral Model in Llama-Adapter Method by PrakharSaxena24 in https://github.com/huggingface/peft/pull/1433
* Add support for layer replication in LoRA by siddartha-RE in https://github.com/huggingface/peft/pull/1368
* QDoRA: Support DoRA with BnB quantization by BenjaminBossan in https://github.com/huggingface/peft/pull/1518
* Feat: add support for Conv2D DoRA by sayakpaul in https://github.com/huggingface/peft/pull/1516
* TST Report slowest tests by BenjaminBossan in https://github.com/huggingface/peft/pull/1556
* Changes to support fsdp+qlora and dsz3+qlora by pacman100 in https://github.com/huggingface/peft/pull/1550
* Update style with ruff 0.2.2 by BenjaminBossan in https://github.com/huggingface/peft/pull/1565
* FEAT Mixing different LoRA adapters in same batch by BenjaminBossan in https://github.com/huggingface/peft/pull/1558
* FIX [`CI`] Fix test docker CI by younesbelkada in https://github.com/huggingface/peft/pull/1535
* Fix LoftQ docs and tests by BenjaminBossan in https://github.com/huggingface/peft/pull/1532
* More convenient way to initialize LoftQ by BenjaminBossan in https://github.com/huggingface/peft/pull/1543
New Contributors
* DopeorNope-Lee made their first contribution in https://github.com/huggingface/peft/pull/1372
* kmehant made their first contribution in https://github.com/huggingface/peft/pull/1531
* gremlin97 made their first contribution in https://github.com/huggingface/peft/pull/1542
* SUNGOD3 made their first contribution in https://github.com/huggingface/peft/pull/1527
* insist93 made their first contribution in https://github.com/huggingface/peft/pull/1548
* PrakharSaxena24 made their first contribution in https://github.com/huggingface/peft/pull/1433
* siddartha-RE made their first contribution in https://github.com/huggingface/peft/pull/1368
**Full Changelog**: https://github.com/huggingface/peft/compare/v0.9.0...v0.10.0