# `IterativeTrainer`, NEFTune, and major bugfixes for `DPOTrainer` and distributed training
In this release we introduce two new features, `IterativeTrainer` (contributed by gaetanlop) and NEFTune, together with important bugfixes for `DPOTrainer` and distributed training.
## IterativeTrainer
Iterative fine-tuning is a training method that lets you perform custom actions (generation and filtering, for example) between optimization steps. TRL provides an easy-to-use API to fine-tune your models iteratively in just a few lines of code; a sketch follows the PR link below.
Read more about it here: https://huggingface.co/docs/trl/iterative_sft_trainer
* Introducing the Iterative Trainer by gaetanlop in https://github.com/huggingface/trl/pull/737
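As a minimal sketch of the iterative loop (not a definitive recipe): the model name, `num_iterations`, and the `generate_and_filter` helper below are illustrative placeholders, and the loop assumes the `IterativeSFTTrainer` API described in the documentation linked above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import IterativeSFTTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

trainer = IterativeSFTTrainer(model=model, tokenizer=tokenizer)

num_iterations = 10  # placeholder
for _ in range(num_iterations):
    # Custom action between optimization steps: generate candidate
    # completions and keep only those that pass a filter.
    texts = generate_and_filter(model, tokenizer)  # hypothetical user-defined helper
    # Run one optimization step on the freshly curated texts.
    trainer.step(texts=texts)
```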
## NEFTune
NEFTune is a technique to boost the performance of chat models, introduced in the paper [“NEFTune: Noisy Embeddings Improve Instruction Finetuning”](https://arxiv.org/abs/2310.05914) by Jain et al. It consists of adding noise to the embedding vectors during training.
* [`SFTTrainer`] Adds NEFTune into `SFTTrainer` by younesbelkada in https://github.com/huggingface/trl/pull/871
* [`NEFTune`] Make use of forward hooks instead by younesbelkada in https://github.com/huggingface/trl/pull/889
* Generalize NEFTune for FSDP, DDP, ... by younesbelkada in https://github.com/huggingface/trl/pull/924
Read more about it [here](https://huggingface.co/docs/trl/sft_trainer#enhance-models-performances-using-neftune), and see the usage sketch below.
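As a minimal usage sketch, assuming the `neftune_noise_alpha` argument added in the PRs above (the dataset and model names are placeholders):

```python
from datasets import load_dataset
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")  # placeholder dataset

trainer = SFTTrainer(
    model="facebook/opt-350m",  # placeholder model
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,
    neftune_noise_alpha=5,  # scale of the noise added to the embedding vectors
)
trainer.train()  # the noise is injected via forward hooks, and only during training
```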
## Major bugfixes
Several major bugs affecting distributed training and gradient checkpointing have been fixed; a usage sketch for the newly propagated `gradient_checkpointing_kwargs` follows the list below.
* [`DPO`] fix DPO + GC issues by younesbelkada in https://github.com/huggingface/trl/pull/927
* [`core` / `DDP`] Fix RM trainer + DDP + quantization + propagate `gradient_checkpointing_kwargs` in SFT & DPO by younesbelkada in https://github.com/huggingface/trl/pull/912
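For example, `gradient_checkpointing_kwargs` can now be set on `TrainingArguments` and is propagated to the model by the SFT and DPO trainers. A hedged sketch (the model and dataset are placeholders; `use_reentrant=False` assumes a recent PyTorch/transformers):

```python
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("imdb", split="train")  # placeholder dataset

args = TrainingArguments(
    output_dir="sft-gc",
    gradient_checkpointing=True,
    # Forwarded to `model.gradient_checkpointing_enable(...)`; the
    # non-reentrant variant plays better with DDP.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)

trainer = SFTTrainer(
    model="facebook/opt-350m",  # placeholder model
    args=args,
    train_dataset=dataset,
    dataset_text_field="text",
)
trainer.train()
```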
## DPOTrainer enhancements and fixes
The `DPOTrainer` now comes with multiple enhancements and bugfixes! Check them out below; an example of the new hinge loss follows the list.
* [DPO] add SLiC hinge loss to DPOTrainer by kashif in https://github.com/huggingface/trl/pull/866
* Fix DPOTrainer + PEFT by younesbelkada in https://github.com/huggingface/trl/pull/941
* [DPO] Merge initial peft model if trainer has a peft_config by kashif in https://github.com/huggingface/trl/pull/956
* Adds model kwargs to SFT and DPO trainers by edbeeching in https://github.com/huggingface/trl/pull/951
* fix: dpo trainer ds config by mengban in https://github.com/huggingface/trl/pull/957
* hotfix for dpo trainer by mnoukhov in https://github.com/huggingface/trl/pull/919
* Fix dpo_llama2.py by younesbelkada in https://github.com/huggingface/trl/pull/934
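For instance, the SLiC hinge loss from #866 can be selected through the `loss_type` argument; a minimal sketch, assuming that argument and a preference dataset with `prompt`/`chosen`/`rejected` columns (all names below are placeholders):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model = AutoModelForCausalLM.from_pretrained("gpt2")      # placeholder policy model
ref_model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder reference model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Placeholder: any dataset with "prompt", "chosen" and "rejected" text columns.
dataset = load_dataset("your/preference-dataset", split="train")

trainer = DPOTrainer(
    model,
    ref_model,
    args=TrainingArguments(output_dir="dpo-hinge", remove_unused_columns=False),
    beta=0.1,           # with the hinge loss, beta acts as the reciprocal of the margin
    loss_type="hinge",  # SLiC-style hinge loss instead of the default sigmoid loss
    train_dataset=dataset,
    tokenizer=tokenizer,
)
trainer.train()
```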
## What's Changed