Trl

Latest version: v0.12.2


Page 5 of 8

0.7.9

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.8...v0.7.9

0.7.8

Unsloth tag for `xxxTrainer`

If you use the Unsloth library, the `unsloth` tag is now pushed to the Hub automatically.

* [`xxxTrainer`] Add unsloth tag by younesbelkada in https://github.com/huggingface/trl/pull/1130

DPO fixes

Several important DPO fixes have been introduced to address https://twitter.com/jon_durbin/status/1743575483365699809 and to make DPO faster.

* Allow separate devices for target/ref models. by jondurbin in https://github.com/huggingface/trl/pull/1190
* Allow swapping PEFT adapters for target/ref model. by jondurbin in https://github.com/huggingface/trl/pull/1193
* Change device access order for speedup of calculating metrics in DPOTrainer by brcps12 in https://github.com/huggingface/trl/pull/1154

DDPO + PEFT

DDPO now supports PEFT.

* add: support for `peft` in ddpo. by sayakpaul in https://github.com/huggingface/trl/pull/1165

Other fixes

* add peft_module_casting_to_bf16 in DPOTrainer by sywangyi in https://github.com/huggingface/trl/pull/1143
* SFT Tokenizer Fix by ChrisCates in https://github.com/huggingface/trl/pull/1142
* Minor fixes to some comments in some examples. by mattholl in https://github.com/huggingface/trl/pull/1156
* Correct shapes in docstring of PPOTrainer's train_minibatch method by nikihowe in https://github.com/huggingface/trl/pull/1170
* Update sft_trainer.py by Hemanthkumar2112 in https://github.com/huggingface/trl/pull/1162
* Fix batch all gather by vwxyzjn in https://github.com/huggingface/trl/pull/1177
* Address issue 1122 by maneandrea in https://github.com/huggingface/trl/pull/1174
* Fix misleading variable "epoch" from the training loop from PPOTrainer Doc. by Jfhseh in https://github.com/huggingface/trl/pull/1171
* SFTTrainer: follow args.remove_unused_columns by mgerstgrasser in https://github.com/huggingface/trl/pull/1188
* Handle last token from generation prompt by pablovicente in https://github.com/huggingface/trl/pull/1153

New Contributors
* ChrisCates made their first contribution in https://github.com/huggingface/trl/pull/1142
* brcps12 made their first contribution in https://github.com/huggingface/trl/pull/1154
* mattholl made their first contribution in https://github.com/huggingface/trl/pull/1156
* sayakpaul made their first contribution in https://github.com/huggingface/trl/pull/1165
* nikihowe made their first contribution in https://github.com/huggingface/trl/pull/1170
* Hemanthkumar2112 made their first contribution in https://github.com/huggingface/trl/pull/1162
* maneandrea made their first contribution in https://github.com/huggingface/trl/pull/1174
* Jfhseh made their first contribution in https://github.com/huggingface/trl/pull/1171
* mgerstgrasser made their first contribution in https://github.com/huggingface/trl/pull/1188
* pablovicente made their first contribution in https://github.com/huggingface/trl/pull/1153
* jondurbin made their first contribution in https://github.com/huggingface/trl/pull/1190

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.7...v0.7.8

0.7.7

This release fixes a breaking change in `PPOTrainer.push_to_hub()` and `DDPOTrainer.push_to_hub()`.

* [`PPOTrainer` / `DDPOTrainer`] Fix ppo & ddpo push to Hub by younesbelkada in https://github.com/huggingface/trl/pull/1141

What's Changed

0.7.6

Patch release: Multi-tag instead of single tags for `xxxTrainer`

This patch release makes trainers push multiple tags (e.g. `trl` & `sft`) instead of a single tag.

What's Changed

0.7.5

IPO & KTO & cDPO loss, `DPOTrainer` enhancements, automatic tags for `xxxTrainer`

Important enhancements for `DPOTrainer`

This release introduces many new features in TRL for `DPOTrainer`:

- IPO loss, for better generalization of the DPO algorithm
- KTO and cDPO losses
- `DPOTrainer` can also use pre-computed reference-model log-probabilities when they are present in the data

* [DPO] Refactor eval logging of dpo trainer by mnoukhov in https://github.com/huggingface/trl/pull/954
* Fixes reward and text gathering in distributed training by edbeeching in https://github.com/huggingface/trl/pull/850
* remove spurious optimize_cuda_cache deprecation warning on init by ChanderG in https://github.com/huggingface/trl/pull/1045
* Revert "[DPO] Refactor eval logging of dpo trainer (954)" by lvwerra in https://github.com/huggingface/trl/pull/1047
* Fix DPOTrainer + PEFT 2 by rdk31 in https://github.com/huggingface/trl/pull/1049
* [DPO] IPO Training loss by kashif in https://github.com/huggingface/trl/pull/1022
* [DPO] cDPO loss by kashif in https://github.com/huggingface/trl/pull/1035
* [DPO] use ref model logprobs if it exists in the data by kashif in https://github.com/huggingface/trl/pull/885
* [DPO] save eval_dataset for subsequent calls by kashif in https://github.com/huggingface/trl/pull/1125
* [DPO] rename kto loss by kashif in https://github.com/huggingface/trl/pull/1127
* [DPO] add KTO loss by kashif in https://github.com/huggingface/trl/pull/1075
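
To make the difference between these loss variants concrete, here is a minimal sketch of the per-pair formulas as they are usually written in the DPO, cDPO, and IPO write-ups. It is not TRL's implementation: the function names, the `beta` default, and the `label_smoothing` parameter name are assumptions chosen for illustration, and KTO is left out for brevity.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_margin(pi_chosen: float, pi_rejected: float,
                      ref_chosen: float, ref_rejected: float) -> float:
    """Policy log-ratio minus reference log-ratio for one (chosen, rejected) pair.

    All four inputs are sequence log-probabilities.
    """
    return (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)


def dpo_loss(h: float, beta: float = 0.1) -> float:
    """Standard (sigmoid) DPO loss on the margin h."""
    return -log_sigmoid(beta * h)


def cdpo_loss(h: float, beta: float = 0.1, label_smoothing: float = 0.1) -> float:
    """Conservative DPO: assumes each preference label is flipped with
    probability `label_smoothing` (hypothetical parameter name)."""
    return (-(1.0 - label_smoothing) * log_sigmoid(beta * h)
            - label_smoothing * log_sigmoid(-beta * h))


def ipo_loss(h: float, beta: float = 0.1) -> float:
    """IPO: squared distance between the margin and the target 1 / (2 * beta)."""
    return (h - 1.0 / (2.0 * beta)) ** 2


# Illustrative log-probabilities for one preference pair (made-up numbers).
h = preference_margin(pi_chosen=-10.0, pi_rejected=-14.0,
                      ref_chosen=-11.0, ref_rejected=-12.0)
print(dpo_loss(h), cdpo_loss(h), ipo_loss(h))
```

The DPO loss keeps decreasing as the margin grows, whereas the IPO loss is minimized at a finite margin of `1 / (2 * beta)`. With `label_smoothing=0`, the cDPO sketch reduces to plain DPO.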

Automatic `xxxTrainer` tagging on the Hub

TRL trainers now automatically push tags such as `trl-sft`, `trl-dpo`, and `trl-ddpo` when pushing models to the Hub.

* [`xxxTrainer`] Add tags to all trainers in TRL by younesbelkada in https://github.com/huggingface/trl/pull/1120

unsloth 🤝 TRL

We encourage users to try out the [unsloth library](https://github.com/unslothai/unsloth) for faster LLM fine-tuning using PEFT and TRL's `SFTTrainer` and `DPOTrainer`.

* [`Docs`] Add unsloth optimizations in TRL's documentation by younesbelkada in https://github.com/huggingface/trl/pull/1119

What's Changed

* set dev version by younesbelkada in https://github.com/huggingface/trl/pull/970
* [`Tests`] Add non optional packages tests by younesbelkada in https://github.com/huggingface/trl/pull/974
* [DOCS] Fix outdated references to `examples/` by alvarobartt in https://github.com/huggingface/trl/pull/977
* Update README.md by GeekDream-x in https://github.com/huggingface/trl/pull/994
* [DataCollatorForCompletionOnlyLM] Warn on identical `eos_token_id` and `pad_token_id` by MustSave in https://github.com/huggingface/trl/pull/988
* [`DataCollatorForCompletionOnlyLM`] Add more clarification / guidance in the case `tokenizer.pad_token_id == tokenizer.eos_token_id` by younesbelkada in https://github.com/huggingface/trl/pull/992
* make distributed true for multiple process by allanj in https://github.com/huggingface/trl/pull/997
* Fixed wrong trigger for warning by zabealbe in https://github.com/huggingface/trl/pull/971
* Update how_to_train.md by halfrot in https://github.com/huggingface/trl/pull/1003
* Adds `requires_grad` to input for non-quantized peft models by younesbelkada in https://github.com/huggingface/trl/pull/1006
* [Multi-Adapter PPO] Fix and Refactor reward model adapter by mnoukhov in https://github.com/huggingface/trl/pull/982
* Remove duplicate data loading in rl_training.py by viethoangtranduong in https://github.com/huggingface/trl/pull/1020
* [Document] Minor fixes of sft_trainer document by mutichung in https://github.com/huggingface/trl/pull/1029
* Update utils.py by ZihanWang314 in https://github.com/huggingface/trl/pull/1012
* spelling is hard by grahamannett in https://github.com/huggingface/trl/pull/1043
* Fixing accelerator version function call. by ParthaEth in https://github.com/huggingface/trl/pull/1056
* [SFT Trainer] precompute packed iterable into a dataset by lvwerra in https://github.com/huggingface/trl/pull/979
* Update doc CI by lewtun in https://github.com/huggingface/trl/pull/1060
* Improve PreTrainedModelWrapper._get_current_device by billvsme in https://github.com/huggingface/trl/pull/1048
* Update doc for the compute_metrics argument of SFTTrainer by albertauyeung in https://github.com/huggingface/trl/pull/1062
* [`core`] Fix failing tests on main by younesbelkada in https://github.com/huggingface/trl/pull/1065
* [`SFTTrainer`] Fix Trainer when args is None by younesbelkada in https://github.com/huggingface/trl/pull/1064
* enable multiple eval datasets by peter-sk in https://github.com/huggingface/trl/pull/1052
* Add missing `loss_type` in `ValueError` message by alvarobartt in https://github.com/huggingface/trl/pull/1067
* Add args to SFT example by lewtun in https://github.com/huggingface/trl/pull/1079
* add local folder support as input for rl_training. by sywangyi in https://github.com/huggingface/trl/pull/1078
* Make CI happy by younesbelkada in https://github.com/huggingface/trl/pull/1080
* Removing `tyro` in `sft_llama2.py` by vwxyzjn in https://github.com/huggingface/trl/pull/1081
* Log arg consistency by tcapelle in https://github.com/huggingface/trl/pull/1084
* Updated documentation for docs/source/reward_trainer.mdx to import th… by cm2435 in https://github.com/huggingface/trl/pull/1092
* [Feature] Add Ascend NPU accelerator support by statelesshz in https://github.com/huggingface/trl/pull/1096
* `peft_module_casting_to_bf16` util method, `append_concat_token` flag, remove callback `PeftSavingCallback` by pacman100 in https://github.com/huggingface/trl/pull/1110
* Make prepending of bos token configurable. by pacman100 in https://github.com/huggingface/trl/pull/1114
* fix gradient checkpointing when using PEFT by pacman100 in https://github.com/huggingface/trl/pull/1118
* Update `description` in `setup.py` by alvarobartt in https://github.com/huggingface/trl/pull/1101
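
The two `DataCollatorForCompletionOnlyLM` entries above concern tokenizers whose `pad_token_id` equals their `eos_token_id`. The following is a simplified sketch of why that combination is risky, not the collator's actual code. It assumes the common convention of replacing padding positions in the labels with `-100` so that they are ignored by the loss; the helper `mask_padding` and the token IDs are hypothetical.

```python
IGNORE_INDEX = -100  # label value conventionally ignored by the cross-entropy loss


def mask_padding(input_ids, pad_token_id):
    """Return labels in which every position equal to pad_token_id is ignored."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


EOS = 2                      # hypothetical end-of-sequence token ID
sequence = [11, 12, 13, EOS]

# Case 1: a dedicated pad token (ID 0). The EOS label survives masking.
labels_distinct = mask_padding(sequence + [0, 0], pad_token_id=0)

# Case 2: the pad token is the EOS token. The real EOS is masked too,
# so the model gets no training signal for ending a sequence.
labels_shared = mask_padding(sequence + [EOS, EOS], pad_token_id=EOS)

print(labels_distinct)  # [11, 12, 13, 2, -100, -100]
print(labels_shared)    # [11, 12, 13, -100, -100, -100]
```

In the second case the genuine end-of-sequence position is indistinguishable from padding, which is the situation the warning is meant to flag.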

New Contributors

* alvarobartt made their first contribution in https://github.com/huggingface/trl/pull/977
* GeekDream-x made their first contribution in https://github.com/huggingface/trl/pull/994
* MustSave made their first contribution in https://github.com/huggingface/trl/pull/988
* allanj made their first contribution in https://github.com/huggingface/trl/pull/997
* zabealbe made their first contribution in https://github.com/huggingface/trl/pull/971
* viethoangtranduong made their first contribution in https://github.com/huggingface/trl/pull/1020
* mutichung made their first contribution in https://github.com/huggingface/trl/pull/1029
* ZihanWang314 made their first contribution in https://github.com/huggingface/trl/pull/1012
* grahamannett made their first contribution in https://github.com/huggingface/trl/pull/1043
* ChanderG made their first contribution in https://github.com/huggingface/trl/pull/1045
* rdk31 made their first contribution in https://github.com/huggingface/trl/pull/1049
* ParthaEth made their first contribution in https://github.com/huggingface/trl/pull/1056
* billvsme made their first contribution in https://github.com/huggingface/trl/pull/1048
* albertauyeung made their first contribution in https://github.com/huggingface/trl/pull/1062
* peter-sk made their first contribution in https://github.com/huggingface/trl/pull/1052
* sywangyi made their first contribution in https://github.com/huggingface/trl/pull/1078
* tcapelle made their first contribution in https://github.com/huggingface/trl/pull/1084
* cm2435 made their first contribution in https://github.com/huggingface/trl/pull/1092
* statelesshz made their first contribution in https://github.com/huggingface/trl/pull/1096
* pacman100 made their first contribution in https://github.com/huggingface/trl/pull/1110

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.7.4...v0.7.5

0.7.4

Patch Release

This release is a patch release that addresses an issue for users that have TRL installed without PEFT

What's Changed
