# QLoRA RLHF, SFTTrainer and RewardTrainer
This new version of TRL adds support for training larger models with QLoRA (4-bit quantization through `bitsandbytes`), along with the brand new `SFTTrainer` and `RewardTrainer` classes, so you can run your RLHF projects end-to-end!
## Introducing `SFTTrainer` and `RewardTrainer`
Use the brand new trainers to train your supervised fine-tuned (SFT) model and your reward model with just a few lines of code, as sketched below!
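A minimal SFT sketch; the dataset, base checkpoint and hyperparameters below are illustrative placeholders to swap for your own:

```python
from datasets import load_dataset
from trl import SFTTrainer

# any dataset with a plain-text column works; "imdb" is only an example
dataset = load_dataset("imdb", split="train")

trainer = SFTTrainer(
    "facebook/opt-350m",        # example base model
    train_dataset=dataset,
    dataset_text_field="text",  # name of the column holding the raw text
    max_seq_length=512,
)
trainer.train()
```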
* [`core`] officially support SFT (Supervised Finetuning) by younesbelkada in https://github.com/lvwerra/trl/pull/323
* [`SFT`] Fix sft issues by younesbelkada in https://github.com/lvwerra/trl/pull/336
* [`docs`] fix SFT doc by younesbelkada in https://github.com/lvwerra/trl/pull/367
* [`core`] Officially Support Reward Modeling by younesbelkada in https://github.com/lvwerra/trl/pull/303
* Resolve broken evaluation/prediction for RewardTrainer by tomaarsen in https://github.com/lvwerra/trl/pull/404
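A companion sketch for `RewardTrainer` on pairwise preference data; the tiny inline dataset, the `facebook/opt-350m` backbone and the chosen/rejected column names are illustrative assumptions about the expected format:

```python
from datasets import Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from trl import RewardTrainer

# a single-logit sequence classifier acts as the reward model
model = AutoModelForSequenceClassification.from_pretrained("facebook/opt-350m", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-350m")

# toy preference pairs; in practice load a real chosen/rejected dataset
raw = Dataset.from_dict({
    "chosen": ["The movie was a delight from start to finish."],
    "rejected": ["movie good"],
})

def tokenize_pairs(batch):
    chosen = tokenizer(batch["chosen"], truncation=True)
    rejected = tokenizer(batch["rejected"], truncation=True)
    return {
        "input_ids_chosen": chosen["input_ids"],
        "attention_mask_chosen": chosen["attention_mask"],
        "input_ids_rejected": rejected["input_ids"],
        "attention_mask_rejected": rejected["attention_mask"],
    }

train_dataset = raw.map(tokenize_pairs, batched=True, remove_columns=["chosen", "rejected"])

trainer = RewardTrainer(model=model, tokenizer=tokenizer, train_dataset=train_dataset)
trainer.train()
```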
## QLoRA integration
Pass 4-bit models directly into `PPOTrainer` for more memory-efficient training, as sketched below.
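A minimal sketch of the QLoRA path, assuming a recent `bitsandbytes`/`peft`/`transformers` stack with GPU support; the checkpoint and LoRA hyperparameters are placeholders:

```python
from peft import LoraConfig
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model_id = "facebook/opt-350m"  # example checkpoint, swap in your own

# LoRA adapter trained on top of the frozen, quantized base weights
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# load_in_4bit quantizes the base model through bitsandbytes (QLoRA)
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    model_id,
    load_in_4bit=True,
    device_map="auto",        # requires accelerate
    peft_config=lora_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# with a peft model no separate ref_model is needed: the adapter is disabled
# internally to recover the reference policy
ppo_trainer = PPOTrainer(
    config=PPOConfig(batch_size=16),
    model=model,
    tokenizer=tokenizer,
    # pass your dataset / data_collator here as usual for the PPO loop
)
```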
* [`core`] Add 4bit QLora by younesbelkada in https://github.com/lvwerra/trl/pull/383
* [`bnb`] fix 4 bit SFT by younesbelkada in https://github.com/lvwerra/trl/pull/396
## Updated StackLLaMA example
Great work by mnoukhov, who fixed the issues with StackLLaMA caused by the new versions of `accelerate`, `peft` and `transformers`. The fully reproducible examples are listed below:
* StackLLaMA: correctly merge peft model by mnoukhov in https://github.com/lvwerra/trl/pull/398
* StackLlama: fixed RL training and added args by mnoukhov in https://github.com/lvwerra/trl/pull/400
* Fixed some type annotations of trl.trainer.PPOTrainer by JulesGM in https://github.com/lvwerra/trl/pull/392
* StackLLaMA: fix supervised finetuning and reward model training by mnoukhov in https://github.com/lvwerra/trl/pull/399
## Bug fixes and improvements
* [`core`] refactor peft API by younesbelkada in https://github.com/lvwerra/trl/pull/231
* Batched generation by lvwerra in https://github.com/lvwerra/trl/pull/228
* Reduce memory consumption in batched_forward_pass by ohashi56225 in https://github.com/lvwerra/trl/pull/234
* [`core`] Add warning when negative KL by younesbelkada in https://github.com/lvwerra/trl/pull/239
* adds early stopping by edbeeching in https://github.com/lvwerra/trl/pull/238
* PPO config __init__ is bloated by GauravVirmani in https://github.com/lvwerra/trl/pull/241
* feat(ci): enable `pip` cache by SauravMaheshkar in https://github.com/lvwerra/trl/pull/198
* Improve logging for PPO + Docs page by natolambert in https://github.com/lvwerra/trl/pull/243
* Fix typo by heya5 in https://github.com/lvwerra/trl/pull/253
* Using batched generate in sentiment scripts by GauravVirmani in https://github.com/lvwerra/trl/pull/249
* [`core`] Fix DeepSpeed zero-3 issue by younesbelkada in https://github.com/lvwerra/trl/pull/182
* [`distributed`] Fix early stopping and DP by younesbelkada in https://github.com/lvwerra/trl/pull/254
* [`core`] Fix ds issue by younesbelkada in https://github.com/lvwerra/trl/pull/260
* Add LlaMa in tests + `create_reference_model` by younesbelkada in https://github.com/lvwerra/trl/pull/261
* Use active model to generate response in example on README (269) by rmill040 in https://github.com/lvwerra/trl/pull/271
* stack-llama by edbeeching in https://github.com/lvwerra/trl/pull/273
* Adding pointer back to Meta's LLaMA. by meg-huggingface in https://github.com/lvwerra/trl/pull/277
* fix doc string problem in ppo trainer loss function by thuwyh in https://github.com/lvwerra/trl/pull/279
* Add LLaMA tutorial to docs by natolambert in https://github.com/lvwerra/trl/pull/278
* Fix swapped helper texts by philipp-classen in https://github.com/lvwerra/trl/pull/284
* fix typo in gpt2-sentiment.ipynb by eltociear in https://github.com/lvwerra/trl/pull/293
* add functionality to push best models to the hub during training by Bearnardd in https://github.com/lvwerra/trl/pull/275
* Small improvements / fixes to toxicity example by natolambert in https://github.com/lvwerra/trl/pull/266
* Fix arguments description by lvzii in https://github.com/lvwerra/trl/pull/298
* [`t5`] Fix negative kl issue by younesbelkada in https://github.com/lvwerra/trl/pull/262
* Log Token distribution of Query / Response by natolambert in https://github.com/lvwerra/trl/pull/295
* clean examples folder by natolambert in https://github.com/lvwerra/trl/pull/294
* fixed typo in error message by soerenarlt in https://github.com/lvwerra/trl/pull/312
* fix DS for peft ref_model in ppo trainer by halfrot in https://github.com/lvwerra/trl/pull/309
* [`CI`] Fix broken tests by younesbelkada in https://github.com/lvwerra/trl/pull/318
* [`Docs`] Add details on multi-GPU / multi-node by younesbelkada in https://github.com/lvwerra/trl/pull/320
* Give a key to the wandb PPOConfig config entry by JulesGM in https://github.com/lvwerra/trl/pull/315
* added doc for using torch.distributed.launch/run by oroojlooy in https://github.com/lvwerra/trl/pull/324
* Fix argument's description by vinhkhuc in https://github.com/lvwerra/trl/pull/339
* stack_llama: update instructions in README, fix broken _get_submodules and save tokenizer by teticio in https://github.com/lvwerra/trl/pull/358
* stack_llama: add parameter to control max_length (to mitigate OOM errors) by teticio in https://github.com/lvwerra/trl/pull/359
* [`PPO`] Relax negative KL constraint by younesbelkada in https://github.com/lvwerra/trl/pull/352
* [`PPOTrainer`] Fix tensorboard issue by younesbelkada in https://github.com/lvwerra/trl/pull/330
* 140/best n sampling by metric-space in https://github.com/lvwerra/trl/pull/326
* Fix bug when loading local peft model by Opdoop in https://github.com/lvwerra/trl/pull/342
* add is_trainable in kwargs by Opdoop in https://github.com/lvwerra/trl/pull/363
* Remove obsolete layer_norm_names parameter and add peft>=0.3.0 to requirements by teticio in https://github.com/lvwerra/trl/pull/366
* Delete test_training.py by younesbelkada in https://github.com/lvwerra/trl/pull/371
* [`core`] Fix warning issue by younesbelkada in https://github.com/lvwerra/trl/pull/377
* Update customization.mdx by binganao in https://github.com/lvwerra/trl/pull/390
* fix dataloader typo in ppo_trainer.py by LZY-the-boys in https://github.com/lvwerra/trl/pull/389
* from_pretrained with peft adapter on the hub (#379) by glerzing in https://github.com/lvwerra/trl/pull/380
* keep state_dict kwargs instead of popping it in save_pretrained by rizar in https://github.com/lvwerra/trl/pull/393
* Remove unused imports in docs. by vwxyzjn in https://github.com/lvwerra/trl/pull/406
## New Contributors
* ohashi56225 made their first contribution in https://github.com/lvwerra/trl/pull/234
* GauravVirmani made their first contribution in https://github.com/lvwerra/trl/pull/241
* SauravMaheshkar made their first contribution in https://github.com/lvwerra/trl/pull/198
* heya5 made their first contribution in https://github.com/lvwerra/trl/pull/253
* rmill040 made their first contribution in https://github.com/lvwerra/trl/pull/271
* thuwyh made their first contribution in https://github.com/lvwerra/trl/pull/279
* philipp-classen made their first contribution in https://github.com/lvwerra/trl/pull/284
* Bearnardd made their first contribution in https://github.com/lvwerra/trl/pull/275
* lvzii made their first contribution in https://github.com/lvwerra/trl/pull/298
* soerenarlt made their first contribution in https://github.com/lvwerra/trl/pull/312
* halfrot made their first contribution in https://github.com/lvwerra/trl/pull/309
* oroojlooy made their first contribution in https://github.com/lvwerra/trl/pull/324
* vinhkhuc made their first contribution in https://github.com/lvwerra/trl/pull/339
* teticio made their first contribution in https://github.com/lvwerra/trl/pull/358
* metric-space made their first contribution in https://github.com/lvwerra/trl/pull/326
* Opdoop made their first contribution in https://github.com/lvwerra/trl/pull/342
* binganao made their first contribution in https://github.com/lvwerra/trl/pull/390
* LZY-the-boys made their first contribution in https://github.com/lvwerra/trl/pull/389
* glerzing made their first contribution in https://github.com/lvwerra/trl/pull/380
* rizar made their first contribution in https://github.com/lvwerra/trl/pull/393
* mnoukhov made their first contribution in https://github.com/lvwerra/trl/pull/398
* tomaarsen made their first contribution in https://github.com/lvwerra/trl/pull/404
* vwxyzjn made their first contribution in https://github.com/lvwerra/trl/pull/406
**Full Changelog**: https://github.com/lvwerra/trl/compare/v0.4.1...v0.4.2