TRL

Latest version: v0.16.0


0.12.0

Sequence-Level KD

From the GKD paper (https://huggingface.co/papers/2306.13649):

> Sequence-Level KD (Kim & Rush, 2016). SeqKD maximizes the likelihood of high probability sequences generated by the teacher, and can be viewed as supervised FT on teacher-generated outputs.

SeqKD is taken as a baseline in the paper. It is now possible to use Sequence-Level KD in the `GKDTrainer` by setting `seq_kd=True` in the `GKDConfig`.

```python
training_args = GKDConfig(..., seq_kd=True)
```


by mst272 in https://github.com/huggingface/trl/pull/2220
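
For context, here is a minimal end-to-end sketch of Sequence-Level KD with the `GKDTrainer`. The student/teacher pair and the dataset id are illustrative placeholders, not taken from the release notes (GKD expects a conversational dataset and a teacher that shares the student's tokenizer):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import GKDConfig, GKDTrainer

# Illustrative student/teacher pair sharing one tokenizer.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
teacher_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")

# Placeholder id: any conversational dataset with a "messages" column works.
train_dataset = load_dataset("your-org/your-chat-dataset", split="train")

training_args = GKDConfig(output_dir="gkd-seqkd-model", seq_kd=True)
trainer = GKDTrainer(
    model=model,
    teacher_model=teacher_model,
    args=training_args,
    processing_class=tokenizer,
    train_dataset=train_dataset,
)
trainer.train()
```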

Default `dataset_text_field` to `"text"`

Since many users use `"text"` as the column name for textual data in datasets, we've made it the default (previously a required argument) in `SFTConfig`. Now, specifying `dataset_text_field="text"` is no longer necessary.

```diff
 SFTConfig(
     ...,
-    dataset_text_field="text",
 )
```


by qgallouedec in https://github.com/huggingface/trl/pull/2078
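
To illustrate the new default, a minimal sketch (the model id and dataset are examples, not from the release notes; `stanfordnlp/imdb` already stores its text in a column named `"text"`):

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# The raw text lives in a "text" column, the new default,
# so dataset_text_field no longer needs to be passed.
dataset = load_dataset("stanfordnlp/imdb", split="train")

training_args = SFTConfig(output_dir="Qwen2-0.5B-SFT")
trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",  # SFTTrainer also accepts a model id string
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```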

What's Changed

* [SFT] fix neftune_noise_alpha in SFTTrainer by kashif in https://github.com/huggingface/trl/pull/1841
* Standardize `training_args` by qgallouedec in https://github.com/huggingface/trl/pull/2082
* Fix typo in ORPO example. by skandermoalla in https://github.com/huggingface/trl/pull/2092
* Fix Inconsistency with IsShardedQLoRA Setting by fabianlim in https://github.com/huggingface/trl/pull/2089
* Fixes 2087 - _process_tokens for empty prompts in KTOTrainer by gabikadlecova in https://github.com/huggingface/trl/pull/2093
* KTO: fix logits metric, add logits metric to BCOTrainer by claralp in https://github.com/huggingface/trl/pull/2094
* Clean up README and remove openrlbenchmark dependency by lewtun in https://github.com/huggingface/trl/pull/2085
* Fix PPO/RLOO examples by lewtun in https://github.com/huggingface/trl/pull/2100
* [CLI] `trl env` for printing system info by qgallouedec in https://github.com/huggingface/trl/pull/2104
* [RewardTrainer] Tokenize inputs within trainer by lewtun in https://github.com/huggingface/trl/pull/2102
* Fix documentation links by qgallouedec in https://github.com/huggingface/trl/pull/2105
* fix formatting by kashif in https://github.com/huggingface/trl/pull/2109
* [online-dpo] allow parse-args as list of floats by kashif in https://github.com/huggingface/trl/pull/2108
* Fix pack test by qgallouedec in https://github.com/huggingface/trl/pull/2111
* `BCOTrainer` conversational dataset support by qgallouedec in https://github.com/huggingface/trl/pull/2107
* Generalizes VSFT script to support REDACTED by edbeeching in https://github.com/huggingface/trl/pull/2120
* Update example_overview.md by kashif in https://github.com/huggingface/trl/pull/2125
* Remove `max_length` from `RewardDataCollatorWithPadding` by qgallouedec in https://github.com/huggingface/trl/pull/2119
* Standardize pushing to Hub in examples by qgallouedec in https://github.com/huggingface/trl/pull/2126
* Eos token encouragement Clarification by August-murr in https://github.com/huggingface/trl/pull/2128
* Tokenize row during `training_step` by qgallouedec in https://github.com/huggingface/trl/pull/2117
* ♻️ Standardize `script_args` by qgallouedec in https://github.com/huggingface/trl/pull/2130
* Add table for WinRateCallback by lewtun in https://github.com/huggingface/trl/pull/2116
* 🧹 Style by qgallouedec in https://github.com/huggingface/trl/pull/2132
* arXiv to HF Papers by qgallouedec in https://github.com/huggingface/trl/pull/2133
* Add correct label for `WinRateCallback` table by lewtun in https://github.com/huggingface/trl/pull/2134
* πŸƒ Model card for TRL by qgallouedec in https://github.com/huggingface/trl/pull/2123
* Rename `dpo_visual.py` example to `dpo_vlm.py` by qgallouedec in https://github.com/huggingface/trl/pull/2139
* [GKD] Set custom EOS tokens in generation config by lewtun in https://github.com/huggingface/trl/pull/2142
* Fix attention mask warning chat cli by qgallouedec in https://github.com/huggingface/trl/pull/2147
* [CI] Don't use `eval_strategy="steps"` when no eval dataset by qgallouedec in https://github.com/huggingface/trl/pull/2152
* Conversational dataset support for `DPOTrainer` by qgallouedec in https://github.com/huggingface/trl/pull/2131
* 🩹 [Hotfix] Add setter for tokenizer by qgallouedec in https://github.com/huggingface/trl/pull/2163
* ↩️ Revert tokenizer hotfix 2163 by qgallouedec in https://github.com/huggingface/trl/pull/2165
* chore: update test_cli.py by eltociear in https://github.com/huggingface/trl/pull/2168
* 🏷️ Model badges in trainer documentation by qgallouedec in https://github.com/huggingface/trl/pull/2160
* Default `dataset_text_field` to `"text"` by qgallouedec in https://github.com/huggingface/trl/pull/2078
* Update trl version in CITATION.cff by qgallouedec in https://github.com/huggingface/trl/pull/2171
* πŸ—‘οΈ Set deprecation version for DPO and SFT arguments to version 0.13 by qgallouedec in https://github.com/huggingface/trl/pull/2170
* Conversational dataset support for `CPOTrainer` by qgallouedec in https://github.com/huggingface/trl/pull/2144
* Capybara replaced with ultrafeedback_binarized by August-murr in https://github.com/huggingface/trl/pull/2183
* minor KTO setting changes + KL batch size by kawine in https://github.com/huggingface/trl/pull/2153
* 🏷️ Model badges: select only TRL models by qgallouedec in https://github.com/huggingface/trl/pull/2178
* Rename trainer arg `tokenizer` to `processing_class` by qgallouedec in https://github.com/huggingface/trl/pull/2162
* Update documentation CLI Chat by qgallouedec in https://github.com/huggingface/trl/pull/2191
* πŸƒ Model card: `"unsloth"` tag by qgallouedec in https://github.com/huggingface/trl/pull/2173
* [CI] fix dpo gpu ci tests by kashif in https://github.com/huggingface/trl/pull/2189
* Update CONTRIBUTING.md by kushal34712 in https://github.com/huggingface/trl/pull/2181
* Fix RLOO checkpointing by bartoszzuk in https://github.com/huggingface/trl/pull/2114
* Update README.md by PRIYANKjakharia in https://github.com/huggingface/trl/pull/2186
* `skip_prompt=True` in `TextIteratorStreamer` by qgallouedec in https://github.com/huggingface/trl/pull/2193
* [CI] Use transformers from source in "tests_no_optional_dep" by qgallouedec in https://github.com/huggingface/trl/pull/2198
* Fix the bug of DPOTrainer where the coefficient of aux_loss is always 0 during training by muupan in https://github.com/huggingface/trl/pull/2200
* Fix the bug of aux_loss coefficient being 0 in BCOTrainer, CPOTrainer, KTOTrainer, and ORPOTrainer by muupan in https://github.com/huggingface/trl/pull/2201
* [DPO] Adding weighted preference optimization (WPO) by gaetanlop in https://github.com/huggingface/trl/pull/2141
* [GKD] interpolate in prob. space by kashif in https://github.com/huggingface/trl/pull/2204
* Drop `decoder_input_ids` in `DPOTrainer` by qgallouedec in https://github.com/huggingface/trl/pull/2208
* Update incorrect data processing in DataCollatorForChatML by ruijunfeng in https://github.com/huggingface/trl/pull/2172
* Update log_example_reports.py by DhruvKadam-git in https://github.com/huggingface/trl/pull/2182
* Report to `"none"` in GKD test by qgallouedec in https://github.com/huggingface/trl/pull/2214
* [Judges] Soft judges for PairRM by kashif in https://github.com/huggingface/trl/pull/2221
* Update README.md by kushal34712 in https://github.com/huggingface/trl/pull/2180
* Updated README.md with CLI examples and additional usage instructions by Singhal1808 in https://github.com/huggingface/trl/pull/2199
* `trl env` report all cuda devices by qgallouedec in https://github.com/huggingface/trl/pull/2216
* Conversational dataset support for `ORPOTrainer` by qgallouedec in https://github.com/huggingface/trl/pull/2184
* πŸ•ŠοΈ Migration `PPOv2` -> `PPO` by qgallouedec in https://github.com/huggingface/trl/pull/2174
* Add Sequence-Level KD by mst272 in https://github.com/huggingface/trl/pull/2220
* Update dataset_formats.mdx by August-murr in https://github.com/huggingface/trl/pull/2222
* πŸ“’ Fix type/format confusions by qgallouedec in https://github.com/huggingface/trl/pull/2223
* Update commands for code linting in contributing guidelines by Ben-Schneider-code in https://github.com/huggingface/trl/pull/2225
* Refactor `ScriptArguments` by qgallouedec in https://github.com/huggingface/trl/pull/2145
* Updated `ScriptArguments` warning messages by sergiopaniego in https://github.com/huggingface/trl/pull/2230
* DPO support `remove_unused_columns` by qgallouedec in https://github.com/huggingface/trl/pull/2233
* Setting capture output to False by August-murr in https://github.com/huggingface/trl/pull/2239
* Update SFT examples by lewtun in https://github.com/huggingface/trl/pull/2244
* Enhancements to Log Report Script: Improved Error Handling and Logging by DhruvKadam-git in https://github.com/huggingface/trl/pull/2232
* πŸ”€ Rename `get_batch_sample` and add `num_items_in_batch` to `compute_loss` by qgallouedec in https://github.com/huggingface/trl/pull/2246
* Refactor DPO data processing by qgallouedec in https://github.com/huggingface/trl/pull/2209
* Update dataset_formats.mdx by cameronphchen in https://github.com/huggingface/trl/pull/2259
* Use `processing_class` instead of `tokenizer` in `LogCompletionsCallback` by qgallouedec in https://github.com/huggingface/trl/pull/2261
* Adjust padding in batch generation by gaetanlop in https://github.com/huggingface/trl/pull/2251
* setup_chat_format: throw error if there is already a template in base model by ngxson in https://github.com/huggingface/trl/pull/2252
* Bump the minimum transformers version to v4.46 by qgallouedec in https://github.com/huggingface/trl/pull/2245
* Conversational dataset support for `KTOTrainer` by qgallouedec in https://github.com/huggingface/trl/pull/2248
* [Judges] use the pair-judges in online-preference trainers by kashif in https://github.com/huggingface/trl/pull/2243
* Update reward_modeling.py by cameronphchen in https://github.com/huggingface/trl/pull/2266
* ♾️ Fix test generation `max_new_tokens` by qgallouedec in https://github.com/huggingface/trl/pull/2272
* Refactor `log_reports.py` for Improved Logging, File Processing, and Slack Payload Handling by Mefisto04 in https://github.com/huggingface/trl/pull/2249
* Replace log(sigmoid(log_odds)) with logsigmoid(log_odds) for ORPO by zhanwenchen in https://github.com/huggingface/trl/pull/2274
* [KTO/BCO Trainer] add bos_token_id only if it exists by seanexp in https://github.com/huggingface/trl/pull/2279
* Fix the computation of KL divergence loss in Nash MD by d-tiapkin in https://github.com/huggingface/trl/pull/2277
* Don't pass `eval_dataset` in to trainers when no eval strategy by qgallouedec in https://github.com/huggingface/trl/pull/2270
* Update callbacks.py for fix small python type error by anch0vy in https://github.com/huggingface/trl/pull/2285
* Use any reward model for online methods by qgallouedec in https://github.com/huggingface/trl/pull/2276
* Clean dependencies by qgallouedec in https://github.com/huggingface/trl/pull/2298
* Fix `_save_checkpoint` for online methods by qgallouedec in https://github.com/huggingface/trl/pull/2288
* Refactor unit tests to use standard unittest assertion methods by ccs96307 in https://github.com/huggingface/trl/pull/2283
* Remove stale bot by qgallouedec in https://github.com/huggingface/trl/pull/2300
* Fix no optional dependencies by qgallouedec in https://github.com/huggingface/trl/pull/2301
* Add `optimizer_cls_and_kwargs` attribute to PPO and RLOO by qgallouedec in https://github.com/huggingface/trl/pull/2302
* Specify min versions by qgallouedec in https://github.com/huggingface/trl/pull/2303

New Contributors
* skandermoalla made their first contribution in https://github.com/huggingface/trl/pull/2092
* fabianlim made their first contribution in https://github.com/huggingface/trl/pull/2089
* gabikadlecova made their first contribution in https://github.com/huggingface/trl/pull/2093
* August-murr made their first contribution in https://github.com/huggingface/trl/pull/2128
* kushal34712 made their first contribution in https://github.com/huggingface/trl/pull/2181
* PRIYANKjakharia made their first contribution in https://github.com/huggingface/trl/pull/2186
* ruijunfeng made their first contribution in https://github.com/huggingface/trl/pull/2172
* DhruvKadam-git made their first contribution in https://github.com/huggingface/trl/pull/2182
* Singhal1808 made their first contribution in https://github.com/huggingface/trl/pull/2199
* mst272 made their first contribution in https://github.com/huggingface/trl/pull/2220
* Ben-Schneider-code made their first contribution in https://github.com/huggingface/trl/pull/2225
* sergiopaniego made their first contribution in https://github.com/huggingface/trl/pull/2230
* cameronphchen made their first contribution in https://github.com/huggingface/trl/pull/2259
* ngxson made their first contribution in https://github.com/huggingface/trl/pull/2252
* Mefisto04 made their first contribution in https://github.com/huggingface/trl/pull/2249
* zhanwenchen made their first contribution in https://github.com/huggingface/trl/pull/2274
* d-tiapkin made their first contribution in https://github.com/huggingface/trl/pull/2277
* anch0vy made their first contribution in https://github.com/huggingface/trl/pull/2285
* ccs96307 made their first contribution in https://github.com/huggingface/trl/pull/2283

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.11.0...v0.12.0


Use pairwise judges for online methods

The `OnlineDPOTrainer` and any trainers that inherit from it (`NashMDTrainer` and `XPOTrainer`) can now accept an initialized `PairwiseJudge` instead of a reward model.

```python
from datasets import load_dataset
from trl import OnlineDPOConfig, OnlineDPOTrainer, PairRMJudge
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
judge = PairRMJudge()
train_dataset = load_dataset("trl-lib/ultrafeedback-prompt", split="train")

training_args = OnlineDPOConfig(output_dir="Qwen2-0.5B-OnlineDPO", logging_steps=10)
trainer = OnlineDPOTrainer(
    model=model, judge=judge, args=training_args, processing_class=tokenizer, train_dataset=train_dataset
)
trainer.train()
```


by kashif in https://github.com/huggingface/trl/pull/2243

Rename trainer arg `tokenizer` to `processing_class`

The `tokenizer` argument in the trainers has been renamed to `processing_class` to better reflect the fact that it can be not only a tokenizer but also a processor.

```diff
- trainer = DPOTrainer(model, args=training_args, train_dataset=dataset, tokenizer=tokenizer)
+ trainer = DPOTrainer(model, args=training_args, train_dataset=dataset, processing_class=tokenizer)
```


`tokenizer` is still supported for `SFTTrainer` and `DPOTrainer` but deprecated and will be removed in the next release.

by qgallouedec in https://github.com/huggingface/trl/pull/2162

Adding weighted preference optimization (WPO) to DPO

The [WPO](https://huggingface.co/papers/2406.11827) paper adapts off-policy data to resemble on-policy data more closely by reweighting preference pairs according to their probability under the current policy. To use this method, set the `use_weighting` flag to `True` in the `DPOConfig`.

```python
DPOConfig(..., use_weighting=True)
```


<img width="1112" alt="Screenshot 2024-11-04 at 10 59 38" src="https://github.com/user-attachments/assets/544ddc02-bd09-4f21-b8a4-b81c21561a9b">
<img width="539" alt="Screenshot 2024-11-04 at 10 59 22" src="https://github.com/user-attachments/assets/8d5afe9e-89bd-4d00-8483-dd7ba98997e7">


by gaetanlop in https://github.com/huggingface/trl/pull/2141
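
For a fuller picture, a hedged sketch of a complete DPO run with WPO weighting enabled (the model and dataset ids are illustrative, mirroring the other examples in these notes):

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

# use_weighting=True reweights each preference pair by its probability
# under the current policy, as proposed in the WPO paper.
training_args = DPOConfig(output_dir="Qwen2-0.5B-WPO", use_weighting=True)
trainer = DPOTrainer(model, args=training_args, train_dataset=train_dataset, processing_class=tokenizer)
trainer.train()
```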

πŸƒ Model card for TRL

Using `trainer.push_to_hub()` now automatically creates a model card that includes:

- A link to the base model used
- A link to the dataset used for training
- A link to the TRL repository
- Sample demo code
- A link to the associated Weights & Biases run
- A link to the paper detailing the training procedure
- Versions of dependencies
- BibTeX citations for both the training procedure and TRL

All links are properly formatted to allow cross-referencing, enabling traceability back to sources (e.g., the model appears linked on the paper’s page).
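
For example, assuming `trainer` is any TRL trainer that has finished training:

```python
# Uploads the model together with the auto-generated model card
# (base model link, dataset link, demo code, citations, ...) to the Hub.
trainer.push_to_hub()
```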


https://github.com/user-attachments/assets/b903964e-9087-45cc-8fb0-2418fdd87b72



by qgallouedec in https://github.com/huggingface/trl/pull/2123

Minor

Conversational dataset support

You can now use conversational datasets directly, without needing to apply a chat template beforehand, for the following trainers:

- `BCOTrainer` (by qgallouedec in PR 2107)
- `CPOTrainer` (by qgallouedec in PR 2144)
- `DPOTrainer` (by qgallouedec in PR 2131)
- `KTOTrainer` (by qgallouedec in PR 2248)
- `ORPOTrainer` (by qgallouedec in PR 2184)

```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForCausalLM
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset(dataset_name, split="train")

# Not needed anymore:
#
# def process(example):
#     prompt = tokenizer.apply_chat_template(example["prompt"], tokenize=False, add_generation_prompt=True)
#     prompt_chosen = tokenizer.apply_chat_template(example["prompt"] + example["chosen"], tokenize=False)
#     chosen = prompt_chosen[len(prompt):]
#     prompt_rejected = tokenizer.apply_chat_template(example["prompt"] + example["rejected"], tokenize=False)
#     rejected = prompt_rejected[len(prompt):]
#     return {"prompt": prompt, "chosen": chosen, "rejected": rejected}
#
# dataset = dataset.map(process)

training_args = DPOConfig(output_dir="...")
trainer = DPOTrainer(model, args=training_args, train_dataset=dataset, processing_class=tokenizer)
trainer.train()
```


Refactor DPO data processing

For more information, see PR 2209.

`trl env` for printing system info

You can now use `trl env` to print system information, including the platform, Python version, PyTorch version, CUDA device(s), and versions of various libraries.


```text
$ trl env

Copy-paste the following information when reporting an issue:

- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.9
- PyTorch version: 2.4.0
- CUDA device(s): NVIDIA H100 80GB HBM3
- Transformers version: 4.47.0.dev0
- Accelerate version: 0.19.0
- Accelerate config: not found
- Datasets version: 3.0.2
- HF Hub version: 0.26.1
- TRL version: 0.12.0+14ef1ab
- bitsandbytes version: 0.44.1
- DeepSpeed version: 0.15.3
- Diffusers version: 0.30.3
- Liger-Kernel version: 0.3.0
- LLM-Blender version: 0.0.2
- OpenAI version: 1.46.0
- PEFT version: 0.13.2
```


by qgallouedec in https://github.com/huggingface/trl/pull/2104


0.16

New Contributors
* XZ-X made their first contribution in https://github.com/huggingface/trl/pull/2873
* BenasdTW made their first contribution in https://github.com/huggingface/trl/pull/2863
* kldzj made their first contribution in https://github.com/huggingface/trl/pull/2811
* ingambe made their first contribution in https://github.com/huggingface/trl/pull/2806
* linkedlist771 made their first contribution in https://github.com/huggingface/trl/pull/2925
* DanFosing made their first contribution in https://github.com/huggingface/trl/pull/2939
* nopepper made their first contribution in https://github.com/huggingface/trl/pull/2951
* shenxiangzhuang made their first contribution in https://github.com/huggingface/trl/pull/2921
* Ishan-Kumar2 made their first contribution in https://github.com/huggingface/trl/pull/2918
* tpoisonooo made their first contribution in https://github.com/huggingface/trl/pull/2973
* nbasyl made their first contribution in https://github.com/huggingface/trl/pull/2974
* sileod made their first contribution in https://github.com/huggingface/trl/pull/2912
* logicaltrojan made their first contribution in https://github.com/huggingface/trl/pull/2969
* congchan made their first contribution in https://github.com/huggingface/trl/pull/2988
* vaibhavjindal made their first contribution in https://github.com/huggingface/trl/pull/2982
* kiddj made their first contribution in https://github.com/huggingface/trl/pull/2871
* paulinebm made their first contribution in https://github.com/huggingface/trl/pull/3007
* lexasub made their first contribution in https://github.com/huggingface/trl/pull/3001
* tchang1997 made their first contribution in https://github.com/huggingface/trl/pull/3004
* Aladoro made their first contribution in https://github.com/huggingface/trl/pull/3029
* abhigoyal1997 made their first contribution in https://github.com/huggingface/trl/pull/3043
* esnible made their first contribution in https://github.com/huggingface/trl/pull/3074
* mariosasko made their first contribution in https://github.com/huggingface/trl/pull/3009

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.15.0...v0.16.0

0.16.0

What's Changed
* [SFT] fix check for AutoLigerKernelForCausalLM by kashif in https://github.com/huggingface/trl/pull/2874
* πŸ†™ Bump vLLM min version to 0.7.2 by edbeeching in https://github.com/huggingface/trl/pull/2860
* [GRPO] Fix loss normalization by edbeeching in https://github.com/huggingface/trl/pull/2881
* πŸ’¬ Add `maybe_convert_to_chatml` map for conversational datasets in SFT by kashif in https://github.com/huggingface/trl/pull/2862
* 🧢 [GRPO][vLLM + LoRA] Move unmerge of PEFT model after weight loading by XZ-X in https://github.com/huggingface/trl/pull/2873
* 🍟 [SFT] Handles the dataset if it has been preprocessed by BenasdTW in https://github.com/huggingface/trl/pull/2863
* Optimize vllm num_generations by edbeeching in https://github.com/huggingface/trl/pull/2855
* πŸͺ‚ Don't gather logits in SFT to avoid hanging by qgallouedec in https://github.com/huggingface/trl/pull/2890
* ✨ Add vLLM guided decoding support to GRPO Trainer by kldzj in https://github.com/huggingface/trl/pull/2811
* ⚰️ Remove deprecated by qgallouedec in https://github.com/huggingface/trl/pull/2894
* 🩳 `max_seq_length` to `max_length` by qgallouedec in https://github.com/huggingface/trl/pull/2895
* πŸƒ GRPO - Do not load reference model when beta == 0 by ingambe in https://github.com/huggingface/trl/pull/2806
* πŸ“ [GRPO] add gradient_checkpointing by kashif in https://github.com/huggingface/trl/pull/2848
* πŸͺͺ Adds profiling decorators for GRPOTrainer by edbeeching in https://github.com/huggingface/trl/pull/2889
* πŸ¦β€πŸ”₯ 6x faster GRPO with multi-step optimization by qgallouedec in https://github.com/huggingface/trl/pull/2899
* πŸ”Ή Fix: Miscalculated mask shape in comments by linkedlist771 in https://github.com/huggingface/trl/pull/2925
* πŸ€– Style bot by qgallouedec in https://github.com/huggingface/trl/pull/2935
* 🧼 Upgrade ruff by qgallouedec in https://github.com/huggingface/trl/pull/2938
* 🐈 Bye bye chat by qgallouedec in https://github.com/huggingface/trl/pull/2934
* ♻️ Fix caching in SFT by qgallouedec in https://github.com/huggingface/trl/pull/2945
* πŸ“‹ Add vLLM version to environment printout by qgallouedec in https://github.com/huggingface/trl/pull/2946
* ☠️ Update `max_seq_length` to `max_length` in `SFTConfig` by qgallouedec in https://github.com/huggingface/trl/pull/2947
* 🐯 Fix LigerKernel for SFTTrainer by lewtun in https://github.com/huggingface/trl/pull/2940
* βœ‹ Prevent applying the chat template to tokenized datasets by DanFosing in https://github.com/huggingface/trl/pull/2939
* πŸ“‡ GRPO: print completions to console and update docs by nopepper in https://github.com/huggingface/trl/pull/2951
* ↩️ Fix typo in TextEnvironment init param, should be `max_tool_response` by shenxiangzhuang in https://github.com/huggingface/trl/pull/2921
* πŸ—Ώ Updated DPO default values for alpha and tau by Ishan-Kumar2 in https://github.com/huggingface/trl/pull/2918
* πŸ“Œ Pin liger-kernel and vLLM by qgallouedec in https://github.com/huggingface/trl/pull/2952
* βͺ Parameterize `enable_prefix_caching` by ji-huazhong in https://github.com/huggingface/trl/pull/2900
* πŸ”’ Fix GRPO doc about `num_iterations` by qgallouedec in https://github.com/huggingface/trl/pull/2966
* Update grpo_trainer.py by tpoisonooo in https://github.com/huggingface/trl/pull/2973
* πŸ‘§πŸ½ Adding DoRA support to model config by nbasyl in https://github.com/huggingface/trl/pull/2974
* πŸ§— Add GRPO Trainer support for third-party accelerators by ji-huazhong in https://github.com/huggingface/trl/pull/2836
* πŸ•Έ Add distributing training guide by qgallouedec in https://github.com/huggingface/trl/pull/2956
* πŸ‘‚ Update learning rate doc in `KTOConfig` by sileod in https://github.com/huggingface/trl/pull/2912
* 🌌 Fix logits computation in trainer prediction step by logicaltrojan in https://github.com/huggingface/trl/pull/2969
* πŸͺͺ Adds a more fine-grained profiling context by edbeeching in https://github.com/huggingface/trl/pull/2975
* 🧬 Fix typo in grpo_trainer.py by congchan in https://github.com/huggingface/trl/pull/2988
* πŸ“œ Update README and doc index by qgallouedec in https://github.com/huggingface/trl/pull/2986
* πŸ“‘ Fix logged metrics for KTO by vaibhavjindal in https://github.com/huggingface/trl/pull/2982
* ⚰️ Deprecate liger-kernel by qgallouedec in https://github.com/huggingface/trl/pull/2949
* πŸ” Update GRPO config documentation for beta parameter stability by nopepper in https://github.com/huggingface/trl/pull/2992
* πŸ«” [GRPO] Pass wrapped model to `unwrap_model_for_generation` for DeepSpeed Stage-3 compatibility by kiddj in https://github.com/huggingface/trl/pull/2871
* πŸ›£οΈ `inference_mode` to `no_grad` when computing `old_per_token_logps` by qgallouedec in https://github.com/huggingface/trl/pull/2987
* πŸš€ DeepSpeed integration documentation by qgallouedec in https://github.com/huggingface/trl/pull/2993
* Update pr_style_bot.yml by qgallouedec in https://github.com/huggingface/trl/pull/3003
* πŸͺ™ [SFT] Log `num_tokens` and some logging fixes by qgallouedec in https://github.com/huggingface/trl/pull/3006
* Improve ci by paulinebm in https://github.com/huggingface/trl/pull/3007
* ✌️Remove double compute of sum in SFTTrainer by lexasub in https://github.com/huggingface/trl/pull/3001
* πŸ“š Update customization and distributing training documentation by qgallouedec in https://github.com/huggingface/trl/pull/2991
* 🌍 Use global normalization for KL logging (to match normalization for loss) by tchang1997 in https://github.com/huggingface/trl/pull/3004
* πŸ—œοΈ Loosened tokenizer type hint on `apply_chat_template` by jamesbraza in https://github.com/huggingface/trl/pull/3005
* 🎲 Add support for additional generation kwargs in GRPO Trainer by nopepper in https://github.com/huggingface/trl/pull/2989
* πŸš€ Supporting `deepspeed>=0.16.4`'s rename by jamesbraza in https://github.com/huggingface/trl/pull/2963
* 🌑️ Fix temperature inconsistency in GRPO trainer by Aladoro in https://github.com/huggingface/trl/pull/3029
* 🏁 Passing custom BOS/EOS token to `GRPOTrainer.generation_config` by jamesbraza in https://github.com/huggingface/trl/pull/3046
* πŸ’  Fixing `SFTTrainer.compute_loss` crash with `accelerate` by jamesbraza in https://github.com/huggingface/trl/pull/3048
* πŸ‘― [GRPO] Relax the assumption that prompts are unique within a batch by qgallouedec in https://github.com/huggingface/trl/pull/3052
* [GRPO] use argument names with processing_class by kashif in https://github.com/huggingface/trl/pull/3062
* πŸ¦₯ Fixed `SFTTrainer.compute_loss` hang from 3048's PR comments by jamesbraza in https://github.com/huggingface/trl/pull/3056
* 🏊 [SFT] Compatibility with padding free and iterable dataset by qgallouedec in https://github.com/huggingface/trl/pull/3053
* Fixing JSD loss computation in GKDTrainer as per definition by abhigoyal1997 in https://github.com/huggingface/trl/pull/3043
* 🎭 Minor spelling fix in documentation (caracteres -> characters) by esnible in https://github.com/huggingface/trl/pull/3074
* πŸ’Ž Gemma 3 SFT example on Codeforces dataset by qgallouedec in https://github.com/huggingface/trl/pull/3070
* 🫣 [GRPO] add cache_implementation option in GRPO by kashif in https://github.com/huggingface/trl/pull/3075
* β›” Add EOS token to processed input in SFT by qgallouedec in https://github.com/huggingface/trl/pull/3091
* πŸ•ŠοΈ Padding-free for SFT by qgallouedec in https://github.com/huggingface/trl/pull/3076
* add "_prepare_fsdp" for DPOTrainer by faaany in https://github.com/huggingface/trl/pull/2539
* Use main process for dataset.map by lewtun in https://github.com/huggingface/trl/pull/3106
* Flexible_reward by shirinyamani in https://github.com/huggingface/trl/pull/3079
* 🎬 Clip higher by shirinyamani in https://github.com/huggingface/trl/pull/3118
* πŸš€ Scaling GRPO to 70B+ Models and Multi-Node Training with vLLM Server & NCCL Communication by binary-husky in https://github.com/huggingface/trl/pull/3094
* ⚑ Pack 300 times faster, truncate 100 times faster by mariosasko in https://github.com/huggingface/trl/pull/3009
* ☎️ Documentation for disable gathering of model weights for generation in DeepSpeed ZeRO-3 by qgallouedec in https://github.com/huggingface/trl/pull/3136
* βš–οΈ Add option not to scale rewards (Dr. GRPO) by qgallouedec in https://github.com/huggingface/trl/pull/3135

0.15.2

What's Changed

* ♻️ Fix caching in SFT by qgallouedec in https://github.com/huggingface/trl/pull/2945
* 🐯 Fix LigerKernel for SFTTrainer by lewtun in https://github.com/huggingface/trl/pull/2940
* 📌 Pin liger-kernel and vLLM by qgallouedec in https://github.com/huggingface/trl/pull/2952

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.15.1...v0.15.2

0.15.1

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.15.0...v0.15.1
