We are excited to introduce the v0.10.1 release, packed with new features and post-training algorithms. The highlights are as follows:
## Online DPO
<img width="1210" alt="Screenshot 2024-08-29 at 15 53 29" src="https://github.com/user-attachments/assets/c11863ca-434c-47d7-8436-dc096683075a">
[Online DPO](https://arxiv.org/abs/2402.04792) is a new alignment method from DeepMind that boosts the performance of LLMs. With Online DPO, data is generated on the fly by the model being trained (instead of being pre-collected). For each prompt, two completions are generated, and a reward model selects the preferred one. This approach:
* Eliminates the need for a pre-collected preference dataset (it's generated online)
* Enables continuous model improvement
* Yields better results than traditional DPO
To train models with this method, use the `OnlineDPOTrainer`, as sketched below.
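As a rough sketch of how the pieces fit together (the model and dataset names are placeholders, and the keyword arguments shown are illustrative — they may not match this release's exact signature, so check the `OnlineDPOTrainer` docs for your installed version):

```python
# Minimal sketch (not verbatim from this release): argument names such as
# `reward_model` and `tokenizer` are illustrative and may differ between TRL versions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoModelForSequenceClassification, AutoTokenizer
from trl import OnlineDPOConfig, OnlineDPOTrainer

model = AutoModelForCausalLM.from_pretrained("your-sft-model")  # policy to improve (placeholder name)
reward_model = AutoModelForSequenceClassification.from_pretrained("your-reward-model", num_labels=1)
tokenizer = AutoTokenizer.from_pretrained("your-sft-model")

# Prompts only: completions are generated online during training
train_dataset = load_dataset("your-prompt-dataset", split="train")  # placeholder dataset name

training_args = OnlineDPOConfig(output_dir="online-dpo-model")
trainer = OnlineDPOTrainer(
    model=model,
    reward_model=reward_model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```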
## Liger Triton kernels for supercharged SFT
![image (18)](https://github.com/user-attachments/assets/3e6cca57-1ddd-44fc-9822-e1b38f8d056c)
* We've integrated [LinkedIn's Liger Triton kernels](https://github.com/linkedin/Liger-Kernel) into the `SFTTrainer` for higher throughput and lower memory usage. To use them, set `use_liger_kernel` in `SFTConfig`, as in the sketch below.
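A minimal sketch of enabling the kernels, assuming the boolean flag is named `use_liger_kernel` as stated above (verify the exact flag name against your installed TRL version; the model and dataset names are placeholders):

```python
# Minimal sketch: enable Liger's fused Triton kernels during SFT.
# The flag name follows the note above; confirm it for your installed version.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Placeholder dataset; assumed to contain a "text" column (SFTConfig's default dataset_text_field)
train_dataset = load_dataset("your-sft-dataset", split="train")

training_args = SFTConfig(
    output_dir="sft-liger-model",
    use_liger_kernel=True,  # patch the model with Liger's Triton kernels
)
trainer = SFTTrainer(
    model="your-base-model",  # placeholder; an instantiated model also works
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```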
## DPO for VLMs
* We've added support for aligning vision-language models with DPO, covering the LLaVA-1.5, PaliGemma, and Idefics2 architectures. To train VLMs with DPO, use the `dpo_visual.py` script as follows:
```bash
accelerate launch examples/scripts/dpo_visual.py \
    --dataset_name HuggingFaceH4/rlaif-v_formatted \
    --model_name_or_path google/paligemma-3b-pt-224 \
    --trust_remote_code \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --output_dir dpo_paligemma_rlaif-v \
    --bf16 \
    --torch_dtype bfloat16
```
## WinRate callback for LLM as a judge
* We've added support for computing win rates over the reference model for methods like DPO. To do so, configure the callback to point to an LLM-as-a-judge API (OpenAI or the Hugging Face Inference API) and then add:
```python
trainer = DPOTrainer(...)
win_rate_callback = WinRateCallback(..., trainer=trainer)
trainer.add_callback(win_rate_callback)
```
## Anchored Preference Optimization (APO) for fine-grained human/AI feedback
![](https://contextual.ai/wp-content/uploads/2024/08/blogpost-2-1.png)
* Added the [APO](https://huggingface.co/papers/2408.06266) method, which is an "anchored" version of the alignment objective. There are two variants: `apo_zero` and `apo_down`. The `apo_zero` loss increases the likelihood of winning outputs while decreasing the likelihood of losing outputs, making it suitable when the model is less performant than the winning outputs. In contrast, `apo_down` decreases the likelihood of both winning and losing outputs, with a stronger emphasis on reducing the likelihood of losing outputs; this variant is more effective when the model is better than the winning outputs. To use these losses, set `loss_type="apo_zero"` or `loss_type="apo_down"` in the `DPOConfig`, as in the sketch after this item.
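A minimal sketch of selecting an APO loss in an otherwise standard DPO setup (the model and dataset names are placeholders, and the argument list is trimmed to the relevant pieces):

```python
# Minimal sketch: choose the APO variant via DPOConfig's loss_type.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("your-sft-model")  # placeholder name
tokenizer = AutoTokenizer.from_pretrained("your-sft-model")
# Placeholder preference dataset with prompt/chosen/rejected columns
train_dataset = load_dataset("your-preference-dataset", split="train")

training_args = DPOConfig(
    output_dir="dpo-apo-model",
    loss_type="apo_zero",  # or "apo_down" when the model already outperforms the winning outputs
)
trainer = DPOTrainer(
    model,
    args=training_args,
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```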
## What's Changed
* Set dev version by vwxyzjn in https://github.com/huggingface/trl/pull/1817
* Upgrade GitHub actions by qgallouedec in https://github.com/huggingface/trl/pull/1818
* DPO Llava 1.5 and PaliGemma support by qgallouedec in https://github.com/huggingface/trl/pull/1797
* Delete unused benchmark.yml workflow by AdnaneKhan in https://github.com/huggingface/trl/pull/1822
* Consistent use of trust_remote_code by qgallouedec in https://github.com/huggingface/trl/pull/1806
* Fix: authentication token kwarg not passed when loading PEFT adapters by mkopecki in https://github.com/huggingface/trl/pull/1825
* refactor trainer callbacks by kashif in https://github.com/huggingface/trl/pull/1826
* Uniform `model_ref` naming by qgallouedec in https://github.com/huggingface/trl/pull/1835
* fix ppov2_trainer tensorboard logging bug by DZ9 in https://github.com/huggingface/trl/pull/1836
* Fix issues of KTOTrainer by MAOJIASONG in https://github.com/huggingface/trl/pull/1840
* add link to DPO datasets collection by davanstrien in https://github.com/huggingface/trl/pull/1845
* fix arg parsing in chat.py by lvwerra in https://github.com/huggingface/trl/pull/1846
* DPO for VLM blog post in doc by qgallouedec in https://github.com/huggingface/trl/pull/1844
* Add WinRateCallback and Judges by lewtun in https://github.com/huggingface/trl/pull/1598
* Remove `CI_HUB_USER_TOKEN` by qgallouedec in https://github.com/huggingface/trl/pull/1852
* Online DPO and Online trainer refactor by vwxyzjn in https://github.com/huggingface/trl/pull/1809
* [online-DPO] online dpo cleanups by kashif in https://github.com/huggingface/trl/pull/1864
* arXiv to HF Papers by qgallouedec in https://github.com/huggingface/trl/pull/1870
* fix fsdp & qlora support by eliebak in https://github.com/huggingface/trl/pull/1863
* Import missing setup_chat_format by Rishav-hub in https://github.com/huggingface/trl/pull/1862
* Bug Fix while training using SFTTrainer with DataCollatorForCompletionOnlyLM by Rishav-hub in https://github.com/huggingface/trl/pull/1861
* Small fixes to online dpo example by edbeeching in https://github.com/huggingface/trl/pull/1879
* Skip BigBird save and load test until next transformers version by qgallouedec in https://github.com/huggingface/trl/pull/1874
* Llama in modelling value head tests by qgallouedec in https://github.com/huggingface/trl/pull/1878
* Improve judges by qgallouedec in https://github.com/huggingface/trl/pull/1856
* [Do not merge] Re-add BigBird Pegasus save/load test by qgallouedec in https://github.com/huggingface/trl/pull/1876
* Re-add BigBird Pegasus save/load test by qgallouedec in https://github.com/huggingface/trl/pull/1882
* Move BCO to separate BCOTrainer with fixes by claralp in https://github.com/huggingface/trl/pull/1869
* Update example overview documentation section by qgallouedec in https://github.com/huggingface/trl/pull/1883
* fix dpo_trainer bug for LLMs without bos_token in config by DZ9 in https://github.com/huggingface/trl/pull/1885
* Fix SFT for VLM example by qgallouedec in https://github.com/huggingface/trl/pull/1865
* `evaluation_strategy` -> `eval_strategy` by qgallouedec in https://github.com/huggingface/trl/pull/1894
* fix serialization of RunningMoments on multiple GPUs by claralp in https://github.com/huggingface/trl/pull/1892
* [WIP] Fix CI by qgallouedec in https://github.com/huggingface/trl/pull/1897
* Drop `setUpClass` in reward tester by qgallouedec in https://github.com/huggingface/trl/pull/1895
* Support `IterableDataset` for `SFTTrainer` by qgallouedec in https://github.com/huggingface/trl/pull/1899
* Fix data processing in ORPO example script by qgallouedec in https://github.com/huggingface/trl/pull/1903
* [RPO] use loss from v3 of paper by kashif in https://github.com/huggingface/trl/pull/1904
* Support Rank Stabilized LoRA in the ModelConfig/LoraConfig by JohnGiorgi in https://github.com/huggingface/trl/pull/1877
* [Online-DPO] num_generation_per_prompt is fixed by kashif in https://github.com/huggingface/trl/pull/1898
* Fix GPT2 sentiment notebook reward by cemiu in https://github.com/huggingface/trl/pull/1738
* Fix `AlignPropTrainer` import by qgallouedec in https://github.com/huggingface/trl/pull/1908
* Various args and test fix by qgallouedec in https://github.com/huggingface/trl/pull/1909
* `lr_scheduler.step()` after `optimizer.step()` by qgallouedec in https://github.com/huggingface/trl/pull/1918
* `torch.cuda.amp.autocast()` -> `torch.amp.autocast("cuda")` by qgallouedec in https://github.com/huggingface/trl/pull/1921
* Fix orpo trainer loss device by SunMarc in https://github.com/huggingface/trl/pull/1919
* Add transformers library name for TRL repos by lewtun in https://github.com/huggingface/trl/pull/1922
* Standardize `dataset_num_proc` usage by qgallouedec in https://github.com/huggingface/trl/pull/1925
* `PartialState().local_main_process_first()` when map in examples by qgallouedec in https://github.com/huggingface/trl/pull/1926
* minor BCO fixes by claralp in https://github.com/huggingface/trl/pull/1923
* Improve DPO/loss doc by qgallouedec in https://github.com/huggingface/trl/pull/1929
* feat: anchored pref optimization by karel-contextual in https://github.com/huggingface/trl/pull/1928
* Add tests for DPO for VLM by qgallouedec in https://github.com/huggingface/trl/pull/1935
* fix model to save in ppov2 by mnoukhov in https://github.com/huggingface/trl/pull/1776
* Optional Additional Loss to Center Reward Models' Outputs by RylanSchaeffer in https://github.com/huggingface/trl/pull/1932
* Properly label all models when pushed to the hub by qgallouedec in https://github.com/huggingface/trl/pull/1940
* Skip token in `push_to_hub` by qgallouedec in https://github.com/huggingface/trl/pull/1945
* Fix model wrapping for online DPO by lewtun in https://github.com/huggingface/trl/pull/1946
* Don't mark issues as stale if nobody answered by qgallouedec in https://github.com/huggingface/trl/pull/1949
* Add a simple-to-understand example for online DPO by vwxyzjn in https://github.com/huggingface/trl/pull/1947
* Log WandB tables on main process by lewtun in https://github.com/huggingface/trl/pull/1951
* [ODPO] Fix global step for consistent checkpointing with global updates by lewtun in https://github.com/huggingface/trl/pull/1950
* "help wanted" in label to exempt from stale by qgallouedec in https://github.com/huggingface/trl/pull/1956
* Fix response truncation in examples/notebooks/gpt2-sentiment.ipynb by qgallouedec in https://github.com/huggingface/trl/pull/1957
* [ODPO] Refactor training script to use messages API by lewtun in https://github.com/huggingface/trl/pull/1958
* Support LLaVA-NeXT in Vision SFT by qgallouedec in https://github.com/huggingface/trl/pull/1959
* Add issue/PR templates, code of conduct & better contributing guide by lewtun in https://github.com/huggingface/trl/pull/1963
* Fix issue with precompute_ref_log_probs not working when rpo_alpha is None by mina-parham in https://github.com/huggingface/trl/pull/1961
* add arg `padding_free` to DataCollatorForCompletionOnlyLM by RhuiDih in https://github.com/huggingface/trl/pull/1887
* Optimize DPO log probability calculation by retaining necessary cache, saving up to 30GB of memory (1968) by SeungyounShin in https://github.com/huggingface/trl/pull/1969
* New mismatch pair creation strategy by qgallouedec in https://github.com/huggingface/trl/pull/1970
* Fix issue templates location by qgallouedec in https://github.com/huggingface/trl/pull/1973
* Use weights_only for load by kit1980 in https://github.com/huggingface/trl/pull/1933
* Fix flaky Hub tests by lewtun in https://github.com/huggingface/trl/pull/1981
* fix a few minor bugs in ppo.py by kykim0 in https://github.com/huggingface/trl/pull/1966
* Test for 1970 by qgallouedec in https://github.com/huggingface/trl/pull/1974
* Restore reruns for flaky tests by lewtun in https://github.com/huggingface/trl/pull/1982
* Promote `PairRMJudge` to top-level import by qgallouedec in https://github.com/huggingface/trl/pull/1985
* [DPO] TR-DPO gather the target model params as well when syncing by kashif in https://github.com/huggingface/trl/pull/1978
* `torch.load` with `weights_only=True` by qgallouedec in https://github.com/huggingface/trl/pull/1988
* Skip the failing Online DPO test by qgallouedec in https://github.com/huggingface/trl/pull/1989
* Refactor Online DPO by vwxyzjn in https://github.com/huggingface/trl/pull/1839
* [DPO] tokenize and process DPO data via batches by kashif in https://github.com/huggingface/trl/pull/1914
* [RPO] Add ignore_index in DPOTrainer's nn.CrossEntropyLoss by akakakakakaa in https://github.com/huggingface/trl/pull/1987
* Relax numpy upper bound and bump deepspeed version by hvaara in https://github.com/huggingface/trl/pull/1990
* Adds experimental Liger support to SFT script by edbeeching in https://github.com/huggingface/trl/pull/1992
## New Contributors
* AdnaneKhan made their first contribution in https://github.com/huggingface/trl/pull/1822
* mkopecki made their first contribution in https://github.com/huggingface/trl/pull/1825
* DZ9 made their first contribution in https://github.com/huggingface/trl/pull/1836
* MAOJIASONG made their first contribution in https://github.com/huggingface/trl/pull/1840
* davanstrien made their first contribution in https://github.com/huggingface/trl/pull/1845
* eliebak made their first contribution in https://github.com/huggingface/trl/pull/1863
* Rishav-hub made their first contribution in https://github.com/huggingface/trl/pull/1862
* cemiu made their first contribution in https://github.com/huggingface/trl/pull/1738
* SunMarc made their first contribution in https://github.com/huggingface/trl/pull/1919
* karel-contextual made their first contribution in https://github.com/huggingface/trl/pull/1928
* RylanSchaeffer made their first contribution in https://github.com/huggingface/trl/pull/1932
* mina-parham made their first contribution in https://github.com/huggingface/trl/pull/1961
* RhuiDih made their first contribution in https://github.com/huggingface/trl/pull/1887
* SeungyounShin made their first contribution in https://github.com/huggingface/trl/pull/1969
* kit1980 made their first contribution in https://github.com/huggingface/trl/pull/1933
* akakakakakaa made their first contribution in https://github.com/huggingface/trl/pull/1987
* hvaara made their first contribution in https://github.com/huggingface/trl/pull/1990
**Full Changelog**: https://github.com/huggingface/trl/compare/v0.9.6...v0.10