TRL


0.9.4

New Contributors
* GuilhermeFreire made their first contribution in https://github.com/huggingface/trl/pull/1706

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.9.3...v0.9.4

0.9.3

We are excited to introduce the v0.9.3 release, which brings many new features and algorithms. The highlights are as follows:


1. **RLOO Trainer**: RLOO (REINFORCE Leave-One-Out) is a new online RL algorithm for RLHF, proposed by [Ahmadian et al. from Cohere](https://cohere.com/research/papers/back-to-basics-revisiting-reinforce-style-optimization-for-learning-from-human-feedback-in-llms-2024-02-23). Check out our docs [here](https://huggingface.co/docs/trl/rloo_trainer) to get started.
2. **PPOv2 Trainer**: We are introducing a new experimental PPOv2 trainer that aligns more closely with OpenAI's PPO implementation, based on https://arxiv.org/abs/2403.17031. Check out our docs [here](https://huggingface.co/docs/trl/ppov2_trainer) to get started.
3. **Reward model visualization**: the reward model training now includes visualization on the eval dataset, as shown below.

https://github.com/huggingface/trl/assets/5555347/6575a879-cb2f-4e2e-bb84-a76707f9de84

4. **New losses in the DPO Trainer**: DPOTrainer now supports losses for Self-play Preference Optimization, Robust DPO, TR-DPO, Iterative Reasoning Preference Optimization, and Pairwise Noise Contrastive Alignment.
5. **New losses in the KTO Trainer**: KTOTrainer now includes the loss for Binary Classifier Optimization (BCO).
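The core idea behind RLOO is simple: sample `k` completions per prompt, and use the mean reward of the other `k - 1` completions as a per-sample baseline, so no learned value network is needed. A minimal sketch of that leave-one-out advantage computation (plain Python for illustration, not TRL's internal implementation):

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k sampled completions of one prompt.

    Each completion's advantage is its reward minus the mean reward of
    the other k - 1 completions, which acts as a variance-reducing baseline.
    """
    k = len(rewards)
    total = sum(rewards)
    # baseline for sample i is the mean of the remaining k - 1 rewards
    return [r - (total - r) / (k - 1) for r in rewards]
```

A useful property of this baseline is that the advantages always sum to zero across the `k` samples, so the policy gradient pushes probability mass toward above-average completions and away from below-average ones.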
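Several of the new DPO loss types above are variations on the standard sigmoid DPO objective, which penalizes the policy when the implicit reward margin between the chosen and rejected completions is small. A minimal sketch of that base objective (the function name and scalar simplification are illustrative, not TRL's API):

```python
import math

def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard sigmoid DPO loss for a single preference pair.

    logits is the implicit reward margin: how much more the policy prefers
    the chosen completion over the rejected one, relative to the reference.
    """
    logits = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp)
    # -log(sigmoid(beta * logits)), written as log1p(exp(-x)) for stability
    return math.log1p(math.exp(-beta * logits))
```

When the policy matches the reference model, the margin is zero and the loss is `log 2`; the variant losses (Robust DPO, NCA, etc.) modify how this margin is weighted or regularized.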
What's Changed
* set dev version by younesbelkada in https://github.com/huggingface/trl/pull/1568
* fix add_special_tokens issue for data with template by edixiong in https://github.com/huggingface/trl/pull/1509
* [DPO] add 'bco_pair' loss_type by seanexp in https://github.com/huggingface/trl/pull/1524
* [DPO] DPOConfig class by kashif in https://github.com/huggingface/trl/pull/1554
* [SFT] add SFT Trainer Config dataclass by kashif in https://github.com/huggingface/trl/pull/1530
* FIX: Fix CI on transformers main by younesbelkada in https://github.com/huggingface/trl/pull/1576
* [`SFTTrainer`] Add warning in SFTTrainer when dataset already processed by younesbelkada in https://github.com/huggingface/trl/pull/1577
* Fix typo detoxifying doc by qgallouedec in https://github.com/huggingface/trl/pull/1594
* Core: removed unexisting `SftArgumentParser` by younesbelkada in https://github.com/huggingface/trl/pull/1602
* [`KTOTrainer`] add BCO (reward shift and underlying distribution matching) by seanexp in https://github.com/huggingface/trl/pull/1599
* [CLI] Use auto device map for model load by lewtun in https://github.com/huggingface/trl/pull/1596
* Removing `tests/` from package data by jamesbraza in https://github.com/huggingface/trl/pull/1607
* Docs: Fix build main documentation by younesbelkada in https://github.com/huggingface/trl/pull/1604
* support loss function for Self-play Preference Optimization by winglian in https://github.com/huggingface/trl/pull/1612
* Update HH dataset on helpful only subset by vwxyzjn in https://github.com/huggingface/trl/pull/1613
* corrects loss function for Self-play Preference Optimization hard label version by angelahzyuan in https://github.com/huggingface/trl/pull/1615
* Fix ZeRO-3 generation context manager by lewtun in https://github.com/huggingface/trl/pull/1617
* fixed adding bos and eos token unconditionally by jasonyux in https://github.com/huggingface/trl/pull/1591
* visualize rm prediction by vwxyzjn in https://github.com/huggingface/trl/pull/1636
* [ORPO] Correct label mask for pad tokens by IlyaGusev in https://github.com/huggingface/trl/pull/1625
* Update sft_llama2.py to work with the latest API by xianbaoqian in https://github.com/huggingface/trl/pull/1637
* Fixed wrong logs prefixes in KTOTrainer by bartoszzuk in https://github.com/huggingface/trl/pull/1641
* Pairwise Noise Contrastive Alignment by winglian in https://github.com/huggingface/trl/pull/1632
* don't cast the trainable lora layers to half precision by pacman100 in https://github.com/huggingface/trl/pull/1644
* PPO / Reinforce Trainers by vwxyzjn in https://github.com/huggingface/trl/pull/1540
* Apply deprecated `evaluation_strategy` by muellerzr in https://github.com/huggingface/trl/pull/1559
* FEAT: Add support for training collator in PPOTrainer by younesbelkada in https://github.com/huggingface/trl/pull/1658
* Correct Documentation for cDPO Usage by AliBakly in https://github.com/huggingface/trl/pull/1655
* Fix inheritance order in PPOv2Config by Nicolinho in https://github.com/huggingface/trl/pull/1659
* [DPO] Add 'robust' loss_type by Abilityguy in https://github.com/huggingface/trl/pull/1653
* 🤫 TR-DPO implementation by syrn1k in https://github.com/huggingface/trl/pull/1593
* Do not upcast adapters when using FSDP+QLoRA by pacman100 in https://github.com/huggingface/trl/pull/1654
* [Tests] update eval_strategy API by kashif in https://github.com/huggingface/trl/pull/1662
* Fix ppov2 test case by vwxyzjn in https://github.com/huggingface/trl/pull/1661
* FIX / PPO: Fix `enable_input_require_grads` issues with PPO models by younesbelkada in https://github.com/huggingface/trl/pull/1664
* fix dataset load error by sywangyi in https://github.com/huggingface/trl/pull/1670
* FIX / SFTTrainer: Fix SFTTrainer with `args=None` by younesbelkada in https://github.com/huggingface/trl/pull/1678
* Fix max_completion_length for encoder_decoder models in KTO Trainer by samuki in https://github.com/huggingface/trl/pull/1588
* initial RPO loss by kashif in https://github.com/huggingface/trl/pull/1686
* Fix overriding optimize_device_cache with optimize_cuda_cache in PPOConfig by alexisrozhkov in https://github.com/huggingface/trl/pull/1690
* Skip packing validation by alex-jw-brooks in https://github.com/huggingface/trl/pull/1673
* Fix typo in DPOTrainer's warnings by qgallouedec in https://github.com/huggingface/trl/pull/1688
* Quick fix on GPT4-eval by vwxyzjn in https://github.com/huggingface/trl/pull/1696

0.9.2

New Contributors
* edixiong made their first contribution in https://github.com/huggingface/trl/pull/1509
* seanexp made their first contribution in https://github.com/huggingface/trl/pull/1524
* jamesbraza made their first contribution in https://github.com/huggingface/trl/pull/1607
* winglian made their first contribution in https://github.com/huggingface/trl/pull/1612
* angelahzyuan made their first contribution in https://github.com/huggingface/trl/pull/1615
* jasonyux made their first contribution in https://github.com/huggingface/trl/pull/1591
* IlyaGusev made their first contribution in https://github.com/huggingface/trl/pull/1625
* xianbaoqian made their first contribution in https://github.com/huggingface/trl/pull/1637
* bartoszzuk made their first contribution in https://github.com/huggingface/trl/pull/1641
* muellerzr made their first contribution in https://github.com/huggingface/trl/pull/1559
* AliBakly made their first contribution in https://github.com/huggingface/trl/pull/1655
* Nicolinho made their first contribution in https://github.com/huggingface/trl/pull/1659
* Abilityguy made their first contribution in https://github.com/huggingface/trl/pull/1653
* syrn1k made their first contribution in https://github.com/huggingface/trl/pull/1593
* alexisrozhkov made their first contribution in https://github.com/huggingface/trl/pull/1690
* alex-jw-brooks made their first contribution in https://github.com/huggingface/trl/pull/1673

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.8.6...v0.9.2

0.8.6

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.8.5...v0.8.6

0.8.5

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.8.4...v0.8.5

0.8.4

New Contributors

* ejmejm made their first contribution in https://github.com/huggingface/trl/pull/1537

**Full Changelog**: https://github.com/huggingface/trl/compare/v0.8.3...v0.8.4
