OpenRLHF

Latest version: v0.6.4

0.5.4

What's Changed
* Fixed typos: advatanges -> advantages by songxxzp in https://github.com/OpenRLHF/OpenRLHF/pull/570
* Only decode the queries once for multiple remote rm by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/572
* overlap vllm init and actor/reward model loading by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/575
* correct the order of multiplication in grad acc by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/577
* Support ring-attention during sft phase by UbeCc in https://github.com/OpenRLHF/OpenRLHF/pull/576
* Add better error message for empty datasets by frrad in https://github.com/OpenRLHF/OpenRLHF/pull/581
* Fix nan for sft-ring when labels are all IGNORE_INDEX by UbeCc in https://github.com/OpenRLHF/OpenRLHF/pull/583
* explicitly ignore attention_mask for packing_samples. by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/588
* [Set default grad_accum_dtype to None](https://github.com/OpenRLHF/OpenRLHF/commit/47f7cd8fc76de6d057d053251c1b55c00421cc24) by xiaoxigua999
* update global batch size in eval model compatible to ring-attn-size by ShomyLiu in https://github.com/OpenRLHF/OpenRLHF/pull/590
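
The NaN fix for all-`IGNORE_INDEX` labels can be illustrated with a small sketch. This is not the code from the pull request. It assumes the usual cause, that a mean loss over zero supervised tokens becomes 0/0, and the function name and the `-100` value are illustrative assumptions.

```python
IGNORE_INDEX = -100  # assumed conventional "skip this token" label value


def mean_token_loss(token_losses, labels):
    """Average per-token losses over positions whose label is not IGNORE_INDEX."""
    kept = [loss for loss, label in zip(token_losses, labels) if label != IGNORE_INDEX]
    if not kept:
        # No supervised tokens in this slice: return 0.0 rather than computing 0 / 0.
        return 0.0
    return sum(kept) / len(kept)


print(mean_token_loss([1.0, 3.0], [5, IGNORE_INDEX]))
print(mean_token_loss([1.0, 3.0], [IGNORE_INDEX, IGNORE_INDEX]))
```

When a sequence is split across ranks, one rank may plausibly hold only ignored positions, which is why the empty case needs an explicit branch.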

New Contributors
* songxxzp made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/570
* UbeCc made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/576
* frrad made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/581
* ShomyLiu made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/590

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.3...v0.5.4

0.5.3

Highlights
* Support loading the reward model with `AutoModelForSequenceClassification.from_pretrained`, by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/commit/a9c482a3cdabc94f34f3c5670fc964d8a9f86c63
  - The default `value_head` name of the reward model has been changed to `score`
* Support RLOO with per-token KL penalty by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/515
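
For readers unfamiliar with RLOO, here is a minimal sketch of its two ingredients: a leave-one-out baseline across the samples drawn for one prompt, and a KL penalty applied at every token. It illustrates the general technique and is not OpenRLHF's implementation. The function names, the `kl_coef` parameter, and the choice to add the scalar reward on the last token are assumptions.

```python
def rloo_advantages(rewards):
    """Leave-one-out advantages for k >= 2 sampled responses to one prompt."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


def per_token_rewards(logprobs, ref_logprobs, final_reward, kl_coef):
    """Per-token KL penalty, with the scalar reward added on the last token.

    Assumes non-empty, equal-length lists of token log-probabilities.
    """
    rewards = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]
    rewards[-1] += final_reward
    return rewards


print(rloo_advantages([1.0, 2.0, 3.0]))
print(per_token_rewards([-1.0, -2.0], [-1.5, -2.0], final_reward=1.0, kl_coef=0.5))
```

Because each baseline excludes the sample it is applied to, the advantages for a prompt sum to zero and no separate value model is needed.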

What's Changed
* Fix bug on prm trainer w.r.t no packing samples and ring attn by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/551
* Fix issue 549: ZeroDivisionError during the eval step of DPO/SFT/RM/KTO/KD training, by MarxistZ and xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/561


New Contributors
* MarxistZ made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/561

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.2...v0.5.3

0.5.2.post1

Highlights
* Support vLLM NCCL weights sync for multi-nodes RLHF by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/543

What's Changed
* Update docker container to 24.07 and vLLM to 0.6.4.post1 by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/543

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.2...v0.5.2.post1

0.5.2

Highlights
* Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/524

What's Changed
* Fix serve_rm hanging on long input by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/521
* Use worker_cls when vLLM version > 0.6.4.post1 by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/540
* [Relanding] Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/536
* Upgrade Transformers version by xiaoxigua999 and fzyzcjy
* Clean up the import modules by xiaoxigua999 and HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/commit/a82eac218cd176c3475f30a1e0d258f216deff2d
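
The version gate in the `worker_cls` entry depends on comparing a post-release version correctly, which plain string comparison does not do. Below is a hedged sketch using the `packaging` library. The helper name `needs_worker_cls` is hypothetical and is not taken from the OpenRLHF source.

```python
from packaging.version import Version

# Threshold taken from the changelog entry: "vLLM version > 0.6.4.post1".
THRESHOLD = Version("0.6.4.post1")


def needs_worker_cls(vllm_version: str) -> bool:
    """Return True when the given vLLM version is strictly newer than 0.6.4.post1."""
    return Version(vllm_version) > THRESHOLD


for candidate in ("0.6.4", "0.6.4.post1", "0.6.5", "0.10.0"):
    print(candidate, needs_worker_cls(candidate))
```

Comparing the raw strings would rank `"0.10.0"` below `"0.6.4.post1"`, so parsing with `Version` is the safer choice for this kind of check.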


**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.1...v0.5.2

0.5.1

Highlights

* Add REINFORCE algorithm in train_ppo.py and train_ppo_ray.py by xiaoxigua999 and LSX-Sneakerprogrammer in https://github.com/OpenRLHF/OpenRLHF/pull/513

What's Changed
* fix interactive_chat by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/512
* Allow arbitrary number of vllm engines by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/511
* [Fix timeout in request_api_wrapper](https://github.com/OpenRLHF/OpenRLHF/commit/07c34b303e86fcdffa755e9e25c28cca5b44c675) by xiaoxigua999
* [Update docs for trainers](https://github.com/OpenRLHF/OpenRLHF/commit/d22b6d55d3f592b84f186348107b25c71328165a) by xiaoxigua999
* Raise warning on faulty input template by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/518

New Contributors
* cemiu and LSX-Sneakerprogrammer made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/512

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.0...v0.5.1

0.5.0

Highlights
* 2~3x PPO training performance improvement via `sending all prompts to vllm` by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/507

What's Changed
* Support PRM with soft labels and change PRM dataset format by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/502
* fix interactive chat by LYMDLUT in https://github.com/OpenRLHF/OpenRLHF/pull/505

New Contributors
* LYMDLUT made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/505

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.6...v0.5.0
