OpenRLHF

Latest version: v0.5.3

Page 1 of 8

0.5.3

Highlights
* Support loading the reward model with `AutoModelForSequenceClassification.from_pretrained` by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/commit/a9c482a3cdabc94f34f3c5670fc964d8a9f86c63
  - The default `value_head` name of the reward model has been changed to `score`.
* Support RLOO with per-token KL penalty by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/515
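
The RLOO item above can be illustrated with a minimal sketch. It assumes several responses are sampled per prompt, each with one scalar reward, and that the KL penalty is subtracted at every token. The function names and the `kl_coef` parameter are hypothetical; this is not taken from the OpenRLHF code in the linked pull request.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages for k responses sampled from the same prompt."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    # The baseline for sample i is the mean reward of the other k - 1 samples.
    return [r - (total - r) / (k - 1) for r in rewards]


def per_token_rewards(seq_reward: float, token_kl: List[float], kl_coef: float) -> List[float]:
    """Subtract kl_coef * KL at every token and add the sequence reward on the last token."""
    if not token_kl:
        raise ValueError("token_kl must contain at least one token")
    shaped = [-kl_coef * kl for kl in token_kl]
    shaped[-1] += seq_reward
    return shaped
```

One possible way to combine the two (again an assumption) is to sum each response's shaped per-token rewards into a single value and pass those values to `rloo_advantages`.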

What's Changed
* Fix PRM trainer bug with non-packed samples and ring attention by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/551
* Fix issue 549 (ZeroDivisionError during the eval step of DPO/SFT/RM/KTO/KD training) by MarxistZ in https://github.com/OpenRLHF/OpenRLHF/pull/561 xiaoxigua999


New Contributors
* MarxistZ made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/561

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.2...v0.5.3

0.5.2.post1

Highlights
* Support vLLM NCCL weights sync for multi-nodes RLHF by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/543

What's Changed
* Update Docker container to 24.07 and vLLM to 0.6.4.post1 by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/543

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.2...v0.5.2.post1

0.5.2

Highlights
* Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/524
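
To make the wildcard in the item above concrete, here is a small sketch that lists which environment variables matching `RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES` are set. The helper name `noset_flags` is hypothetical, and treating an empty value as "not set" is an assumption. The sketch only detects the flags; it does not reproduce what OpenRLHF does once one is found.

```python
import fnmatch
import os
from typing import List, Mapping, Optional

# Glob copied from the release note; the concrete names depend on the accelerator.
NOSET_PATTERN = "RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES"


def noset_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sorted names of matching variables that have a non-empty value."""
    env = os.environ if env is None else env
    return sorted(
        name
        for name, value in env.items()
        # Assumption: an empty string counts as "not set".
        if value and fnmatch.fnmatchcase(name, NOSET_PATTERN)
    )
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to check in isolation.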

What's Changed
* Fix serve_rm hanging on long input by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/521
* Use worker_cls when vLLM version > 0.6.4.post1 by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/540
* [Relanding] Add support when RAY_EXPERIMENTAL_NOSET_*_VISIBLE_DEVICES is set by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/536
* Upgrade Transformers version by xiaoxigua999 and fzyzcjy
* Clean up module imports by xiaoxigua999 and HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/commit/a82eac218cd176c3475f30a1e0d258f216deff2d


**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.1...v0.5.2

0.5.1

Highlights

* Add REINFORCE algorithm in `train_ppo.py` and `train_ppo_ray.py` by xiaoxigua999 and LSX-Sneakerprogrammer in https://github.com/OpenRLHF/OpenRLHF/pull/513
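
For readers unfamiliar with the algorithm named above, the following is a minimal sketch of the textbook REINFORCE objective: each token's log-probability is weighted by the discounted return from that token onward, with no learned critic. It is a generic illustration under that assumption, not the variant implemented in the linked pull request; the function names are hypothetical.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go for every step: G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs: List[float], returns: List[float]) -> float:
    """Negative mean of log-probability weighted by return (to be minimized)."""
    if not log_probs or len(log_probs) != len(returns):
        raise ValueError("log_probs and returns must be non-empty and equally long")
    return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(log_probs)
```

In practice the log-probabilities would be tensors from an autograd framework so that the loss can be differentiated; plain floats are used here only to keep the sketch dependency-free.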

What's Changed
* Fix interactive_chat by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/512
* Allow arbitrary number of vLLM engines by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/511
* [Fix timeout in request_api_wrapper](https://github.com/OpenRLHF/OpenRLHF/commit/07c34b303e86fcdffa755e9e25c28cca5b44c675) xiaoxigua999
* [Update docs for trainers](https://github.com/OpenRLHF/OpenRLHF/commit/d22b6d55d3f592b84f186348107b25c71328165a) xiaoxigua999
* Raise warning on faulty input template by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/518

New Contributors
* cemiu and LSX-Sneakerprogrammer made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/512

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.0...v0.5.1

0.5.0

Highlights
* 2-3x PPO training performance improvement by sending all prompts to vLLM, by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/507

What's Changed
* Support PRM with soft labels and change PRM dataset format by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/502
* Fix interactive chat by LYMDLUT in https://github.com/OpenRLHF/OpenRLHF/pull/505

New Contributors
* LYMDLUT made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/505

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.6...v0.5.0

0.4.6

What's Changed
* Separate the rollout generation and advantage calculation by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/466
* Remove unnecessary softmax in PRM loss by catqaq in https://github.com/OpenRLHF/OpenRLHF/pull/473
* Add temperature config for train_ppo_ray by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/475
* Upload experience_maker perf status by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/474
* Support non-negative KL divergence approximation by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/476
* Fix packing_samples in NaiveExperienceMaker by zmzhang2000 in https://github.com/OpenRLHF/OpenRLHF/pull/484
* Replace deprecated/removed transformers.deepspeed module by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/493
* [Clean up get_llm_for_sequence_regression](https://github.com/OpenRLHF/OpenRLHF/commit/ed0ad2850ba6158284db089cf9bbad1d928ae1d5) xiaoxigua999
* [Add --reward_clip_range and upgrade Transformers](https://github.com/OpenRLHF/OpenRLHF/commit/966b416610a45db0d8b11d1538483c25ffb7dca5) xiaoxigua999
* [Fix actor_world_size must be greater than vllm_num_engines](https://github.com/OpenRLHF/OpenRLHF/commit/db99807d8a01d529faef4fbea4594ebba00985a2) for https://github.com/OpenRLHF/OpenRLHF/issues/492 xiaoxigua999
* [Fix enable_prefix_caching bug](https://github.com/OpenRLHF/OpenRLHF/commit/685a9605580d1149b4763aefe5b26b7c84ced4df) xiaoxigua999
* [Update PPO scripts](https://github.com/OpenRLHF/OpenRLHF/commit/0840b980c1e6ec5171657c641323a040be01a112) xiaoxigua999
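
The non-negative KL divergence approximation listed above is not spelled out in these notes. One well-known estimator with that property is sketched below, on the assumption that it is the kind of approximation meant: with `log_ratio = log pi(x) - log pi_ref(x)` for a token sampled from `pi`, the naive per-sample estimate is `log_ratio` itself and can be negative, while `exp(-log_ratio) - 1 + log_ratio` is never negative. The function names are hypothetical.

```python
import math


def kl_naive(log_prob: float, ref_log_prob: float) -> float:
    """Per-sample estimate log pi(x) - log pi_ref(x); unbiased but can be negative."""
    return log_prob - ref_log_prob


def kl_non_negative(log_prob: float, ref_log_prob: float) -> float:
    """Per-sample estimate (r - 1) - log r with r = pi_ref(x) / pi(x); always >= 0."""
    log_ratio = log_prob - ref_log_prob
    # exp(-log_ratio) is r, so expm1(-log_ratio) is r - 1, and -log r is log_ratio.
    return math.expm1(-log_ratio) + log_ratio
```

The non-negativity follows from `exp(y) >= 1 + y` for every real `y`, so the estimate is zero exactly when the two log-probabilities agree.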

New Contributors
* ZetangForward made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/477

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.5...v0.4.6
