OpenRLHF

Latest version: v0.6.4

0.5.8

Highlights
* Support hybrid engine and vllm rlhf API by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/677

What's Changed
* custom reward func for reinforced finetuning by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/707
* early_stopping conditional on beam search by cemiu in https://github.com/OpenRLHF/OpenRLHF/pull/714
* [small change] Use selective log-softmax to reduce peak vram consumption by tyler-romero in https://github.com/OpenRLHF/OpenRLHF/pull/718
* Fix vLLM instance scheduling when tp size is 1 by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/721
* Send all prompts to vLLM engines by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/677
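
The selective log-softmax change (#718) rests on a simple identity: the log-probability of a chosen token is its logit minus the logsumexp of the whole row, so there is no need to materialize the full `(batch, seq, vocab)` log-softmax tensor just to gather one value per position; only the `(batch, seq)` normalizer is needed. A minimal pure-Python sketch of the identity (the function name is illustrative, not OpenRLHF's API; the real code operates on tensors):

```python
import math

def selective_log_prob(logits, token_ids):
    """Log-prob of each chosen token without building the full
    log-softmax row: log_softmax(x)[t] == x[t] - logsumexp(x)."""
    out = []
    for row, t in zip(logits, token_ids):
        m = max(row)  # subtract the row max for numerical stability
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        out.append(row[t] - lse)
    return out
```

In a tensor framework the same trick gathers the selected logits and subtracts `logsumexp` over the vocab axis, so peak memory drops from one float per vocabulary entry to one float per token.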

New Contributors
* tyler-romero made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/718

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.7...v0.5.8

0.5.7

What's Changed
* [misc] remove unused constant by ji-huazhong in https://github.com/OpenRLHF/OpenRLHF/pull/657
* Save by hf checkpoint by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/671
* Add support for multiturn sft training with packing_samples by UbeCc in https://github.com/OpenRLHF/OpenRLHF/pull/586
* Update README.md by hijkzzz in https://github.com/OpenRLHF/OpenRLHF/pull/699
* Add support for clearing prefix cache in vLLM by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/703
* Fix 'worker_use_ray' not found in vLLM 0.7.0 by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/702
* Add support for vLLM weight sync with Ray collective communication by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/704

New Contributors
* ji-huazhong made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/657

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.6...v0.5.7

0.5.6

What's Changed
* Fix: bug in actor logits' numerical precision when bf16 is on. by Illyasville in https://github.com/OpenRLHF/OpenRLHF/pull/634
* docs: add Japanese README file by eltociear in https://github.com/OpenRLHF/OpenRLHF/pull/636
* Fix loss_mean log by xiaoxigua999 and Freder-chen in https://github.com/OpenRLHF/OpenRLHF/pull/650
* Fix: In prm training, placeholder_token should be truncated if input is truncated by LinXueyuanStdio in https://github.com/OpenRLHF/OpenRLHF/pull/652
* [Fix adapter_model.safetensors for LoRA + ZeRO3](https://github.com/OpenRLHF/OpenRLHF/commit/5a06164334f23c467cbe3dcfa7ef918c9931c567) xiaoxigua999
* [add lora_combiner.py](https://github.com/OpenRLHF/OpenRLHF/commit/18010bca5ff6cfe0cde4cf654efa180e1d28afe9) UbeCc xiaoxigua999

New Contributors
* Illyasville made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/634
* LinXueyuanStdio made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/652

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.5...v0.5.6

0.5.5.post2

What's Changed

- [revert transformer/deepspeed versions](https://github.com/OpenRLHF/OpenRLHF/commit/ce5d3cc8e902b4586851c0708f29e8606883ef36) xiaoxigua999

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.5.post1...v0.5.5.post2

0.5.5.post1

What's Changed

- [Fix _validate_args](https://github.com/OpenRLHF/OpenRLHF/commit/1f95a1707ee17513bcdab816a30ab3fa46c12ee9) xiaoxigua999
- [process_experiences on gpu](https://github.com/OpenRLHF/OpenRLHF/commit/1e62a776f36c317ec40d608703ee0dfcfbfa7714) xiaoxigua999
- [bump deepspeed version](https://github.com/OpenRLHF/OpenRLHF/commit/4c330de97be29293999e2f242232252982fca1fc) xiaoxigua999

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.5...v0.5.5.post1

0.5.5

Highlights
* Fix vLLM nccl sync in single node by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/604

What's Changed
* Update batch_inference.py by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/612
* Offload training experiences to CPU memory in RLHF by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/620
* Fixing KL Divergence Precision and vllm generate timeout by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/620
* [support overlap_comm option](https://github.com/OpenRLHF/OpenRLHF/commit/5fd51011b57784a835784a55fe4c00cf3fdace3c) xiaoxigua999
* Fix 622: support string format in SFT template by Freder-chen in https://github.com/OpenRLHF/OpenRLHF/pull/623
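
The KL-precision fix above touches a quantity that is fragile in bf16, since the per-token KL estimate is a small difference of log-probabilities; a common remedy is to accumulate it in higher precision. For illustration, a plain-Python sketch of one widely used low-variance estimator (John Schulman's "k3" approximation, common in RLHF codebases; not necessarily OpenRLHF's exact code):

```python
import math

def kl_k3(logp_policy, logp_ref):
    """Per-token KL(policy || ref) estimate via k3(r) = exp(r) - r - 1,
    with r = logp_ref - logp_policy. Unbiased in expectation and
    always non-negative, unlike the naive logp_policy - logp_ref."""
    kls = []
    for lp, lr in zip(logp_policy, logp_ref):
        r = lr - lp
        kls.append(math.exp(r) - r - 1.0)
    return kls
```

Because `exp(r) - r - 1` is tiny when the two distributions agree, computing `r` in low precision can swamp the signal with rounding error, which is why such fixes typically cast the log-probs to float32 first.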

New Contributors
* Freder-chen made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/623

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.4...v0.5.5
