What's Changed

* Support DeepSpeed universal checkpoints by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/891
* Get datasets from ModelScope use `--use_ms`. by lxline in https://github.com/OpenRLHF/OpenRLHF/pull/893
* Pop ROCR_VISIBLE_DEVICES as well when starting LLMRayActor by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/895
* Refactor ppo_trainer.py and make_experience / Support vLLM 0.8.1 by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/900
* refactor make_experience (batch forward) and advantage compute by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/902
* Fix make experience when not using ring attention by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/905
* fix: resolve UnboundLocalError when training without packed samples by mananshah99 in https://github.com/OpenRLHF/OpenRLHF/pull/906
* Fix: inconsistent vllm performance when tp > 1 by whksmo in https://github.com/OpenRLHF/OpenRLHF/pull/907

New Contributors

* mananshah99 made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/906
* whksmo made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/907

What's Changed

* Use environ instead of args to store local_rank value by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/838
* Fix SFT packing_samples by wangcho2k in https://github.com/OpenRLHF/OpenRLHF/pull/781
* fix typo by ji-huazhong in https://github.com/OpenRLHF/OpenRLHF/pull/854
* Fix temperature for rollout and forward by dingyuan-shi and xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/857
* Support full determinism option by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/868
* CUDA synchronize before empty cache when make experience by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/874
* optimize logpros from logits temperary memory by gzpan in https://github.com/OpenRLHF/OpenRLHF/pull/878
* fix: resolve vLLM engine not found issue during resume by Freder-chen in https://github.com/OpenRLHF/OpenRLHF/pull/879
* fix spelling mistake by BearBiscuit05 in https://github.com/OpenRLHF/OpenRLHF/pull/880
* remove train_ppo.py by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/882

New Contributors

* wangcho2k made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/781
* gzpan made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/878
* BearBiscuit05 made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/880

What's Changed

* Use environ instead of args to store local_rank value by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/838
* [fix ring attention vllm generate](https://github.com/OpenRLHF/OpenRLHF/commit/cdcabf3548ed67f7454eed4fb70905ac8faa8694) by xiaoxigua999

What's Changed

* Fix typo in README_zh.md by yuxinzuo in https://github.com/OpenRLHF/OpenRLHF/pull/824
* Pack vLLM engines together when tp>1 in distributed RLHF by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/759
* [fix load checkpoint for vllm sleep](https://github.com/OpenRLHF/OpenRLHF/commit/c3f0776b76e078162cb973b9f814c20ecaee0248) by xiaoxigua999
* [add torch.cuda.empty_cache() for offload_deepspeed_states](https://github.com/OpenRLHF/OpenRLHF/commit/b256377febe2ce61e4ff81cc118012a6c3dabf85) by xiaoxigua999

New Contributors

* yuxinzuo made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/824