What's Changed
* [BUG] Fix _max_steps not initialized bug by BeingGod in https://github.com/OpenRLHF/OpenRLHF/pull/458
* Fixed the missing value_head_prefix by ChenmienTan in https://github.com/OpenRLHF/OpenRLHF/pull/457
* Support remote_rm_fn when using packing_samples in PPO by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/461
* Change the pg_options param into backend_options in _new_process_group_helper for PyTorch versions greater than 2.6 by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/462
* Move n_samples_per_prompt into the replay buffer by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/463
* Add PRM training with hard estimation by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/442
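For the PRM item above: "hard estimation" usually labels a reasoning step as correct (1) if at least one rollout that continues the solution from that step reaches the right final answer, and incorrect (0) otherwise. The sketch below is only an illustration of that labeling rule under this assumption; the function and data layout are hypothetical, not OpenRLHF's actual implementation.

```python
# Hypothetical sketch of "hard estimation" labels for PRM training.
# rollouts_per_step[i] holds the final answers of completions sampled
# by continuing the solution from step i. A step gets label 1 if any
# of its rollouts reaches the correct answer, else 0.

def hard_estimation_labels(rollouts_per_step, correct_answer):
    labels = []
    for rollouts in rollouts_per_step:
        labels.append(1 if any(a == correct_answer for a in rollouts) else 0)
    return labels

# Example: 3 reasoning steps, 2 rollouts each; correct answer is "42".
labels = hard_estimation_labels(
    [["42", "41"], ["40", "40"], ["42", "42"]],
    correct_answer="42",
)
# labels -> [1, 0, 1]
```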
New Contributors
* BeingGod made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/458
* ChenmienTan made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/457
* HollowMan6 made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/462
Highlights
* Support packing_samples for PPO with Ray by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/449 (1.5~2x performance improvement)
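Sample packing, the technique behind the highlight above, concatenates variable-length sequences into one flat batch and records cumulative sequence lengths so attention can be kept block-diagonal (tokens never attend across sample boundaries), avoiding padding waste. The minimal sketch below illustrates only the packing bookkeeping with hypothetical names; OpenRLHF's real implementation operates on tensors and patches the attention kernel.

```python
# Illustrative sketch of sample packing: flatten sequences and track
# cumulative sequence lengths (cu_seqlens) that mark sample boundaries
# for block-diagonal attention. Hypothetical names, not OpenRLHF code.

def pack_samples(sequences):
    packed, cu_seqlens = [], [0]
    for seq in sequences:
        packed.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return packed, cu_seqlens

packed, cu_seqlens = pack_samples([[1, 2, 3], [4, 5], [6]])
# packed     -> [1, 2, 3, 4, 5, 6]
# cu_seqlens -> [0, 3, 5, 6]  (sample i spans cu_seqlens[i]:cu_seqlens[i+1])
```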
What's Changed
* Add context parallel to the reward model by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/444
* Fix lm_head.weight in save_model by zmzhang2000 in https://github.com/OpenRLHF/OpenRLHF/pull/445
* Fix the output of packed data in RewardModel and CriticModel by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/447
* Fix a bug in CriticModel by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/448
* Add TensorBoard logging for local use by catqaq in https://github.com/OpenRLHF/OpenRLHF/pull/451
New Contributors
* zmzhang2000 made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/445
What's Changed
* Only import bitsandbytes when necessary by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/438
* Add context parallel to DPO by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/439
* [Update patch_for_block_diag_attn](https://github.com/OpenRLHF/OpenRLHF/commit/3bc2ddb48e27a74dde32cd706faf536cf50a4514) by xiaoxigua999
* [Added an example for ring DPO](https://github.com/OpenRLHF/OpenRLHF/commit/5c562c27f4c66773039cbd3f155ab7bfda3bc2d4) by xiaoxigua999
New Contributors
* zhuzilin made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/438
* Added makedirs before writing in batch_inference by tongyx361 in https://github.com/OpenRLHF/OpenRLHF/pull/417
* Added a load_from_disk feature to utils.py by tongyx361 in https://github.com/OpenRLHF/OpenRLHF/pull/425
* Fixed a logging-steps bug by visionxyz and xiaoxigua999
What's Changed
* Rename wandb args in scripts by coding-famer in https://github.com/OpenRLHF/OpenRLHF/pull/396
* Speed up data processing by using multiprocessing in Dataset.map by Ricardokevins and xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/412
* Update the link to the code in the README by coding-famer in https://github.com/OpenRLHF/OpenRLHF/pull/414
* Fixed `input_template` for Iterative DPO and Rejection Sampling by xiaoxigua999
* Fixed `SFTDataset` for continued pretraining by xiaoxigua999
New Contributors
* coding-famer made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/396
* Ricardokevins made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/412
- Added support for checkpointing, including states for the Optimizer, Model, Scheduler, and DataLoader, by xiaoxigua999
- Added support for the Remote Reward Model by catqaq and xiaoxigua999
- Set `add_special_tokens=False` in the tokenizer by xiaoxigua999 and ZhaofengWu
- Added the `learning rate` to the logs by xiaoxigua999
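The checkpointing feature above bundles the states of all training components into one resumable snapshot. The stdlib sketch below illustrates only that pattern; the path and function names are hypothetical, and the real OpenRLHF code uses framework (PyTorch/DeepSpeed) checkpoint APIs rather than pickle.

```python
# Generic sketch of the checkpointing pattern: save the states of the
# model, optimizer, scheduler, and dataloader together so training can
# resume from the exact same point. Illustrative only, not OpenRLHF code.
import os
import pickle

def save_checkpoint(path, model_state, optim_state, sched_state, loader_state):
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump({"model": model_state, "optimizer": optim_state,
                     "scheduler": sched_state, "dataloader": loader_state}, f)

def load_checkpoint(path):
    with open(path, "rb") as f:
        return pickle.load(f)

ckpt_path = "ckpt/step_100.bin"  # hypothetical path
save_checkpoint(ckpt_path, {"w": [0.1]}, {"lr": 1e-4}, {"step": 100}, {"epoch": 1})
state = load_checkpoint(ckpt_path)
# state["optimizer"]["lr"] -> 0.0001
```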