Openrlhf

Latest version: v0.6.3.post2

0.4.5

What's Changed
* [BUG] fix _max_steps not initialized bug by BeingGod in https://github.com/OpenRLHF/OpenRLHF/pull/458
* fixed the missing value_head_prefix by ChenmienTan in https://github.com/OpenRLHF/OpenRLHF/pull/457
* Support remote_rm_fn when using packing_samples in ppo by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/461
* Change pg_options param into backend_options in _new_process_group_helper for PyTorch version greater than 2.6 by HollowMan6 in https://github.com/OpenRLHF/OpenRLHF/pull/462
* Move the n_samples_per_prompt into replay buffer by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/463
* Add PRM training with hard estimation by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/442
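The `pg_options` to `backend_options` change listed above is an instance of a version-gated keyword argument. A minimal sketch of that pattern follows, written in plain Python so it runs without PyTorch installed. The helper names `process_group_kwarg` and `build_kwargs`, and the choice to treat 2.6 as an inclusive boundary, are assumptions for illustration and are not taken from the OpenRLHF patch.

```python
import re


def process_group_kwarg(torch_version: str, cutoff=(2, 6)) -> str:
    """Return the keyword name to use for process-group options.

    Hypothetical helper: `cutoff` is treated as inclusive, which is an
    assumption; the release note only says "greater than 2.6".
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized version string: {torch_version!r}")
    # Compare as integer tuples so that 2.10 sorts after 2.6.
    major_minor = (int(match.group(1)), int(match.group(2)))
    return "backend_options" if major_minor >= cutoff else "pg_options"


def build_kwargs(torch_version: str, options) -> dict:
    # Only the keyword name changes; the options object is passed through.
    return {process_group_kwarg(torch_version): options}
```

Comparing parsed integer tuples rather than raw strings avoids the common mistake of ranking "2.10" below "2.6".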

New Contributors
* BeingGod made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/458
* ChenmienTan made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/457
* HollowMan6 made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/462

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.4...v0.4.5

0.4.4

Highlights
* Support packing_samples for ppo with ray by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/449 (1.5x to 2x performance)
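For readers unfamiliar with the term, sample packing generally means concatenating several variable-length sequences into one row and recording the boundaries, so that no compute is spent on padding tokens. The toy sketch below illustrates the idea in plain Python. It is not OpenRLHF's implementation; `pack_sequences`, `unpack_sequences`, and the cumulative-length layout are illustrative assumptions.

```python
from itertools import accumulate


def pack_sequences(sequences):
    """Concatenate token lists and return (packed_tokens, cumulative_lengths)."""
    packed = [token for seq in sequences for token in seq]
    # Boundaries start at 0; entry i + 1 marks where sequence i ends.
    cu_seqlens = [0] + list(accumulate(len(seq) for seq in sequences))
    return packed, cu_seqlens


def unpack_sequences(packed, cu_seqlens):
    """Invert pack_sequences using the recorded boundaries."""
    return [packed[start:end] for start, end in zip(cu_seqlens, cu_seqlens[1:])]
```

Keeping the boundaries alongside the packed row is what lets per-sample outputs, such as rewards or values, be recovered afterwards.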

What's Changed
* Add context parallel to reward model by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/444
* Fix lm_head.weight in save_model by zmzhang2000 in https://github.com/OpenRLHF/OpenRLHF/pull/445
* Fix output of packing data of RewardModel and CriticModel by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/447
* fix bug in CriticModel by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/448
* add tensorboard for local use by catqaq in https://github.com/OpenRLHF/OpenRLHF/pull/451

New Contributors
* zmzhang2000 made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/445

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.3...v0.4.4

0.4.3

What's Changed
* only import bitsandbytes when necessary by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/438
* Add context parallel to DPO by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/439
* [update patch_for_block_diag_attn](https://github.com/OpenRLHF/OpenRLHF/commit/3bc2ddb48e27a74dde32cd706faf536cf50a4514) (xiaoxigua999)
* [added example for ring dpo](https://github.com/OpenRLHF/OpenRLHF/commit/5c562c27f4c66773039cbd3f155ab7bfda3bc2d4) (xiaoxigua999)
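Importing an optional dependency such as bitsandbytes only when it is needed is usually done with a deferred import. The sketch below shows one common form of that pattern; `load_optional_backend` and its `enabled` flag are hypothetical names, and this is not the code from the pull request.

```python
import importlib


def load_optional_backend(module_name: str, enabled: bool):
    """Import `module_name` only when the feature that needs it is enabled."""
    if not enabled:
        return None  # the dependency is never touched on this path
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is required for this option but is not installed"
        ) from exc
```

With this shape, users who never enable the feature do not need the package installed, and those who do get an error that names the missing package.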

New Contributors
* zhuzilin made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/438

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.2...v0.4.3

0.4.2

What's Changed

* Added makedirs before writing in batch_inference by tongyx361 in https://github.com/OpenRLHF/OpenRLHF/pull/417
* Added feature of load_from_disk to utils.py by tongyx361 in https://github.com/OpenRLHF/OpenRLHF/pull/425
* Fixed logging steps bug (visionxyz, xiaoxigua999)
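The "makedirs before writing" fix above addresses a common failure: opening an output file whose parent directory does not exist yet. A minimal sketch of the usual remedy follows; `write_output` is a hypothetical helper, not the function from batch_inference.

```python
import os


def write_output(path: str, text: str) -> None:
    """Write `text` to `path`, creating missing parent directories first."""
    parent = os.path.dirname(path)
    if parent:  # empty when `path` is a bare filename
        os.makedirs(parent, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Passing `exist_ok=True` makes the call safe to repeat, including when several workers write into the same directory.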

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.1...v0.4.2

0.4.1

What's Changed
* Rename wandb args in scripts by coding-famer in https://github.com/OpenRLHF/OpenRLHF/pull/396
* Speed Up Data Processing by Using Multi-Processing in Dataset.map by Ricardokevins and xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/412
* Update link to code in readme by coding-famer in https://github.com/OpenRLHF/OpenRLHF/pull/414
* Fixed `input_template` for Iterative DPO and Rejection Sampling (xiaoxigua999)
* Fixed `SFTDataset` for continued pretraining (xiaoxigua999)

New Contributors
* coding-famer made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/396
* Ricardokevins made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/412

**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.4.0...v0.4.1

0.4.0

Changes

- Added checkpointing support that saves the states of the Optimizer, Model, Scheduler, and DataLoader. (xiaoxigua999)
- Added support for the Remote Reward Model. (catqaq, xiaoxigua999)
- Set `add_special_tokens=False` in the tokenizer. (xiaoxigua999, ZhaofengWu)
- Added the learning rate to the logs. (xiaoxigua999)
