Highlights
* Support loading reward models with `AutoModelForSequenceClassification.from_pretrained` by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/commit/a9c482a3cdabc94f34f3c5670fc964d8a9f86c63
- The default `value_head` name of the reward model has been changed to `score`
* Support RLOO with per-token KL penalty by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/515
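
For readers unfamiliar with the RLOO highlight, here is a minimal sketch of the two ideas it combines: a leave-one-out baseline across the samples drawn for one prompt, and a KL penalty applied at every token. It is not taken from the OpenRLHF code. The function names, the `kl_coef` parameter, and the choice to place the sequence-level reward on the final token are assumptions made for illustration.

```python
from typing import List


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out baseline (hypothetical helper).

    Each sample's reward is compared with the mean reward of the
    other samples generated for the same prompt.
    """
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


def per_token_rewards(
    seq_reward: float,
    logprobs: List[float],
    ref_logprobs: List[float],
    kl_coef: float,
) -> List[float]:
    """Per-token KL penalty (hypothetical helper).

    Every token is penalised by kl_coef * (log pi - log pi_ref).
    Assumption: the sequence-level reward is added on the last token.
    """
    if not logprobs or len(logprobs) != len(ref_logprobs):
        raise ValueError("logprobs and ref_logprobs must be non-empty and equal length")
    rewards = [-kl_coef * (lp - ref) for lp, ref in zip(logprobs, ref_logprobs)]
    rewards[-1] += seq_reward
    return rewards


# Three sampled responses for one prompt, scored by a reward model.
advantages = rloo_advantages([1.0, 2.0, 3.0])

# Token-level rewards for the third response (two tokens long).
token_rewards = per_token_rewards(
    seq_reward=advantages[2],
    logprobs=[-1.0, -2.0],
    ref_logprobs=[-1.5, -2.0],
    kl_coef=0.1,
)
```

Because the baseline comes from the other samples of the same prompt, no separate critic model is needed. The per-token penalty localises the KL cost to the tokens where the policy departs from the reference model.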
What's Changed
* Fix a PRM trainer bug affecting non-packed samples and ring attention by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/551
* Fix issue 549: ZeroDivisionError during the eval step of DPO/SFT/RM/KTO/KD training by MarxistZ in https://github.com/OpenRLHF/OpenRLHF/pull/561 xiaoxigua999
New Contributors
* MarxistZ made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/561
**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.2...v0.5.3