What's Changed
* Fixed typos: advatanges -> advantages by songxxzp in https://github.com/OpenRLHF/OpenRLHF/pull/570
* Only decode the queries once for multiple remote rm by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/572
* overlap vllm init and actor/reward model loading by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/575
* correct the order of multiplication in grad acc by zhuzilin in https://github.com/OpenRLHF/OpenRLHF/pull/577
* Support ring-attention during sft phase by UbeCc in https://github.com/OpenRLHF/OpenRLHF/pull/576
* Add better error message for empty datasets by frrad in https://github.com/OpenRLHF/OpenRLHF/pull/581
* Fix nan for sft-ring when labels are all IGNORE_INDEX by UbeCc in https://github.com/OpenRLHF/OpenRLHF/pull/583
* explicitly ignore attention_mask for packing_samples. by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/pull/588
* Set default grad_accum_dtype to None by xiaoxigua999 in https://github.com/OpenRLHF/OpenRLHF/commit/47f7cd8fc76de6d057d053251c1b55c00421cc24
* update global batch size in eval model compatible with ring-attn-size by ShomyLiu in https://github.com/OpenRLHF/OpenRLHF/pull/590
New Contributors
* songxxzp made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/570
* UbeCc made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/576
* frrad made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/581
* ShomyLiu made their first contribution in https://github.com/OpenRLHF/OpenRLHF/pull/590
**Full Changelog**: https://github.com/OpenRLHF/OpenRLHF/compare/v0.5.3...v0.5.4