Changes
- Fixed [max_model_len](https://github.com/OpenLLMAI/OpenRLHF/commit/d7d92d4ae26acd1ec067fd97cabf80d3e13bf456). openllmai0
- Added support for tokenizer chat templates in train_rm.py and reward_dataset.py. mickelliu
- Introduced the [--enable_prefix_caching](https://github.com/OpenLLMAI/OpenRLHF/commit/bbbae8352cdbf26f990885a4e1640e82d0fbeaa8) option. openllmai0
- Added support for saving the value network with [--save_value_network](https://github.com/OpenLLMAI/OpenRLHF/commit/e24c53fa2352964cd4638e8ceeede5bd0c6f47ce). openllmai0
- Added support for specifying the number of samples per prompt with [--n_samples_per_prompt](https://github.com/OpenLLMAI/OpenRLHF/commit/46181fd0400788bb162e399934d05d5a1bbea3aa). openllmai0