Changes
- Upgraded vLLM to v0.4.1 mgerstgrasser wuxibin89 hijkzzz
- Upgraded Transformers to v4.40.1 and DeepSpeed to v0.14.0 hijkzzz
- Fixed typo in train_ppo_ray.py mickelliu
- Fixed a size mismatch between output_state_dict (148) and state_dict (149) in model saving hijkzzz
- Added support for --colocate_actor_ref and --colocate_critic_reward in train_ppo_ray.py hijkzzz
- Added support for offloading the reward and reference models in Ray PPO hijkzzz
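
As a rough illustration of the two colocation flags listed above, here is a minimal sketch of how such options could be parsed and interpreted. Only the flag names `--colocate_actor_ref` and `--colocate_critic_reward` come from these notes. Treating them as boolean switches, the `build_parser` and `describe_placement` helpers, and the group labels are all assumptions made for illustration, not the actual code in train_ppo_ray.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the two flag names are taken from the release notes.
    parser = argparse.ArgumentParser(description="Illustrative subset of options")
    parser.add_argument(
        "--colocate_actor_ref",
        action="store_true",  # assumption: a boolean switch, off by default
        help="place the actor and reference models on the same devices",
    )
    parser.add_argument(
        "--colocate_critic_reward",
        action="store_true",  # assumption: a boolean switch, off by default
        help="place the critic and reward models on the same devices",
    )
    return parser


def describe_placement(args: argparse.Namespace) -> dict:
    # Map each model to a hypothetical device-group label.
    # Models that share a label are meant to share the same devices.
    placement = {"actor": "actor", "ref": "ref", "critic": "critic", "reward": "reward"}
    if args.colocate_actor_ref:
        placement["actor"] = placement["ref"] = "actor_ref"
    if args.colocate_critic_reward:
        placement["critic"] = placement["reward"] = "critic_reward"
    return placement


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv to keep the demo deterministic.
    demo_args = build_parser().parse_args(["--colocate_actor_ref"])
    print(describe_placement(demo_args))
```

In this sketch, passing only `--colocate_actor_ref` puts the actor and reference models in one shared group while the critic and reward models stay in separate groups.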