Changes
- Fixed reward model training when using the Hugging Face ZeRO3 initialization API (for models with 70 billion+ parameters) wuxibin89
- Added support for the Mixtral 8x7B balancing loss (--balancing_loss_coef) hijkzzz
- Fixed issue with vllm_engine when tp=1 wuxibin89
- Fixed ZeRO2 model saving bugs hijkzzz
- Added the --grad_accum_dtype argument to reduce CPUAdam memory usage hijkzzz
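
For readers unfamiliar with the --balancing_loss_coef entry above, the following is a minimal sketch of how a mixture-of-experts balancing loss is commonly combined with the task loss. It assumes a Switch-Transformer-style auxiliary loss and is not necessarily how this project or the Mixtral implementation computes it. The function names (`load_balancing_loss`, `total_loss`) are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def load_balancing_loss(router_logits, top_k=2):
    """Assumed Switch-Transformer-style auxiliary loss (illustrative only).

    router_logits: list of per-token lists, one logit per expert.
    Returns num_experts * sum_i(f_i * P_i), where f_i is the fraction of
    tokens routed to expert i and P_i is the mean router probability for i.
    """
    num_tokens = len(router_logits)
    num_experts = len(router_logits[0])
    token_fraction = [0.0] * num_experts
    mean_prob = [0.0] * num_experts
    for logits in router_logits:
        probs = softmax(logits)
        # Pick the top_k experts with the highest router probability.
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            token_fraction[e] += 1.0 / num_tokens
        for e in range(num_experts):
            mean_prob[e] += probs[e] / num_tokens
    return num_experts * sum(f * p for f, p in zip(token_fraction, mean_prob))


def total_loss(task_loss, aux_loss, balancing_loss_coef):
    """Hypothetical combination: the coefficient scales the auxiliary term."""
    return task_loss + balancing_loss_coef * aux_loss
```

A coefficient of 0 would leave the task loss unchanged in this sketch, while larger values penalize routers that concentrate tokens on a few experts.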