Changes
- Added support for checkpointing, including states for Optimizer, Model, Scheduler, and DataLoader. xiaoxigua999
- Added support for the Remote Reward Model. catqaq xiaoxigua999
- Set `add_special_tokens=False` in the tokenizer. xiaoxigua999 ZhaofengWu
- Added the learning rate to the logs. xiaoxigua999
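
For readers unfamiliar with the checkpointing item above, the following is a minimal sketch of the general pattern, not this project's actual implementation. It assumes every component exposes `state_dict()` and `load_state_dict()`, as PyTorch models, optimizers, and LR schedulers do. `ResumableLoader`, `save_checkpoint`, and `load_checkpoint` are hypothetical names introduced only for illustration, and `pickle` stands in for whatever serializer the project uses.

```python
import os
import pickle
import tempfile


class ResumableLoader:
    """Hypothetical stand-in for a data loader that can resume mid-epoch."""

    def __init__(self, data):
        self.data = list(data)
        self.position = 0  # index of the next sample to yield

    def __iter__(self):
        while self.position < len(self.data):
            item = self.data[self.position]
            self.position += 1
            yield item

    def state_dict(self):
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]


def save_checkpoint(path, **components):
    """Collect state_dict() from each named component and write one file."""
    states = {name: obj.state_dict() for name, obj in components.items()}
    with open(path, "wb") as f:
        pickle.dump(states, f)


def load_checkpoint(path, **components):
    """Restore each named component from a file written by save_checkpoint."""
    with open(path, "rb") as f:
        states = pickle.load(f)
    for name, obj in components.items():
        obj.load_state_dict(states[name])


# Consume two samples, checkpoint, then resume in a fresh loader.
loader = ResumableLoader(range(5))
it = iter(loader)
seen = [next(it), next(it)]

ckpt_path = os.path.join(tempfile.mkdtemp(), "ckpt.pkl")
save_checkpoint(ckpt_path, dataloader=loader)

resumed = ResumableLoader(range(5))
load_checkpoint(ckpt_path, dataloader=resumed)
remaining = list(resumed)  # continues where the first loader stopped
```

In a real training loop the same two helpers could be called with `model=...`, `optimizer=...`, and `scheduler=...` alongside `dataloader=...`, so that all four states land in one file and are restored together.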