Features
- A fast LLaMA2 SFT/PPO training framework based on DeepSpeed. hijkzzz
- Multi-node [training scripts](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_llama_slurm.sh) for Slurm. hijkzzz
- Support [DPO (direct-preference-optimization)](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_dpo_llama.sh). hijkzzz
- Distributed [PPO based on Ray](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_ppo_llama_ray.sh) for 34B+ models, and for 7B models on RTX 4090 GPUs. wuxibin89
- Support [Conditional SFT](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_conditional_llama.sh) (https://arxiv.org/abs/2308.12050). hijkzzz
- Support Wandb logging (`--wandb`). dabney777
- Support conda env/nvidia docker. catqaq
- Support FlashAttention2 (`--flash_attn`). pikaqqqqqq
- Support popular Chinese models. catqaq
- Support [GPT4 evaluation](https://github.com/OpenLLMAI/OpenRLHF/blob/main/evaluation/gpt4/README.md). hijkzzz
- Support multiple reward models. wuxibin89
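
As a rough illustration, the flags listed above can be combined when launching one of the example scripts. The script path below is the DPO example linked above; passing these flags on the command line is an assumption about how the scripts forward arguments, not a verified invocation — check the script itself for the exact interface.

```shell
# Sketch: running the DPO example with FlashAttention2 and Wandb logging.
# --flash_attn and --wandb are the flags named in the feature list;
# {wandb_token} is a placeholder for your own Weights & Biases API key.
bash ./examples/scripts/train_dpo_llama.sh --flash_attn --wandb {wandb_token}
```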