OpenRLHF

Latest version: v0.5.3


0.0.2

Changes
- Remove `pad_token` for LLaMA2 (hijkzzz)
- Support cDPO/IPO (hijkzzz)
- Fix Ray RLHF sync bugs (wuxibin89)
- Optimize `eos_indices` with `torch.argmax` (li-plus)
- Fix local datasets (catqaq)
- Fix DPO DataLoader bugs
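
For readers unfamiliar with the cDPO and IPO variants mentioned above, the following is a minimal sketch of the per-example losses as they are commonly formulated in the literature. It is not taken from the OpenRLHF source: the function name `preference_loss` and the parameters `beta`, `label_smoothing`, and `ipo` are hypothetical names chosen for illustration, and the inputs are assumed to be summed sequence log-probabilities under the policy and a frozen reference model.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def preference_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
    label_smoothing: float = 0.0,  # hypothetical name; > 0 gives the cDPO variant
    ipo: bool = False,             # hypothetical flag; True gives the IPO variant
) -> float:
    """Per-example preference loss (illustrative, not OpenRLHF's implementation)."""
    # Log-ratio margin between the policy and the reference model.
    h = (policy_chosen - policy_rejected) - (ref_chosen - ref_rejected)
    if ipo:
        # IPO: squared distance between the margin and the target 1 / (2 * beta).
        return (h - 1.0 / (2.0 * beta)) ** 2
    # DPO when label_smoothing == 0; cDPO assumes labels are flipped
    # with probability label_smoothing.
    return (
        -(1.0 - label_smoothing) * log_sigmoid(beta * h)
        - label_smoothing * log_sigmoid(-beta * h)
    )
```

With `label_smoothing=0.0` and `ipo=False` this reduces to the standard DPO loss. A nonzero `label_smoothing` keeps the loss from approaching zero when the margin grows very large, which is the motivation usually given for cDPO.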

0.0.1

Features
- A fast LLaMA2 SFT/PPO training framework based on DeepSpeed. (hijkzzz)
- Multi-node [training scripts](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_llama_slurm.sh) for Slurm. (hijkzzz)
- Support [DPO (direct preference optimization)](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_dpo_llama.sh). (hijkzzz)
- Distributed [PPO based on Ray](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_ppo_llama_ray.sh) for 34B+ models, and for 7B models on RTX 4090. (wuxibin89)
- Support [Conditional SFT](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_conditional_llama.sh) (https://arxiv.org/abs/2308.12050). (hijkzzz)
- Support Wandb logging (`--wandb`). (dabney777)
- Support conda environments and NVIDIA Docker. (catqaq)
- Support FlashAttention2 (`--flash_attn`). (pikaqqqqqq)
- Support popular Chinese models. (catqaq)
- Support [GPT4 evaluation](https://github.com/OpenLLMAI/OpenRLHF/blob/main/evaluation/gpt4/README.md). (hijkzzz)
- Support multiple reward models. (wuxibin89)
