OpenRLHF

Latest version: v0.5.3

0.3.2

Changes
- Fixed [max_model_len](https://github.com/OpenLLMAI/OpenRLHF/commit/d7d92d4ae26acd1ec067fd97cabf80d3e13bf456) openllmai0
- Added support for tokenizer chat templates in train_rm.py and reward_dataset.py. mickelliu
- Introduced the [--enable_prefix_caching](https://github.com/OpenLLMAI/OpenRLHF/commit/bbbae8352cdbf26f990885a4e1640e82d0fbeaa8) option (see the sketch after this list). openllmai0
- Added support for saving the value network with [--save_value_network](https://github.com/OpenLLMAI/OpenRLHF/commit/e24c53fa2352964cd4638e8ceeede5bd0c6f47ce). openllmai0
- Supported specifying the number of samples per prompt with [--n_samples_per_prompt](https://github.com/OpenLLMAI/OpenRLHF/commit/46181fd0400788bb162e399934d05d5a1bbea3aa). openllmai0
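
A minimal sketch of what these options correspond to on the vLLM side, assuming --enable_prefix_caching maps to vLLM's `enable_prefix_caching` engine argument and --n_samples_per_prompt to `SamplingParams.n` (the model name and values below are illustrative):

```python
from vllm import LLM, SamplingParams

# Prefix caching lets vLLM reuse KV-cache blocks for a shared prompt
# prefix across rollouts, which speeds up PPO-style generation where
# many samples start from the same system prompt.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # illustrative model
    enable_prefix_caching=True,  # presumed target of --enable_prefix_caching
)

# n controls how many completions are sampled per prompt, analogous
# to OpenRLHF's --n_samples_per_prompt.
params = SamplingParams(n=4, temperature=1.0, max_tokens=512)
outputs = llm.generate(["What is RLHF?"], params)
for completion in outputs[0].outputs:
    print(completion.text)
```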

0.3.1

Changes
- Supported `tokenizer.apply_chat_template` in all datasets (see the example after this list) openllmai0
- Supported vLLM 0.5.0 via `--vllm_sync_backend=gloo` in Ray PPO openllmai0
- Added Iterative DPO openllmai0
- Added [Llama3 RLHF example](https://github.com/OpenLLMAI/OpenRLHF/blob/main/examples/scripts/train_ppo_llama3_ray_colocate.sh) openllmai0
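
For context, `apply_chat_template` renders a list of role-tagged messages with the chat template stored in the tokenizer config. A minimal sketch using the standard Hugging Face API (the model name is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain PPO in one sentence."},
]

# Renders the conversation with the model's chat template and appends
# the assistant header so generation starts at the right position.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```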

Llama3 + vLLM 0.4.2 RLHF checkpoint: https://huggingface.co/OpenLLMAI/Llama-3-8b-rlhf-100k

Evaluation

Chat-Arena-Hard scores:

| Model                | Score |
|----------------------|-------|
| llama-3-8b-sft       | 5.6   |
| llama-3-8b-rlhf-100k | 20.5  |

0.3.0

Changes
- Upgraded the PyTorch NGC container to version 24.02 openllmai0
- Downgraded DeepSpeed to version 0.13.5 openllmai0
- Pinned vLLM to version 0.4.2 (v0.4.3+ currently supports only the Gloo backend in Ray PPO) openllmai0
- Cleaned up the codebase openllmai0

The PPO training curve using Ray + vLLM v0.4.2 + NCCL

<img src="https://github.com/OpenLLMAI/OpenRLHF/assets/19810594/bf7992a9-feec-47e6-baa7-c7c48e21ece6" width="400px">

0.2.9

Changes
- Fixed out-of-memory (OOM) errors with `--colocate_critic_reward` and `--colocate_actor_ref` openllmai0

0.2.8

Changes
- Fixed the DPO loss mask (see the sketch after this list) openllmai0
- Fixed a vLLM generation corner case openllmai0
- Upgraded Ray and Transformers openllmai0
- Fixed typos in README.md KT313
- Added system prompt support in datasets hijkzzz
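
The loss-mask fix concerns which tokens contribute to the sequence log-probabilities: prompt and padding tokens must be excluded so only response tokens enter the DPO objective. A minimal, framework-agnostic sketch of that masking step (not OpenRLHF's actual code; all names are illustrative):

```python
import torch
import torch.nn.functional as F

def masked_logps(logits, labels, loss_mask):
    """Sum per-token log-probs over response tokens only.

    logits:    (batch, seq_len, vocab) model outputs
    labels:    (batch, seq_len) target token ids
    loss_mask: (batch, seq_len) 1.0 for response tokens, 0.0 for prompt/padding
    """
    # Shift so the logits at position t predict the token at t + 1.
    per_token = torch.gather(
        logits[:, :-1].log_softmax(-1), 2, labels[:, 1:].unsqueeze(2)
    ).squeeze(2)
    return (per_token * loss_mask[:, 1:]).sum(-1)

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    # Standard DPO objective on the masked log-probabilities.
    margins = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * margins).mean()
```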

0.2.7

Changes

- Added support for vLLM v0.4.2 hijkzzz
- Added support for Jamba v0.1 (currently incompatible with vLLM v0.4.2) hijkzzz
- Added LoRA configs (`--lora_dropout`, `--target_modules`); see the sketch below hijkzzz
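
A minimal sketch of what the new LoRA flags plausibly map to, assuming they are passed through to a PEFT `LoraConfig` (the model name and values are illustrative):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # illustrative

config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,                    # presumed target of --lora_dropout
    target_modules=["q_proj", "v_proj"],  # presumed target of --target_modules
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()
```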
