verl

Latest version: v0.3.0.post0



0.3.0.post0

Highlights

**New algorithms and recipes**
- Vision-language reasoning with Qwen2.5-VL
- PRIME, RLOO, and ReMax
- FIRE sampling algorithm and math-verify rewards

**Engine**
- SGLang integration is available as a preview. Blazing fast! Please try it and give us feedback!

--actor_rollout_ref.rollout.name='sglang'

- Megatron is upgraded to v0.11, with support for the checkpoint manager, Qwen models, and the GRPO algorithm
- vLLM is upgraded to v0.8.2, much faster during rollout than vLLM v0.7 and v0.6.3 thanks to the v1 engine!

actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \


Hardware:
- AMD support is available for the vLLM and FSDP backends

Docs:
- tutorial for distributed training setup, debugging, and the programming model

Roadmap for Q2: https://github.com/volcengine/verl/issues/710. Contributions are welcome!

Changelog

**New Features**
**Algorithm Support**
- Support for `extra_info` in reward calculation
- RLOO advantage estimator
- PRIME algorithm (recipe and baseline)
- Initial support for VLMs (Vision-Language Models), including a Qwen2.5-VL GRPO example
- Math-Verify Support
- Support for GRPO with Megatron backend
- Added FIRE sampling in rollout
- Replaced `DataLoader` with `StatefulDataLoader` for checkpoint resuming
- Support for external reward function loading
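The RLOO advantage estimator above uses a leave-one-out baseline over a group of sampled responses to the same prompt. A minimal plain-Python sketch of the idea (not verl's actual implementation; the function name and numbers are illustrative):

```python
def rloo_advantages(rewards):
    """Leave-one-out advantage: each sample's baseline is the
    mean reward of the *other* samples for the same prompt."""
    k = len(rewards)
    total = sum(rewards)
    # baseline_i = (total - r_i) / (k - 1)
    return [r - (total - r) / (k - 1) for r in rewards]

# Four sampled responses to one prompt, binary rewards:
print(rloo_advantages([1.0, 0.0, 0.0, 1.0]))
```

Because each baseline excludes the sample it normalizes, the estimator stays unbiased while still reducing variance.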

**Performance Improvements**
- Support for SGLang as a rollout engine
- Support for Ulysses sequence parallel (transformers >= 4.48)
- Support offloading parameters and optimizer during rollout
- Tracking support for vemlp and TensorBoard
- MFU (Model FLOPS Utilization) calculation for Megatron workers
- Support for AMD (ROCm kernel)
- Improved checkpoint loading (Megatron support for Llama/Qwen models)
- Remove unnecessary `torch.cuda.empty_cache()` calls
- Optimized weight loading (replaced custom VLLM loader with `model.load_weights`)
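MFU compares the FLOP/s a training step actually achieves against the hardware's peak. A rough sketch using the common ~6·N·T FLOPs estimate for one forward+backward pass of a dense transformer (all numbers below are illustrative, not verl's exact accounting):

```python
def mfu(num_params, tokens_per_step, step_time_s, peak_flops_per_gpu, num_gpus):
    """Model FLOPs Utilization: achieved FLOP/s over aggregate peak FLOP/s.
    Uses the standard ~6 * params * tokens approximation for a
    forward+backward pass of a dense transformer."""
    achieved = 6 * num_params * tokens_per_step / step_time_s
    return achieved / (peak_flops_per_gpu * num_gpus)

# 7B model, 65536 tokens/step, 4 s/step, 8 GPUs at 312 TFLOP/s peak (illustrative)
print(f"{mfu(7e9, 65536, 4.0, 312e12, 8):.2%}")
```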

---

**Bug Fixes**
- Fixed wrong args description
- Fixed Gemma2 example and NGC Dockerfile
- Fixed offload/load optimizer implementation
- Fixed VLLM documentation links
- Fixed typos and spelling errors
- Fixed evaluation file path in Remax training scripts
- Fixed OOM when resuming from checkpoint
- Fixed position embedding for Qwen2.5-VL
- Fixed PRIME algorithm issues (filtering long prompts, padding side, xformers)
- Fixed FSDP checkpoint loading
- Fixed SGLang rollout under multi-node
- Fixed Python environment issues in installation
- Fixed validation batch repeat before feeding into rollout

---

**Deprecations and Breaking Changes**
- Deprecated `val_batch_size`
- Removed redundant config parameters
- Reverted RLHFDataset truncation config

---

**Improvements**
**Documentation**
- Added Ray on Slurm example
- Added FAQ for VLLM illegal memory access
- Added distributed training docs (RLOO, VolcEngine)
- Updated VLLM (>=0.7, >=0.8) documentation
- Added meetup info, blogs, and project references
- Improved Slurm example parameters
- Added multi-node training and debug tutorial

**Tooling & CI/CD**
- Added Dependabot action
- Added secrets scan action
- Added CI timeout and auto-cancel previous CI runs
- Added e2e_ascend CI
- Improved dataset handling in CI

**Miscellaneous**
- Added assertion checks for PPO mini-batch size
- Improved logging (SwanLab integration)
- Pre-check resource pool availability to prevent hangs
- Added tqdm progress bar for RayPPOTrainer
- Skip special tokens in processing
- Support for faster model downloads from ModelScope
- Added Dockerfile for AWS SageMaker

---

New Contributors
This release includes contributions from 60 contributors, 47 of whom are new!
AnselCmy BASARANOMO BaiqingL BeSkyer BearBiscuit05 CajZella Django-Jiang DolbyUUU ETOgaosion HaoshengZou ISEEKYAN Kunlun-Zhu PeterSH6 PzySeere Raf-Chen WillemJiang Yifan-Song793 ZSL98 Zeetc ZefanW Zeyi-Lin caaatch22 celestialli danielz02 dependabot dirtyDan0 eltociear eric-haibin-lin fyqqyf gameofdimension ganler haoy-zzz hiyouga hongpeng-guo iceflame89 jayl940712 kinman0224 laonahongchen liudayuan-carrot maksimstw mi804 minleminzui nomadlx none0663 nwiad ocss884 pat-jj thomZ1 tongyx361 uygnef vermouth1992 wangchengnuo wuxibin89 xffxff yaguanghu yushengsu-thu yyDing1 zhanluxianshen zhr2001 zpqiu
Thank you all for making verl better!!

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.2.0.post2...v0.3.0.post0

0.2.0.post2

What's Changed
* Fixed installation issues.
* Fixed the remove padding flags in the gemma example.

New Contributors
* xffxff made their first contribution in https://github.com/volcengine/verl/pull/281

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.2...v0.2.0.post2

0.2

Highlights

New algorithms and features
- [GRPO](https://github.com/volcengine/verl/tree/v0.2/examples/grpo_trainer)
- [ReMax](https://github.com/volcengine/verl/tree/v0.2/examples/remax_trainer)
- [REINFORCE++](https://github.com/volcengine/verl/pull/228/files)
- Checkpoint manager for FSDP backend
- [Sandbox for reward verification](https://github.com/volcengine/verl/tree/v0.2/verl/workers/reward_manager/prime.py) and scoring in PRIME
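GRPO estimates advantages group-relatively: rewards for a group of responses to the same prompt are normalized by the group's mean and standard deviation, with no learned value model. A minimal sketch of that normalization (verl's estimator may differ in details such as sample vs. population std; names are illustrative):

```python
from statistics import mean, stdev

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantage: normalize each reward by the
    mean and (sample) std of its group of sampled responses."""
    mu = mean(rewards)
    sigma = stdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses to one prompt, binary rewards:
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
```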

Performance optimization:
- Remove padding tokens (i.e. sequence packing). Significant throughput increase expected for Llama, Mistral, Gemma, Qwen2 transformer models. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#enable-remove-padding-sequence-packing)

actor_rollout_ref.model.use_remove_padding=True
critic.model.use_remove_padding=True
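Remove-padding works by concatenating only the real (non-pad) tokens of a batch into one flat sequence and tracking the boundaries with cumulative sequence lengths (`cu_seqlens`), the layout that flash-attention's varlen kernels consume. A toy illustration of the bookkeeping with plain lists (verl's version operates on tensors):

```python
def pack_sequences(sequences, pad_id=0):
    """Drop pad tokens, concatenate the rest, and record
    cumulative sequence lengths (cu_seqlens) for unpacking."""
    packed, cu_seqlens = [], [0]
    for seq in sequences:
        real = [t for t in seq if t != pad_id]
        packed.extend(real)
        cu_seqlens.append(cu_seqlens[-1] + len(real))
    return packed, cu_seqlens

# Two right-padded sequences of length 5:
packed, cu = pack_sequences([[7, 8, 9, 0, 0], [5, 6, 0, 0, 0]])
print(packed, cu)  # [7, 8, 9, 5, 6] [0, 3, 5]
```

No compute is spent on pad positions, which is where the throughput gain comes from.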

- Dynamic batch size. Significant throughput increase for variable length sequences. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#tuning-for-dynamic-batch-size) and [example](https://github.com/volcengine/verl/blob/main/examples/ppo_trainer/run_qwen2-7b_seq_balance.sh)

actor_rollout_ref.actor.ppo_max_token_len_per_gpu
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu
critic.ppo_max_token_len_per_gpu
critic.forward_micro_batch_size_per_gpu
reward_model.forward_micro_batch_size_per_gpu
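Dynamic batch size groups variable-length sequences into micro-batches bounded by a token budget (what `ppo_max_token_len_per_gpu` controls) rather than a fixed sequence count. A greedy sketch of the core idea; verl's actual packing is more sophisticated (e.g. balancing load across ranks):

```python
def make_micro_batches(seq_lens, max_tokens_per_gpu):
    """Greedily pack sequence indices into micro-batches so that
    each batch's total token count stays within the budget."""
    batches, current, current_tokens = [], [], 0
    for i, n in enumerate(seq_lens):
        if current and current_tokens + n > max_tokens_per_gpu:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(i)
        current_tokens += n
    if current:
        batches.append(current)
    return batches

print(make_micro_batches([700, 300, 900, 100, 200], 1000))  # [[0, 1], [2, 3], [4]]
```

Short sequences share a micro-batch instead of each occupying one slot, so GPUs see a near-constant token load per step.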

- Sequence parallelism for long context training. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#ulysses-sequence-parallel-for-long-context-training) and [example](https://github.com/volcengine/verl/blob/v0.2/examples/ppo_trainer/run_deepseek7b_llm_sp2.sh)

actor_rollout_ref.actor.ulysses_sequence_parallel_size
critic.ulysses_sequence_parallel_size
reward_model.ulysses_sequence_parallel_size

- vllm v0.7+ integration (preview). For the qwen2 ppo example, 25% time reduction in rollout compared to v0.6.3, and 45% time reduction when cuda graph is enabled. [Documentation](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.7.md)

actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False

- [Liger-kernel integration](https://github.com/volcengine/verl/blob/v0.2/examples/sft/gsm8k/run_qwen_05_sp2_liger.sh) for SFT. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#ligerkernel-for-sft)

model.use_liger=True


Changelog

New Features

1. **Algorithm Support:**
- Added support for **GRPO algorithm** (124).
- Implemented **REINFORCE++ algorithm** (228).
- Added **ReMax algorithm** (234)

2. **Performance Improvements:**
- Enabled **dynamic batch size** support (118).
- Added **meta device initialization and parallel load for FSDP** to avoid OOMs during init (123).
- Improved **gradient accumulation in sequence balance** (141).
- Added **ref/RM offload support** (121).
- Added **LoRA support for SFT** (127).
   - Added support for **rmpad/data-packing** in FSDP with transformers (91).
   - Integrated **Liger kernel** for SFT (133).

3. **Experiment Tracking:**
- Integrated **SwanLab** for experiment tracking with online/offline mode and local dashboard support (218).
- Added **Mlflow support** (74).

---

Bug Fixes

1. **Critical Fixes:**
- Fixed **checkpoint save with existing directories** (174).
- Fixed **incorrect response_attention_mask in vLLM rollout** (213).
- Fixed **gradient accumulation loss value** (102).
- Fixed **reward model issues with TokenClassification models** (99).

2. **Code Fixes:**
- Fixed **redundant non_zero_mask** (152).
- Fixed **validation dp_size** (90).
- Fixed **response_mask index** (60).

---

Improvements

1. **Performance:**
- Improved **memory efficiency in logprobs_from_logits_v2** (220).
- Enabled **multiprocess dataloader in SFT trainer** (122).
- Added **MFU calculation support** (117).
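The memory improvement in `logprobs_from_logits_v2` rests on not materializing a full log-softmax over the vocabulary: the log-probability of the label token is its logit minus the logsumexp of the row. A scalar pure-Python sketch of that identity (the real code works on chunked tensors; the function name here is illustrative):

```python
from math import exp, log

def logprob_of_label(logits, label):
    """log softmax(logits)[label] == logits[label] - logsumexp(logits),
    computed stably by shifting with the row max."""
    m = max(logits)
    lse = m + log(sum(exp(x - m) for x in logits))
    return logits[label] - lse

print(round(logprob_of_label([2.0, 1.0, 0.0], 0), 4))
```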


2. **Miscellaneous:**
- Added **option to log validation generations to wandb** (177).

---

Deprecations and Breaking Changes

1. **Breaking Changes:**
- Changed **micro_batch_size to micro_batch_size_per_gpu** (136).
- Removed **ray.remote on workers to allow inheritance** (61).
- Refactored **old_log_prob into a separate function** (129).

---

Contributors
A big thank you to all the contributors who made this release possible:
zhanluxianshen xingyaoww fzyzcjy emergenz openhands-agent ZSL98 YSLIU627 ZefanW corbt jaysonfrancis hiyouga Jiayi-Pan hongpeng-guo eltociear chujiezheng PanAndy zwhe99 pcmoritz huiyeruzhou VPeterV uygnef zhiqi-0 ExtremeViscent liziniu nch0w Cppowboy TonyLianLong 4332001876 tyler-romero ShaohonChen kinman0224 willem-bd bebetterest WeiXiongUST dignfei


---
The PyPI package will be available soon! Please let us know on GitHub if there's a problem extending an RL training recipe based on the pip-installed version of verl.

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.1...v0.2

0.1

What's Changed
* [misc] feat: update tutorial for opensource version by PeterSH6 in https://github.com/volcengine/verl/pull/4
* [misc] fix: vllm gpu executor issue when world_size is 1 and typo in doc by PeterSH6 in https://github.com/volcengine/verl/pull/9
* [ci] feat: add test files for ray hybrid programming model by PeterSH6 in https://github.com/volcengine/verl/pull/23
* [chore] remove unnecessary updating of `_worker_names` by kevin85421 in https://github.com/volcengine/verl/pull/19
* [misc] feat: add gemma example for small scale debug and fix gradient checkpoint in critic by PeterSH6 in https://github.com/volcengine/verl/pull/27
* [misc] fix issue in hf_weight_loader and fix typo in doc by PeterSH6 in https://github.com/volcengine/verl/pull/30
* [ci] test lint ci and lint tests dir by PeterSH6 in https://github.com/volcengine/verl/pull/28
* [example] fix: fix math circular dependency by eric-haibin-lin in https://github.com/volcengine/verl/pull/31
* [example] fix: make wandb optional dependency. allow extra args in existing scripts by eric-haibin-lin in https://github.com/volcengine/verl/pull/32
* [docs] feat: add related publications by eric-haibin-lin in https://github.com/volcengine/verl/pull/35
* [tokenizer] feat: support tokenizers whose pad_token_id is none by eric-haibin-lin in https://github.com/volcengine/verl/pull/36
* [rollout] feat: support vLLM v0.6.3 and fix hf rollout import issue by PeterSH6 in https://github.com/volcengine/verl/pull/33
* [distro] feat: add docker support by eric-haibin-lin in https://github.com/volcengine/verl/pull/41
* [example] add a split placement tutorial by PeterSH6 in https://github.com/volcengine/verl/pull/43
* [doc] add a new quickstart section by PeterSH6 in https://github.com/volcengine/verl/pull/44
* [BREAKING][core] move single_controller into verl directory by PeterSH6 in https://github.com/volcengine/verl/pull/45

New Contributors
* eric-haibin-lin made their first contribution in https://github.com/volcengine/verl/pull/31

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.1rc...v0.1

0.1rc

What's Changed
* [init] feat: first commit for open source
* [doc] feat: fix typo and delete deprecated config element by PeterSH6 in https://github.com/volcengine/verl/pull/2
* [misc] fix: resolve pypi missing directory by PeterSH6 in https://github.com/volcengine/verl/pull/3

Credit To
PeterSH6 vermouth1992 zw0610 wuxibin89 YipZLF namizzz pengyanghua eric-haibin-lin Meteorix and others in Seed Foundation MLSys Team
