verl

Latest version: v0.3.0.post1

0.3.0.post1

This release includes fixes for sequence parallelism and SGLang:

- Fixed a Ulysses sequence parallel issue that could hang with certain KV head counts https://github.com/volcengine/verl/pull/850
- SGLang stability & memory improvements https://github.com/volcengine/verl/pull/773 https://github.com/volcengine/verl/pull/756

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.3.0.post0...v0.3.0.post1

0.3.0.post0

Highlights

**New algorithms and recipes**
- Vision-language reasoning with Qwen2.5-VL (386)
- PRIME, RLOO, ReMax (753, 234, 341)
- FIRE sampling algorithm and Math-Verify rewards (545, 683)

**Engine**
- sglang integration is available for preview (single node with FSDP). Blazing fast! Please try it and give us feedback! We recommend using the verl main branch for continuous sglang-related fixes and improvements based on feedback.

--actor_rollout_ref.rollout.name='sglang'

- Megatron is now upgraded to v0.11, with support for the checkpoint manager, Qwen models, and the GRPO algorithm.
- vllm is upgraded to v0.8.2, which is much faster than vllm v0.7 and v0.6.3 during rollout thanks to the v1 engine! Please remember to enable CUDA graphs with the following options. vllm versions before v0.8.2 had memory leak issues, so we recommend using either vllm v0.6.3 or v0.8.2.

actor_rollout_ref.rollout.enforce_eager=False \
actor_rollout_ref.rollout.free_cache_engine=False \


Hardware:
- AMD support is available for the vllm and FSDP backends. A getting-started one-pager is [here](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst)

Docs:
- Tutorials for distributed training setup, debugging, and the programming model

Roadmap for Q2: https://github.com/volcengine/verl/issues/710. Contributions are welcome!

Changelog

**New Features**
**Algorithm Support**
- Support for `extra_info` in reward calculation
- RLOO advantage estimator
- PRIME algorithm (recipe and baseline)
- Initial support for VLMs (Vision-Language Models), including Qwen2.5VL GRPO example
- Math-Verify Support
- Support for GRPO with Megatron backend
- Added FIRE sampling in rollout
- Replaced `DataLoader` with `StatefulDataLoader` for checkpoint resuming
- Support for external reward function loading
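
To make the `extra_info` and external-reward-function items concrete, here is a hedged sketch of what an externally loaded reward function might look like. The `(data_source, solution_str, ground_truth, extra_info)` argument shape is an assumption modeled on verl's reward-manager style, not a guaranteed API; check the docs for your verl version.

```python
def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Exact-match reward, optionally scaled by a per-sample weight
    carried in extra_info (illustrative sketch only)."""
    score = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
    # extra_info can carry arbitrary per-sample metadata from the dataset,
    # e.g. a difficulty weight (hypothetical key for illustration).
    if extra_info and "weight" in extra_info:
        score *= extra_info["weight"]
    return score
```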

**Performance Improvements**
- Support for SGLang as a rollout engine
- Support for Ulysses sequence parallel (transformers >= 4.48)
- Support offloading parameters and optimizer during rollout
- Tracking support for vemlp and TensorBoard
- MFU (Model FLOPS Utilization) calculation for Megatron workers
- Support for AMD (ROCm kernel)
- Improved checkpoint loading (Megatron support for Llama/Qwen models)
- Remove unnecessary `torch.cuda.empty_cache()` calls
- Optimized weight loading (replaced custom VLLM loader with `model.load_weights`)
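
For intuition on the MFU item: MFU is achieved FLOPs divided by the hardware peak. A back-of-the-envelope version using the common ~6N FLOPs-per-token approximation for a dense transformer (forward plus backward) looks like the following; verl's actual per-worker calculation may count FLOPs in more detail.

```python
def mfu_estimate(tokens_per_sec, num_params, peak_flops_per_sec):
    """Rough Model FLOPS Utilization: achieved FLOPs over peak FLOPs,
    using the ~6 * num_params FLOPs-per-token approximation."""
    achieved_flops_per_sec = 6.0 * num_params * tokens_per_sec
    return achieved_flops_per_sec / peak_flops_per_sec
```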

---

**Bug Fixes**
- Fixed wrong args description
- Fixed Gemma2 example and NGC Dockerfile
- Fixed offload/load optimizer implementation
- Fixed VLLM documentation links
- Fixed typos and spelling errors
- Fixed evaluation file path in Remax training scripts
- Fixed OOM when resuming from checkpoint
- Fixed position embedding for Qwen2.5-VL
- Fixed PRIME algorithm issues (filtering long prompts, padding side, xformers)
- Fixed FSDP checkpoint loading
- Fixed SGLang rollout under multi-node
- Fixed Python environment issues in installation
- Fixed validation batch repeat before feeding into rollout

---

**Deprecations and Breaking Changes**
- Deprecated `val_batch_size`
- Removed redundant config parameters
- Reverted RLHFDataset truncation config

---

**Improvements**
**Documentation**
- Added Ray on Slurm example
- Added FAQ for VLLM illegal memory access
- Added distributed training docs (RLOO, VolcEngine)
- Updated VLLM (>=0.7, >=0.8) documentation
- Added meetup info, blogs, and project references
- Improved Slurm example parameters
- Added multi-node training and debug tutorial

**Tooling & CI/CD**
- Added Dependabot action
- Added secrets scan action
- Added CI timeout and auto-cancel previous CI runs
- Added e2e_ascend CI
- Improved dataset handling in CI

**Miscellaneous**
- Added assertion checks for PPO mini-batch size
- Improved logging (SwanLab integration)
- Pre-check resource pool availability to prevent hangs
- Added tqdm progress bar for RayPPOTrainer
- Skip special tokens in processing
- Support for faster model downloads from ModelScope
- Added Dockerfile for AWS SageMaker

---

New Contributors
This release includes contributions from 60 contributors, 47 of whom are first-time contributors!
AnselCmy BASARANOMO BaiqingL BeSkyer BearBiscuit05 CajZella Django-Jiang DolbyUUU ETOgaosion HaoshengZou ISEEKYAN Kunlun-Zhu PeterSH6 PzySeere Raf-Chen WillemJiang Yifan-Song793 ZSL98 Zeetc ZefanW Zeyi-Lin caaatch22 celestialli danielz02 dependabot dirtyDan0 eltociear eric-haibin-lin fyqqyf gameofdimension ganler haoy-zzz hiyouga hongpeng-guo iceflame89 jayl940712 kinman0224 laonahongchen liudayuan-carrot maksimstw mi804 minleminzui nomadlx none0663 nwiad ocss884 pat-jj thomZ1 tongyx361 uygnef vermouth1992 wangchengnuo wuxibin89 xffxff yaguanghu yushengsu-thu yyDing1 zhanluxianshen zhr2001 zpqiu
Thank you all for making verl better!!

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.2.0.post2...v0.3.0.post0

Known issues tracker: https://github.com/volcengine/verl/issues/827

0.2.0.post2

What's Changed
* Fixed installation issues.
* Fixed the remove padding flags in the gemma example.

New Contributors
* xffxff made their first contribution in https://github.com/volcengine/verl/pull/281

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.2...v0.2.0.post2

0.2

Highlights

New algorithms and features
- [GRPO](https://github.com/volcengine/verl/tree/v0.2/examples/grpo_trainer)
- [ReMax](https://github.com/volcengine/verl/tree/v0.2/examples/remax_trainer)
- [REINFORCE++](https://github.com/volcengine/verl/pull/228/files)
- Checkpoint manager for FSDP backend
- [Sandbox for reward verification](https://github.com/volcengine/verl/tree/v0.2/verl/workers/reward_manager/prime.py) and scoring in PRIME

Performance optimization:
- Remove padding tokens (i.e., sequence packing). A significant throughput increase is expected for Llama, Mistral, Gemma, and Qwen2 transformer models. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#enable-remove-padding-sequence-packing)

actor_rollout_ref.model.use_remove_padding=True
critic.model.use_remove_padding=True
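
The idea behind remove-padding can be sketched in a few lines: drop the pad tokens, concatenate what remains into one packed sequence, and record cumulative sequence lengths (`cu_seqlens`) in the style varlen attention kernels expect. This is an illustrative sketch, not verl's implementation:

```python
def pack_sequences(input_ids, attention_mask):
    """Flatten a padded batch into one packed token list plus cu_seqlens
    boundary offsets (illustrative sketch of sequence packing)."""
    packed, cu_seqlens = [], [0]
    for ids, mask in zip(input_ids, attention_mask):
        # keep only the non-padding positions of this sequence
        packed.extend(t for t, m in zip(ids, mask) if m)
        cu_seqlens.append(len(packed))
    return packed, cu_seqlens
```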

- Dynamic batch size. Significant throughput increase for variable length sequences. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#tuning-for-dynamic-batch-size) and [example](https://github.com/volcengine/verl/blob/main/examples/ppo_trainer/run_qwen2-7b_seq_balance.sh)

actor_rollout_ref.actor.ppo_max_token_len_per_gpu
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu
critic.ppo_max_token_len_per_gpu
critic.forward_micro_batch_size_per_gpu
reward_model.forward_micro_batch_size_per_gpu
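
Conceptually, dynamic batching greedily groups variable-length sequences into micro-batches that stay under the per-GPU token budget set by the knobs above. A minimal sketch of the grouping step (not verl's actual algorithm, which also balances load across ranks):

```python
def pack_by_token_budget(seq_lens, max_tokens_per_gpu):
    """Greedily assign sequence indices to micro-batches so each batch's
    total token count stays under the budget (illustrative sketch)."""
    batches, current, used = [], [], 0
    for i, n in enumerate(seq_lens):
        # start a new micro-batch when adding this sequence would overflow
        if current and used + n > max_tokens_per_gpu:
            batches.append(current)
            current, used = [], 0
        current.append(i)
        used += n
    if current:
        batches.append(current)
    return batches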

- Sequence parallelism for long context training. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#ulysses-sequence-parallel-for-long-context-training) and [example](https://github.com/volcengine/verl/blob/v0.2/examples/ppo_trainer/run_deepseek7b_llm_sp2.sh)

actor_rollout_ref.actor.ulysses_sequence_parallel_size
critic.ulysses_sequence_parallel_size
reward_model.ulysses_sequence_parallel_size
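
Ulysses sequence parallelism starts from an even partition of each sequence across `ulysses_sequence_parallel_size` ranks (attention layers then use all-to-alls to trade the sequence shard for a head shard). The partitioning step alone might be sketched as follows; the pad id and function name are illustrative, not verl's API:

```python
import math

def shard_sequence(tokens, sp_size, pad_id=0):
    """Pad to a multiple of sp_size, then split evenly so every
    sequence-parallel rank holds the same number of tokens."""
    chunk = math.ceil(len(tokens) / sp_size)
    padded = tokens + [pad_id] * (chunk * sp_size - len(tokens))
    return [padded[i * chunk:(i + 1) * chunk] for i in range(sp_size)]
```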

- vllm v0.7+ integration (preview). For the Qwen2 PPO example, rollout time is reduced by 25% compared to v0.6.3, and by 45% when CUDA graphs are enabled. [Documentation](https://github.com/volcengine/verl/blob/main/docs/README_vllm0.7.md)

actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False

- [Liger-kernel integration](https://github.com/volcengine/verl/blob/v0.2/examples/sft/gsm8k/run_qwen_05_sp2_liger.sh) for SFT. [Documentation](https://verl.readthedocs.io/en/latest/perf/perf_tuning.html#ligerkernel-for-sft)

model.use_liger=True


Changelog

New Features

1. **Algorithm Support:**
- Added support for **GRPO algorithm** (124).
- Implemented **REINFORCE++ algorithm** (228).
- Added **ReMax algorithm** (234)
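
As a reminder of what the GRPO estimator computes: advantages are rewards normalized within the group of responses sampled for the same prompt. A self-contained sketch of that normalization (illustrative only; implementations differ on details such as the std estimator and epsilon):

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantage: (r - group mean) / group std for each
    response sampled from the same prompt; epsilon guards against a
    zero-variance group (sketch, not verl's implementation)."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + 1e-6) for r in group_rewards]
```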

2. **Performance Improvements:**
- Enabled **dynamic batch size** support (118).
- Added **meta device initialization and parallel load for FSDP** to avoid OOMs during init (123).
- Improved **gradient accumulation in sequence balance** (141).
- Added **ref/RM offload support** (121).
- Added **LoRA support for SFT** (127).
- Added support for **rmpad/data-packing** in FSDP with transformers (91).
- Liger kernel integration (133)

3. **Experiment Tracking:**
- Integrated **SwanLab** for experiment tracking with online/offline mode and local dashboard support (218).
- Added **Mlflow support** (74).

---

Bug Fixes

1. **Critical Fixes:**
- Fixed **checkpoint save with existing directories** (174).
- Fixed **incorrect response_attention_mask in vLLM rollout** (213).
- Fixed **gradient accumulation loss value** (102).
- Fixed **reward model issues with TokenClassification models** (99).

2. **Code Fixes:**
- Fixed **redundant non_zero_mask** (152).
- Fixed **validation dp_size** (90).
- Fixed **response_mask index** (60).

---

Improvements

1. **Performance:**
- Improved **memory efficiency in logprobs_from_logits_v2** (220).
- Enabled **multiprocess dataloader in SFT trainer** (122).
- Added **MFU calculation support** (117).


2. **Miscellaneous:**
- Added **option to log validation generations to wandb** (177).

---

Deprecations and Breaking Changes

1. **Breaking Changes:**
- Changed **micro_batch_size to micro_batch_size_per_gpu** (136).
- Removed **ray.remote on workers to allow inheritance** (61).
- Refactored **old_log_prob into a separate function** (129).

---

Contributors
A big thank you to all the contributors who made this release possible:
zhanluxianshen xingyaoww fzyzcjy emergenz openhands-agent ZSL98 YSLIU627 ZefanW corbt jaysonfrancis hiyouga Jiayi-Pan hongpeng-guo eltociear chujiezheng PanAndy zwhe99 pcmoritz huiyeruzhou VPeterV uygnef zhiqi-0 ExtremeViscent liziniu nch0w Cppowboy TonyLianLong 4332001876 tyler-romero ShaohonChen kinman0224 willem-bd bebetterest WeiXiongUST dignfei


---
The PyPI package will be available soon! Please let us know on GitHub if there is any problem extending RL training recipes based on the pip-installed version of verl.

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.1...v0.2

0.1

What's Changed
* [misc] feat: update tutorial for opensource version by PeterSH6 in https://github.com/volcengine/verl/pull/4
* [misc] fix: vllm gpu executor issue when world_size is 1 and typo in doc by PeterSH6 in https://github.com/volcengine/verl/pull/9
* [ci] feat: add test files for ray hybrid programming model by PeterSH6 in https://github.com/volcengine/verl/pull/23
* [chore] remove unnecessary updating of `_worker_names` by kevin85421 in https://github.com/volcengine/verl/pull/19
* [misc] feat: add gemma example for small scale debug and fix gradient checkpoint in critic by PeterSH6 in https://github.com/volcengine/verl/pull/27
* [misc] fix issue in hf_weight_loader and fix typo in doc by PeterSH6 in https://github.com/volcengine/verl/pull/30
* [ci] test lint ci and lint tests dir by PeterSH6 in https://github.com/volcengine/verl/pull/28
* [example] fix: fix math circular dependency by eric-haibin-lin in https://github.com/volcengine/verl/pull/31
* [example] fix: make wandb optional dependency. allow extra args in existing scripts by eric-haibin-lin in https://github.com/volcengine/verl/pull/32
* [docs] feat: add related publications by eric-haibin-lin in https://github.com/volcengine/verl/pull/35
* [tokenizer] feat: support tokenizers whose pad_token_id is none by eric-haibin-lin in https://github.com/volcengine/verl/pull/36
* [rollout] feat: support vLLM v0.6.3 and fix hf rollout import issue by PeterSH6 in https://github.com/volcengine/verl/pull/33
* [distro] feat: add docker support by eric-haibin-lin in https://github.com/volcengine/verl/pull/41
* [example] add a split placement tutorial by PeterSH6 in https://github.com/volcengine/verl/pull/43
* [doc] add a new quickstart section by PeterSH6 in https://github.com/volcengine/verl/pull/44
* [BREAKING][core] move single_controller into verl directory by PeterSH6 in https://github.com/volcengine/verl/pull/45

New Contributors
* eric-haibin-lin made their first contribution in https://github.com/volcengine/verl/pull/31

**Full Changelog**: https://github.com/volcengine/verl/compare/v0.1rc...v0.1

0.1rc

What's Changed
* [init] feat: first commit for open source
* [doc] feat: fix typo and delete deprecated config element by PeterSH6 in https://github.com/volcengine/verl/pull/2
* [misc] fix: resolve pypi missing directory by PeterSH6 in https://github.com/volcengine/verl/pull/3

Credit To
PeterSH6 vermouth1992 zw0610 wuxibin89 YipZLF namizzz pengyanghua eric-haibin-lin Meteorix and others in Seed Foundation MLSys Team
