Highlights
**New algorithms and recipes**
- Vision-language reasoning with Qwen2.5-VL (#386)
- PRIME, RLOO, ReMax (#753, #234, #341)
- FIRE sampling algorithm and Math-Verify rewards (#545, #683)
**Engine**
- SGLang integration is available for preview (single node with FSDP). Blazing fast! Please try it out and give us feedback! We recommend using the verl main branch to pick up continuous SGLang-related fixes and improvements. Enable it with `actor_rollout_ref.rollout.name='sglang'`.
- Megatron is upgraded to v0.11, adding support for the checkpoint manager, Qwen models, and the GRPO algorithm
- vLLM is upgraded to v0.8.2, much faster during rollout than vLLM v0.7 and v0.6.3 thanks to the v1 engine! Please remember to enable CUDA graphs with the following options. Since the vLLM v0.7.x releases had memory leak issues, we recommend using either vLLM v0.6.3 or v0.8.2.
  `actor_rollout_ref.rollout.enforce_eager=False \`
  `actor_rollout_ref.rollout.free_cache_engine=False \`
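A minimal sketch of how these options might appear in a full launch command. The entrypoint `verl.trainer.main_ppo` and the surrounding flags are illustrative assumptions, not taken from these notes; adapt them to your setup:

```shell
# Sketch: enabling CUDA graphs with the vLLM v0.8.2 rollout engine.
# Entrypoint and extra flags are illustrative; adjust for your cluster.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False \
    trainer.n_gpus_per_node=8
```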
**Hardware**
- AMD support is available for the vLLM and FSDP backends. A getting-started one-pager is [here](https://github.com/volcengine/verl/blob/main/docs/amd_tutorial/amd_build_dockerfile_page.rst)
**Docs**
- Tutorials for distributed training setup, debugging, and the programming model
Roadmap for Q2: https://github.com/volcengine/verl/issues/710. Contributions are welcome!
Changelog
**New Features**
**Algorithm Support**
- Support for `extra_info` in reward calculation
- RLOO advantage estimator
- PRIME algorithm (recipe and baseline)
- Initial support for VLMs (Vision-Language Models), including Qwen2.5VL GRPO example
- Math-Verify Support
- Support for GRPO with Megatron backend
- Added FIRE sampling in rollout
- Replaced `DataLoader` with `StatefulDataLoader` for checkpoint resuming
- Support for external reward function loading
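The `extra_info` and external reward-function features above can be sketched together. The signature `(data_source, solution_str, ground_truth, extra_info)` follows verl's custom-reward-function convention, but the scoring logic and the `difficulty` metadata key below are purely hypothetical:

```python
# Sketch of a verl-style custom reward function using extra_info.
# The signature follows verl's custom-reward-function convention;
# the scoring rule and the "difficulty" key are illustrative only.
def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Exact-match reward, optionally reweighted by per-sample metadata."""
    score = 1.0 if solution_str.strip() == ground_truth.strip() else 0.0
    # extra_info carries arbitrary per-sample metadata from the dataset.
    if extra_info and extra_info.get("difficulty") == "hard":
        score *= 2.0  # hypothetical upweighting of hard samples
    return score
```

An external function like this can then be referenced from the trainer config (e.g. via the `custom_reward_function.path` and `custom_reward_function.name` options, if your verl version exposes them).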
**Performance Improvements**
- Support for SGLang as a rollout engine
- Support for Ulysses sequence parallelism (transformers >= 4.48)
- Support offloading parameters and optimizer during rollout
- Tracking support for vemlp and TensorBoard
- MFU (Model FLOPS Utilization) calculation for Megatron workers
- Support for AMD (ROCm kernel)
- Improved checkpoint loading (Megatron support for Llama/Qwen models)
- Remove unnecessary `torch.cuda.empty_cache()` calls
- Optimized weight loading (replaced custom vLLM loader with `model.load_weights`)
---
**Bug Fixes**
- Fixed incorrect argument descriptions
- Fixed Gemma2 example and NGC Dockerfile
- Fixed offload/load optimizer implementation
- Fixed vLLM documentation links
- Fixed typos and spelling errors
- Fixed evaluation file path in Remax training scripts
- Fixed OOM when resuming from checkpoint
- Fixed position embedding for Qwen2.5-VL
- Fixed PRIME algorithm issues (filtering long prompts, padding side, xformers)
- Fixed FSDP checkpoint loading
- Fixed SGLang rollout under multi-node
- Fixed Python environment issues in installation
- Fixed validation batch repeat before feeding into rollout
---
**Deprecations and Breaking Changes**
- Deprecated `val_batch_size`
- Removed redundant config parameters
- Reverted RLHFDataset truncation config
---
**Improvements**
**Documentation**
- Added Ray on Slurm example
- Added FAQ for vLLM illegal memory access
- Added distributed training docs (RLOO, VolcEngine)
- Updated vLLM (>=0.7, >=0.8) documentation
- Added meetup info, blogs, and project references
- Improved Slurm example parameters
- Added multi-node training and debug tutorial
**Tooling & CI/CD**
- Added Dependabot action
- Added secrets scan action
- Added CI timeout and auto-cancel previous CI runs
- Added e2e_ascend CI
- Improved dataset handling in CI
**Miscellaneous**
- Added assertion checks for PPO mini-batch size
- Improved logging (SwanLab integration)
- Pre-check resource pool availability to prevent hangs
- Added tqdm progress bar for RayPPOTrainer
- Skip special tokens in processing
- Support for faster model downloads from ModelScope
- Added Dockerfile for AWS SageMaker
---
New Contributors
This release includes contributions from 60 contributors, 47 of whom are contributing for the first time!
AnselCmy BASARANOMO BaiqingL BeSkyer BearBiscuit05 CajZella Django-Jiang DolbyUUU ETOgaosion HaoshengZou ISEEKYAN Kunlun-Zhu PeterSH6 PzySeere Raf-Chen WillemJiang Yifan-Song793 ZSL98 Zeetc ZefanW Zeyi-Lin caaatch22 celestialli danielz02 dependabot dirtyDan0 eltociear eric-haibin-lin fyqqyf gameofdimension ganler haoy-zzz hiyouga hongpeng-guo iceflame89 jayl940712 kinman0224 laonahongchen liudayuan-carrot maksimstw mi804 minleminzui nomadlx none0663 nwiad ocss884 pat-jj thomZ1 tongyx361 uygnef vermouth1992 wangchengnuo wuxibin89 xffxff yaguanghu yushengsu-thu yyDing1 zhanluxianshen zhr2001 zpqiu
Thank you all for making verl better!!
**Full Changelog**: https://github.com/volcengine/verl/compare/v0.2.0.post2...v0.3.0.post0
Known issues tracker: https://github.com/volcengine/verl/issues/827