Highlights
**New algorithms and recipes**
- Vision-language reasoning with Qwen2.5-VL
- PRIME, RLOO, and ReMax
- FIRE sampling algorithm and Math-Verify rewards
**Engine**
- SGLang integration is available for preview, and it is blazing fast. Please try it out and give us feedback!
  ```
  --actor_rollout_ref.rollout.name='sglang'
  ```
- Megatron is now upgraded to v0.11, supporting the checkpoint manager, Qwen models, and the GRPO algorithm
- vLLM is upgraded to v0.8.2, much faster than v0.7 and v0.6.3 during rollout thanks to the v1 engine! To enable it, set:
  ```
  actor_rollout_ref.rollout.enforce_eager=False \
  actor_rollout_ref.rollout.free_cache_engine=False
  ```
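A minimal launch sketch showing where these rollout overrides go. The `verl.trainer.main_ppo` entry point is verl's standard trainer driver; the data path, model name, and override values below are placeholders for illustration, not recommended settings:

```shell
# Hypothetical launch enabling the vLLM v1 engine during rollout.
# Paths and model names are placeholders.
python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-7B-Instruct \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False
```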
**Hardware**
- AMD support is available for the vLLM and FSDP backends
**Docs**
- Tutorial for distributed training setup, debugging, and the programming model
Roadmap for Q2: https://github.com/volcengine/verl/issues/710. Contributions are welcome!
Changelog
**New Features**
**Algorithm Support**
- Support for `extra_info` in reward calculation
- RLOO advantage estimator
- PRIME algorithm (recipe and baseline)
- Initial support for vision-language models (VLMs), including a Qwen2.5-VL GRPO example
- Math-Verify Support
- Support for GRPO with Megatron backend
- Added FIRE sampling in rollout
- Replaced `DataLoader` with `StatefulDataLoader` for checkpoint resuming
- Support for external reward function loading
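The reward-related items above can be illustrated with a small sketch of loading a reward function from an external Python file. This is a minimal, self-contained example of the general technique (dynamic import via `importlib`); the helper name, the `compute_score` signature, and the `extra_info` keyword here are illustrative and not verl's exact API:

```python
import importlib.util
import os
import tempfile

def load_reward_fn(file_path, fn_name):
    """Load a reward function from a standalone Python file.

    Sketch of external reward loading via importlib; the helper
    name and signature are hypothetical, not verl's interface.
    """
    spec = importlib.util.spec_from_file_location("custom_reward_module", file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, fn_name)

# Example: write a toy reward file, then load and call it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_reward.py")
    with open(path, "w") as f:
        f.write(
            "def compute_score(solution_str, ground_truth, extra_info=None):\n"
            "    return 1.0 if solution_str.strip() == ground_truth else 0.0\n"
        )
    reward_fn = load_reward_fn(path, "compute_score")
    score = reward_fn("42", "42")
```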
**Performance Improvements**
- Support for SGLang as a rollout engine
- Support for Ulysses sequence parallelism (transformers >= 4.48)
- Support offloading parameters and optimizer during rollout
- Tracking support for vemlp and TensorBoard
- MFU (Model FLOPS Utilization) calculation for Megatron workers
- Support for AMD (ROCm kernel)
- Improved checkpoint loading (Megatron support for Llama/Qwen models)
- Remove unnecessary `torch.cuda.empty_cache()` calls
- Optimized weight loading (replaced the custom vLLM loader with `model.load_weights`)
---
**Bug Fixes**
- Fixed incorrect argument descriptions
- Fixed Gemma2 example and NGC Dockerfile
- Fixed offload/load optimizer implementation
- Fixed vLLM documentation links
- Fixed typos and spelling errors
- Fixed evaluation file path in Remax training scripts
- Fixed OOM when resuming from checkpoint
- Fixed position embedding for Qwen2.5-VL
- Fixed PRIME algorithm issues (filtering long prompts, padding side, xformers)
- Fixed FSDP checkpoint loading
- Fixed SGLang rollout under multi-node
- Fixed Python environment issues in installation
- Fixed validation batch repeat before feeding into rollout
---
**Deprecations and Breaking Changes**
- Deprecated `val_batch_size`
- Removed redundant config parameters
- Reverted RLHFDataset truncation config
---
**Improvements**
**Documentation**
- Added Ray on Slurm example
- Added FAQ for vLLM illegal memory access
- Added distributed training docs (RLOO, VolcEngine)
- Updated vLLM (>=0.7, >=0.8) documentation
- Added meetup info, blogs, and project references
- Improved Slurm example parameters
- Added multi-node training and debug tutorial
**Tooling & CI/CD**
- Added Dependabot action
- Added secrets scan action
- Added CI timeout and auto-cancel previous CI runs
- Added e2e_ascend CI
- Improved dataset handling in CI
**Miscellaneous**
- Added assertion checks for PPO mini-batch size
- Improved logging (SwanLab integration)
- Pre-check resource pool availability to prevent hangs
- Added tqdm progress bar for RayPPOTrainer
- Skip special tokens in processing
- Support for faster model downloads from ModelScope
- Added Dockerfile for AWS SageMaker
---
New Contributors
This release was made possible by 60 contributors, 47 of whom are first-time contributors!
AnselCmy BASARANOMO BaiqingL BeSkyer BearBiscuit05 CajZella Django-Jiang DolbyUUU ETOgaosion HaoshengZou ISEEKYAN Kunlun-Zhu PeterSH6 PzySeere Raf-Chen WillemJiang Yifan-Song793 ZSL98 Zeetc ZefanW Zeyi-Lin caaatch22 celestialli danielz02 dependabot dirtyDan0 eltociear eric-haibin-lin fyqqyf gameofdimension ganler haoy-zzz hiyouga hongpeng-guo iceflame89 jayl940712 kinman0224 laonahongchen liudayuan-carrot maksimstw mi804 minleminzui nomadlx none0663 nwiad ocss884 pat-jj thomZ1 tongyx361 uygnef vermouth1992 wangchengnuo wuxibin89 xffxff yaguanghu yushengsu-thu yyDing1 zhanluxianshen zhr2001 zpqiu
Thank you all for making verl better!!
**Full Changelog**: https://github.com/volcengine/verl/compare/v0.2.0.post2...v0.3.0.post0