New Features
1. GRPO supports data-parallel sampling across multiple vLLM/lmdeploy instances, as well as asynchronous sampling. See the examples [here](https://github.com/modelscope/ms-swift/tree/main/examples/train/grpo). Records of the multi-modal GRPO experiments are available [here](https://swift.readthedocs.io/zh-cn/latest/BestPractices/GRPO多模态训练.html).
2. `swift deploy` supports dynamic batching when `infer_backend` is `pt`. The streaming inference interface has changed (breaking change).
3. `swift infer` supports data parallelism when `infer_backend` is `vllm` or `lmdeploy`. See the example script [here](https://github.com/modelscope/ms-swift/blob/main/examples/infer/vllm/ddp.sh).
4. The Muon optimizer is supported. See the example script [here](https://github.com/modelscope/ms-swift/blob/main/examples/train/optimizer/muon.sh).
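
For readers unfamiliar with the term, data-parallel sampling and inference mean that each worker holds its own copy of the model and processes a disjoint slice of the prompts, after which the results are merged back into the original order. The sketch below only illustrates that idea in plain Python. It is not ms-swift's implementation, and the names `shard` and `gather` are hypothetical helpers introduced for this example.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shard(items: Sequence[T], rank: int, world_size: int) -> List[T]:
    """Return the round-robin slice of `items` owned by worker `rank`.

    Hypothetical helper: worker 0 gets items 0, N, 2N, ...; worker 1 gets
    items 1, N+1, ...; and so on, where N is `world_size`.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    return list(items[rank::world_size])


def gather(shards: Sequence[Sequence[T]]) -> List[T]:
    """Interleave per-worker results back into the original item order.

    Assumes `shards[rank]` holds the results for `shard(items, rank, N)`
    with N = len(shards), one result per input item.
    """
    world_size = len(shards)
    total = sum(len(part) for part in shards)
    merged: List[T] = [None] * total  # type: ignore[list-item]
    for rank, part in enumerate(shards):
        # The extended slice has exactly len(part) slots for this rank.
        merged[rank::world_size] = list(part)
    return merged


if __name__ == "__main__":
    prompts = ["p0", "p1", "p2", "p3", "p4"]
    # Each "worker" would run its own inference engine on its slice;
    # here an upper-casing stub stands in for generation.
    outputs = [[p.upper() for p in shard(prompts, r, 2)] for r in range(2)]
    print(gather(outputs))
```

Round-robin slicing keeps the per-worker load within one item of each other without knowing the dataset size in advance. A real deployment would also need to balance by sequence length, which this sketch ignores.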
New Models
1. moonshotai/Moonlight-16B-A3B-Instruct
2. LLM-Research/Phi-4-mini-instruct, LLM-Research/Phi-4-multimodal-instruct
3. DeepSeek-V3-awq, deepseek-r1-awq
4. Baichuan-M1-14B-Instruct
New Datasets
1. Multi-modal GRPO:
- lmms-lab/multimodal-open-r1-8k-verified
- okwinds/clevr_cogen_a_train
What's Changed
* fix setup.py by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3198
* support vllm dp by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3201
* update dataset & fix bugs by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3203
* Support multiple vllms by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3202
* update distill docs by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3216
* compatible with trl0.16 by hjh0119 in https://github.com/modelscope/ms-swift/pull/3209
* support r1 awq by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3206
* fix grpo old_per_token_logps by hjh0119 in https://github.com/modelscope/ms-swift/pull/3220
* Support the generation of JanusPro models by DaozeZhang in https://github.com/modelscope/ms-swift/pull/3218
* Update the JanusPro-generation by DaozeZhang in https://github.com/modelscope/ms-swift/pull/3221
* fix load args by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3226
* update docs by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3230
* Speed up GRPO by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3229
* fix docs zh by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3231
* fix deepseek_vl2 by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3233
* support moonlight by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3232
* support muon optimizer by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3234
* update docs by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3243
* fix grpo npu vllm by hjh0119 in https://github.com/modelscope/ms-swift/pull/3242
* fix grpo single card by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3246
* save val_dataset by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3248
* fix grpo compat transformers==4.47.* by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3252
* grpo_countdown & fix format reward by mi804 in https://github.com/modelscope/ms-swift/pull/3269
* Support the base64 format of generated images for JanusPro by DaozeZhang in https://github.com/modelscope/ms-swift/pull/3265
* Fix typos by co63oc in https://github.com/modelscope/ms-swift/pull/3266
* compat lmdeploy 0.7 by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3256
* fix lmdeploy by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3274
* GRPO+LMDeploy 0.7 by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3277
* Support max memory by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3282
* add lmdeploy dp shell by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3284
* Support Baichuan-M1-14B-Instruct by DaozeZhang in https://github.com/modelscope/ms-swift/pull/3271
* fix grpo top_k by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3293
* fix lmdeploy mllm in grpo by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3296
* Update FAQ by slin000111 in https://github.com/modelscope/ms-swift/pull/3289
* fix: error when uploading model to huggingface by xavier-h-10 in https://github.com/modelscope/ms-swift/pull/3297
* add multimodal clevr exp by mi804 in https://github.com/modelscope/ms-swift/pull/3301
* update docs by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3304
* [refactor] patch_vllm by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3306
* GRPO mllm script by hjh0119 in https://github.com/modelscope/ms-swift/pull/3305
* [refactor & feat] support pt dynamic batch by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3278
* Support ZeRO++ by tastelikefeet in https://github.com/modelscope/ms-swift/pull/3315
* Revert pt engine batch infer by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3316
* optimize model_type by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3318
* Fix bugs & Update docs/datasets by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3322
* fix grpo zero3 by hjh0119 in https://github.com/modelscope/ms-swift/pull/3324
* fix grpo zero3 by hjh0119 in https://github.com/modelscope/ms-swift/pull/3326
* compat vllm>=0.5.1 lmdeploy>=0.5.0 by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3332
* update external plugins by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3334
* fix generation_config by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3335
* fix check_model error by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3336
* update get_model_tokenizer_with_flash_attn by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3337
* add geoqa grpo experiment by mi804 in https://github.com/modelscope/ms-swift/pull/3344
* fix max_memory by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3347
* support phi4-multimodal by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3350
* fix:fix bugs in cosine reward of GRPO by youyc22 in https://github.com/modelscope/ms-swift/pull/3358
* Remove entry including invalid `ROADMAP` link from English & Chinese documentation by 3manifold in https://github.com/modelscope/ms-swift/pull/3357
* update docs by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3349
* Support the <video> token for Ovis2 models by DaozeZhang in https://github.com/modelscope/ms-swift/pull/3364
* update docs by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3365
* add grpo openr1 multimodal experiment by mi804 in https://github.com/modelscope/ms-swift/pull/3368
* fix swift app format by Jintao-Huang in https://github.com/modelscope/ms-swift/pull/3367
New Contributors
* xavier-h-10 made their first contribution in https://github.com/modelscope/ms-swift/pull/3297
* youyc22 made their first contribution in https://github.com/modelscope/ms-swift/pull/3358
* 3manifold made their first contribution in https://github.com/modelscope/ms-swift/pull/3357
**Full Changelog**: https://github.com/modelscope/ms-swift/compare/v3.1.1...v3.2.0