This release strengthens PaddleNLP's infrastructure: it adds the Qwen2.5 and Mixtral 8x22B models, upgrades the Tokenizer, and renames the data-indexing tool.
It also fixes issues such as MoE model parameter saving and loading, improves text-processing accuracy, and updates documentation and tests. Inference performance, hardware support, and auto parallelism have all been optimized, including support for more models and parameter configurations, multi-GPU inference, broader support for Chinese domestic hardware, and a streamlined distributed-training workflow.
Core Changes and Enhancements
1. **Infrastructure**:
    - Added the Qwen2.5 (#9157) and Mixtral 8x22B models, further enriching the model library.
    - Upgraded the Tokenizer to support loading `added_tokens_decoder` (#8997), improving flexibility.
    - Renamed the data-indexing tool `tool_helpers` to `fast_dataindex` (#9134) to better reflect its purpose.
    - Added support for skipping data intervals during training (#8989), improving data-processing efficiency.
2. **Unified Checkpoint optimizations**:
    - Updated the optimizer's async-save signal (#8975) for more reliable saving.
    - Fixed several issues in Unified Checkpoint (#9082) to ensure correctness.
3. **Bug fixes**:
    - Fixed saving and loading of MoE model parameters (#9045).
    - Corrected handling of spaces and special tokens in the Tokenizer (#9010, #9144), improving text-processing accuracy.
4. **Documentation and test updates**:
    - Updated multiple documents, including LLM model docs (e.g. #8990, #8999) and quantization docs (#9057), keeping information current and accurate.
    - Added test cases, such as a test for PIR-mode sequence parallelism (#9015), strengthening test coverage.
    - Fixed broken links in the docs (e.g. #9127), improving the user experience.
5. **Other key changes**:
    - **Inference performance**:
        - Optimized the LLM inference code, supporting more models and parameter configurations (e.g. #8986, #8995) and broadening applicable scenarios.
        - Enabled multi-GPU inference (#9121) and wint4 quantization (#9129) for Qwen2_Moe, improving inference efficiency.
        - Strengthened FP8 and INT8 support in LLM inference (e.g. #9032, #9151) to meet diverse precision requirements.
    - **Hardware support**:
        - Improved support for Chinese domestic hardware such as DCU, XPU, and MLU (e.g. #8983, #8504, #9075), advancing domestic-hardware adoption.
        - Optimized model training and inference performance on these platforms, improving overall efficiency.
    - **Auto parallelism**:
        - Fixed a bug where consumed samples were skipped twice during training (#8980), ensuring correct data handling.
        - Updated the auto-parallel configuration and checkpoint converter (e.g. #8847, #9136), improving the flexibility and stability of parallel training.
        - Added a loss NaN/Inf checker (#8943) to surface numerical problems early.
        - Optimized data loading and gradient merging in distributed training (e.g. #9120, #9179), improving training speed and stability.
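The skip-data-intervals feature above (#8989) lets a run pass over specified ranges of the data stream, e.g. already-consumed or problematic batches. A minimal sketch of the idea in plain Python; the names `skip_intervals`, `samples`, and `intervals` are illustrative only, not PaddleNLP's actual Trainer API:

```python
def skip_intervals(samples, intervals):
    """Yield (step, sample) pairs, skipping any step that falls inside
    one of the given inclusive [start, end] intervals."""
    skipped = set()
    for start, end in intervals:
        skipped.update(range(start, end + 1))
    for step, sample in enumerate(samples, start=1):
        if step in skipped:
            continue  # consume the sample but do not emit it for training
        yield step, sample

# Skip steps 2-3 and step 6 of an 8-step stream.
kept = list(skip_intervals("abcdefgh", [(2, 3), (6, 6)]))
print([step for step, _ in kept])  # -> [1, 4, 5, 7, 8]
```

The real feature operates on the Trainer's dataloader state rather than a plain iterator, but the interval bookkeeping is the same idea.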
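The loss NaN/Inf checker (#8943) aborts a run as soon as the loss becomes non-finite instead of wasting further compute on a corrupted state. In spirit it amounts to a guard like the following hypothetical sketch (the actual distributed implementation must also coordinate the check across ranks):

```python
import math

def check_loss(loss: float, step: int) -> float:
    """Raise immediately if the loss is NaN or Inf so a corrupted run
    stops early instead of silently continuing to train."""
    if math.isnan(loss) or math.isinf(loss):
        raise ValueError(f"loss became {loss} at step {step}")
    return loss

check_loss(0.53, step=100)            # finite loss passes through unchanged
# check_loss(float("nan"), step=101)  # would raise ValueError
```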
What's Changed
* [Unified checkpoint] update optimizer async save signal by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8975
* Fix run_dpo.py file path by Mangodadada in https://github.com/PaddlePaddle/PaddleNLP/pull/8952
* fix the loss base in llama_align_dygraph_dy2st_auto_bs2_bf16_DP2-MP1-… by winter-wang in https://github.com/PaddlePaddle/PaddleNLP/pull/8986
* [Bug fix] fix skip consumed_samples twice bug by zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/8980
* fix pip error in legacy benchmarks by fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/8978
* [auto_parallel] Add checkpoint convertor by xingmingyyj in https://github.com/PaddlePaddle/PaddleNLP/pull/8847
* [llm]update finetune.md by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8990
* Upgrade tool_helpers to support 32766 datasets by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8994
* add DCU inference docs by YanhuiDua in https://github.com/PaddlePaddle/PaddleNLP/pull/8983
* [Distributed]Add loss nan/inf checker by ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/8943
* [llm] update docs by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8999
* [Feature] Fused Mixtral support by penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/8901
* [XPU] Add README.md for llama2-7b by xiguapipi in https://github.com/PaddlePaddle/PaddleNLP/pull/8979
* Add gcu llama readme by EnflameGCU in https://github.com/PaddlePaddle/PaddleNLP/pull/8950
* fix qwen model use_casual_mask by deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/9009
* [ZeroPadding] revert zero_padding 8973 by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9003
* [LLM Inference] Fix step.cu bug by yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/8995
* Refine checkpoint converter by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9001
* [Feature] fused mixtral wint4 by penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/9013
* llm inference docs by Sunny-bot1 in https://github.com/PaddlePaddle/PaddleNLP/pull/8976
* [LLM Inference] Support Qwen2_Moe Inference Model by CJ77Qi in https://github.com/PaddlePaddle/PaddleNLP/pull/8892
* fix llama3 static run by yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/8849
* [paddle inference cpu]update cpu inference by bukejiyu in https://github.com/PaddlePaddle/PaddleNLP/pull/8984
* fix the tipc ce case by wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8748
* [Cherry-pick] Add is_distributed field in sharding reshard param_meta by sneaxiy in https://github.com/PaddlePaddle/PaddleNLP/pull/9028
* [Tokenizer] Support for loading added_tokens_decoder by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/8997
* [Inference] Add a8w8(fp8) a8w8c8(int8) quant_type support by lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9032
* Fix checker of nan/inf by ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/9029
* [Cherry-pick] add comm buffer size (8963) by ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/9031
* [Unified Checkpoint] Update async save info by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8982
* [llm]support pad to max_length & fix sp bug by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/9040
* [Bugfix] fix bias optional by penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/9037
* fix setup.py for llm inference by yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/9041
* [Inference] Add cutlass gemm dequant op by gzy19990617 in https://github.com/PaddlePaddle/PaddleNLP/pull/8909
* [Inference] update fakequant support by lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9047
* add test for pir sequence parallel on llama model by liym27 in https://github.com/PaddlePaddle/PaddleNLP/pull/9015
* Fix moe save load by Meiyim in https://github.com/PaddlePaddle/PaddleNLP/pull/9045
* Update quantization.md by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9057
* [Fix] Initialize dp degree in single GPU by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/9056
* fix bos download by westfish in https://github.com/PaddlePaddle/PaddleNLP/pull/9023
* [Inference] Update fakequant script by lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9054
* [AutoParallel][PIR] Fit pir grad merge by AndSonder in https://github.com/PaddlePaddle/PaddleNLP/pull/8985
* [MLU] Support rms_norm_mlu by PeiyuLau in https://github.com/PaddlePaddle/PaddleNLP/pull/8504
* [Inference] support llama3 a8w8c8_fp8 inference and cutlass_fp8_gemm by ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/8953
* [Inference] Qwen2 support fp8 inference by ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/8954
* [Version] update version info by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9060
* [NPU] Fix baichuan2-13b-chat infer by ronny1996 in https://github.com/PaddlePaddle/PaddleNLP/pull/9070
* [MLU] Fix Llama attention_mask in npu and mlu by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9075
* Fix the memory overflow bug of the tune_cublaslt_gemm operator by Hanyonggong in https://github.com/PaddlePaddle/PaddleNLP/pull/9076
* [Inference] Fix weight_only_int4 bug by lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9073
* [Auto Parallel] fix data stream bug of dist.to_static by zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/9077
* fix hang when Flag_dataloader_use_file_descriptor=True by deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/9080
* fix llm predict install error by fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/9088
* [PIR] add pir grad merge test by AndSonder in https://github.com/PaddlePaddle/PaddleNLP/pull/9074
* Update readme by EnflameGCU in https://github.com/PaddlePaddle/PaddleNLP/pull/9046
* [LLM] Add tensor parallel for chatglmv2 by SevenSamon in https://github.com/PaddlePaddle/PaddleNLP/pull/9014
* [data] update tool_helpers version and add unittest by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/9093
* fix baseline because of PR8769 by fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/9092
* fix use paddle.incubate.jit.inference(model) errors by chang-wenbin in https://github.com/PaddlePaddle/PaddleNLP/pull/9016
* [CI] Fix paddlepaddle install by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9102
* [LLM] fix train on npu by SylarTiaNII in https://github.com/PaddlePaddle/PaddleNLP/pull/9101
* Disable ut by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9108
* [AutoParallel] Enable CI for gradclip by JZ-LIANG in https://github.com/PaddlePaddle/PaddleNLP/pull/9059
* [Inference] Remove ceval from run_finetune by lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9100
* [Bugfix] fix multi-gpu infer by penPenf28 in https://github.com/PaddlePaddle/PaddleNLP/pull/9107
* [Inference] fix step kernel by gzy19990617 in https://github.com/PaddlePaddle/PaddleNLP/pull/9122
* [DCU] fix DCU w8a8c8 GEMM shape by YanhuiDua in https://github.com/PaddlePaddle/PaddleNLP/pull/9115
* [Inference] FP8 gemm auto-tune by ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/9094
* Open ut llama_align_dygraph_dy2st_pir_auto_grad_merge_bs2_fp32_DP1-MP1-PP1 by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9120
* [LLM Inference] Support Qwen2_Moe Inference with MultiGPU by CJ77Qi in https://github.com/PaddlePaddle/PaddleNLP/pull/9121
* [Unified Checkpoint] Fix uc lora config, fix release_grads by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/9082
* [Inference]qwen2-a8w8c8 support use_fake_parameter by ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/9109
* Add fast_ln spmd rules by From00 in https://github.com/PaddlePaddle/PaddleNLP/pull/9125
* fix pir dtype by wanghuancoder in https://github.com/PaddlePaddle/PaddleNLP/pull/9130
* Remove ring_flash_attention warning by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9119
* [DOC] Fix LLM page 404 Not Found by DrRyanHuang in https://github.com/PaddlePaddle/PaddleNLP/pull/9127
* Add hardware flops for pretraining by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9069
* [Benchmark] Fix amp level bug in some gpt tests by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9116
* [Auto Parallel] Fix ckpt_converter for auto_parallel by zhangyuqin1998 in https://github.com/PaddlePaddle/PaddleNLP/pull/9136
* [Inference] Update fakequant by lixcli in https://github.com/PaddlePaddle/PaddleNLP/pull/9140
* [DOC] Update docs by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9141
* [LLM Inference] Qwen2_Moe Support wint4 by CJ77Qi in https://github.com/PaddlePaddle/PaddleNLP/pull/9129
* add multy devices supported models by a31413510 in https://github.com/PaddlePaddle/PaddleNLP/pull/9079
* [fix] Remove redundant storage of frozen parameters; compatible with shard-reshard (9067) by bo-ke in https://github.com/PaddlePaddle/PaddleNLP/pull/9148
* [Docs] Update LLM docs by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9143
* fix llm ce predict run error by fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/9149
* [Tokenizer] Add replace_additional_special_tokens parameter to add_special_tokens by lvdongyi in https://github.com/PaddlePaddle/PaddleNLP/pull/9144
* [Tokenizer] Fix decode output with space in decode_token by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9010
* [Inference] Optimize top_p kernel performance by gzy19990617 in https://github.com/PaddlePaddle/PaddleNLP/pull/9132
* [Models] Add Qwen2.5 by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9157
* Update README.md by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9160
* [Inference] FP8 dual gemm auto-tune and support compile parallelization by ckl117 in https://github.com/PaddlePaddle/PaddleNLP/pull/9151
* [AutoParallel] enable ci for dp amp clip by JZ-LIANG in https://github.com/PaddlePaddle/PaddleNLP/pull/9062
* [llm]support dpo pp by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/9039
* [Tools] Rename tool_helpers to fast_dataindex. by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9134
* [Trainer] Support skip data intervals by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/8989
* remove run_pretrain_auto_static.py CI when open PIR by fightfat in https://github.com/PaddlePaddle/PaddleNLP/pull/9177
* [Tokenizer] Enable padding_side as call time kwargs by lvdongyi in https://github.com/PaddlePaddle/PaddleNLP/pull/9161
* Revert "[Tokenizer] Enable padding_side as call time kwargs" by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9192
* [XPU] add xpu support for llama sft by tizhou86 in https://github.com/PaddlePaddle/PaddleNLP/pull/9152
* [AutoParallel] Add FLAGS_enable_fused_ffn_qkv_pass for llama by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9182
* [AutoParallel] Fix ckpt convert bug for sharding v2 by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/9179
* [Test] Disable dynamic to static test case for paddle PIR by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/9196
* Fix ppt eval hang by gongel in https://github.com/PaddlePaddle/PaddleNLP/pull/9218
* Update branch version to 3.0.0b2 by gongel in https://github.com/PaddlePaddle/PaddleNLP/pull/9220
* Update branch version to 3.0.0b2 by gongel in https://github.com/PaddlePaddle/PaddleNLP/pull/9221
* Revert "Fix ppt eval hang" by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/9229
New Contributors
* Mangodadada made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8952
* xingmingyyj made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8847
* penPenf28 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8901
* xiguapipi made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8979
* Sunny-bot1 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8976
* CJ77Qi made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8892
* lixcli made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9032
* gzy19990617 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8909
* SevenSamon made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9014
* chang-wenbin made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9016
* DrRyanHuang made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9127
* a31413510 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9079
* lvdongyi made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9144
* tizhou86 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/9152
**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v3.0.0-beta1...v3.0.0-beta2