* Distinctive fine-tuning and efficient alignment: ships the in-house, fast-converging RsLoRA+ algorithm, substantially improving PEFT training convergence speed and quality (see the scaling sketch below); integrates high-performance generation acceleration into the RLHF PPO algorithm, removing the generation bottleneck in PPO training and delivering substantially better PPO training performance.
* Faster LLM training: generalized support for FastFFN, FusedQKV, and other training performance optimizations makes large-model training faster and more stable.
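For context, here is a minimal sketch of the rank-stabilized scaling idea that rsLoRA-style methods build on. This is an illustration only, not the RsLoRA+ implementation shipped in this release; the tensor shapes and names are hypothetical.

```python
import paddle

def lora_delta(x, A, B, alpha, r, rank_stabilized=True):
    # Vanilla LoRA scales the low-rank update by alpha / r; rsLoRA scales by
    # alpha / sqrt(r), which keeps the update magnitude stable as rank grows.
    scaling = alpha / (r ** 0.5) if rank_stabilized else alpha / r
    return (x @ A @ B) * scaling

r, d_in, d_out = 16, 1024, 1024
x = paddle.randn([2, d_in])
A = paddle.randn([d_in, r]) * 0.01   # down-projection, small random init
B = paddle.zeros([r, d_out])         # up-projection, zero init => delta starts at 0
y = lora_delta(x, A, B, alpha=32.0, r=r)
print(y.shape)  # [2, 1024]
```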
LLM Fine-tuning, Alignment, Training and Inference Optimizations
* Fine-tuning
  * PEFT
    * Added QLoRA pipeline-parallel support #7801
    * Added a custom Python operator to optimize LoRA forward/backward computation #8106
    * Added the rsLoRA, LoRA+, and PiSSA algorithms #8111
  * Long sequences
    * Added long-sequence strategies decoupled from the model implementations: RotaryEmbedding, LinearScalingRotaryEmbedding, NTKScalingRotaryEmbedding, DynamicNTKScalingRotaryEmbedding, etc. #8076
  * Alignment
    * Added the PPO alignment algorithm #7305
  * Training strategies
    * Added LLaMA sequence parallelism #7746
    * Added LLaMA master_grad #7658
    * Added auto_parallel support for GPT #8160
  * New operators
    * Added GQA operator support #7906
    * Added GQA fused attention QKV #7890
    * Added the SwiGLU operator #8038 (a reference sketch follows this list)
  * Inference
    * Added static-graph inference for QWenVL #7808
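As a point of reference for the fused operator above, here is an unfused SwiGLU in plain Paddle. This is the standard formulation of the activation, not the fused kernel added in #8038.

```python
import paddle
import paddle.nn.functional as F

def swiglu_reference(x):
    # Split the up-projected activations in half, gate one half with SiLU,
    # and multiply elementwise -- what the fused kernel computes in one pass.
    gate, value = paddle.chunk(x, chunks=2, axis=-1)
    return F.silu(gate) * value

x = paddle.randn([2, 8, 256])     # [batch, seq_len, 2 * intermediate_size]
print(swiglu_reference(x).shape)  # [2, 8, 128]
```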
New Models
* Added the DeBERTa and DeBERTa-v2 models #8227
  * deepset/deberta-v3-large-squad2
  * microsoft/deberta-v2-xlarge
  * microsoft/deberta-v3-base
  * microsoft/deberta-v3-large
  * microsoft/deberta-base
* Added Mixtral (mixture of experts) #7803
  * mistralai/Mixtral-8x7B-Instruct-v0.1
  * mistralai/Mixtral-8x7B-v0.1
* Added Llama 3 #8315 (see the loading sketch after this list)
  * meta-llama/Meta-Llama-3-8B
  * meta-llama/Meta-Llama-3-8B-Instruct
  * meta-llama/Meta-Llama-3-70B
  * meta-llama/Meta-Llama-3-70B-Instruct
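A minimal loading sketch for the newly added weights, using PaddleNLP's Auto classes. The `dtype` argument and `return_tensors="pd"` follow common PaddleNLP usage; treat the exact keyword handling as an assumption, and see the scripts under `llm/` for full fine-tuning and inference flows.

```python
from paddlenlp.transformers import AutoModelForCausalLM, AutoTokenizer

# Any of the checkpoints listed above can be substituted here.
name = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, dtype="bfloat16")

ids = tokenizer("Hello, PaddleNLP!", return_tensors="pd")["input_ids"]
print(ids.shape)  # [1, seq_len] Paddle tensor, ready for model(...) or generate(...)
```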
Framework Upgrades
* Trainer upgrades
  * Added the ignore_save_lr_and_optim argument to skip saving the lr-scheduler and optimizer state #7978 (see the sketch at the end of this section)
  * Added Wandb and TensorBoard support to Trainer #7863
  * Trainer can now parse command-line and JSON-file arguments together #7768
  * Added gradient_sync_after_accumulate support #8045
  * Added a CUDA compilation check to the distributed dataloader #8099
* AutoParallel upgrades
  * Auto-parallel Llama now supports bf16 loss #7874
  * Added the refined-recompute mechanism #7349
  * Support master_grad under the AMP-O2 strategy #7658
  * Further improved the core functionality of unified dynamic/static auto-parallel distributed training #7985 #8114
  * Added semi-automatic parallel training of Llama 2 based on AutoTrainer #7851 #7885
  * Added the hybrid_parallel_topo_order strategy for Llama #8011
  * Unified the Llama network definition across dynamic and static graphs #8127
* Other
  * Refactored the download logic to support fetching models from BOS, HF Hub, AI Studio, and ModelScope #7608 #8020 #8088
  * Added distributed configuration for pipeline parallelism #8051
  * Adapted FlashAttention for NPU #8171 #8210
  * Added block_attention / cache-KV quantization for Llama #7649
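Tying the Trainer items together, below is a minimal sketch using PdArgumentParser. The flag names come from this release's notes; the exact semantics follow the linked PRs and the snippet is illustrative, not canonical.

```python
from paddlenlp.trainer import PdArgumentParser, TrainingArguments

parser = PdArgumentParser(TrainingArguments)
# Since this release the parser accepts a JSON config plus command-line
# overrides together, e.g.:
#   python train.py config.json --ignore_save_lr_and_optim true
(training_args,) = parser.parse_args_into_dataclasses()

# New in this release: skip persisting lr-scheduler and optimizer state.
print(training_args.ignore_save_lr_and_optim)
```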
Other Support
* Added a Matryoshka (matryoshka representation learning) retrieval strategy that saves compute and storage. #8165 (see the sketch below)
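For intuition, a small NumPy sketch of the Matryoshka idea: embeddings trained this way stay usable after truncation to a prefix of their dimensions, so retrieval can trade accuracy for compute and storage. This illustrates the general technique, not the pipelines implementation in #8165; dimensions are hypothetical.

```python
import numpy as np

def truncate_and_normalize(emb, dim):
    # Keep only the first `dim` coordinates of a Matryoshka embedding and
    # re-normalize so dot products remain cosine similarities.
    sub = emb[..., :dim]
    return sub / np.linalg.norm(sub, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
query = rng.standard_normal(768).astype("float32")       # 768-d full embedding
docs = rng.standard_normal((1000, 768)).astype("float32")

q64 = truncate_and_normalize(query, 64)   # 1/12 the storage per vector
d64 = truncate_and_normalize(docs, 64)
top5 = (d64 @ q64).argsort()[-5:][::-1]   # cheap first-stage retrieval
print(top5)
```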
Bug Fixes
1. Adjusted log levels and added timelog timing logs, compatible across devices. #8261
2. Fixed inconsistent randomly initialized shared weights in pipeline parallelism, covering GPT/OPT and other models. #7772
3. Disabled downloading from the Hugging Face Hub in CI and unit tests. #7798 #8198
4. Fixed duplicated concatenation of query and history in the LLM Gradio UI when the chat template is enabled. #7992
5. Fixed a KeyError when downloading GPT models. #8253
6. Fixed a slicing bug in LlamaRotaryEmbedding. #7882
7. Fixed the allreduce tensor dtype issue. #7876
8. Fixed breakage after the framework dev branch removed the paddle.jit.dy2static.utils_helper API. #7989
9. Fixed the read-data timer when ignore_data_skip=False and skip_profile_timer=False. #8177
10. Fixed Wandb and TensorBoard callback unit tests. #8066 #8056
11. Fixed an error when Trainer parses JSON and command-line list arguments together. #7860
12. Fixed inference issues in the Gradio UI. #7740 #7788
13. Fixed basic tokenizer issues. #7797 #7870
14. Fixed loading the RNG state on custom devices. #7894
15. Fixed garbled output when printing the BF16 loss in auto parallelism. #7874
16. Initialize models in float32 to fix AMP errors in static-graph auto parallelism. #8033 #8199
17. Fixed incorrect use of the ShardDataloader interface under pipeline parallelism. #8014
18. Fixed Llama precision issues on custom devices. #7895
19. Fixed NPU AICPU operator issues. #7976
20. Fixed missing arguments passed to FusedLinearWithGradAdd. #8178
What's Changed
* [Unified Checkpoint] Add unified checkpoint training args doc. by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7756
* [AutoParallel] Auto Trans PP to VPP by zhaoyinglia in https://github.com/PaddlePaddle/PaddleNLP/pull/7747
* Add codecov check by zjjlivein in https://github.com/PaddlePaddle/PaddleNLP/pull/7760
* [CE] Delete gpt_for_sequence_classification by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7757
* [DOC] Update trainer.md by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7761
* [Release] Change version to 2.7.0 by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7764
* [benchmark]close skip_memory_metrics for ips by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/7732
* [Release] Update release.yml to release tags by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7765
* [AutoParallel] Add Sequence Parallel for Static LLaMA by JZ-LIANG in https://github.com/PaddlePaddle/PaddleNLP/pull/7746
* [New Features] support dynamic src_length by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7740
* Fix unified_checkpoint bug by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/7770
* [DONE] aistudio, hf hub, bos update download by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/7608
* [Trainer] Fix dist dataloader eval by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7777
* [Paddle-pipelines] Update convert_files_to_dicts_splitter by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/7748
* [PEFT]fix lora model tp when existing other trainable module by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/7781
* [Paddle-Pipelines] update faiss by qingzhong1 in https://github.com/PaddlePaddle/PaddleNLP/pull/7793
* Fix shared weights sync for PipelineLayer by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/7772
* [tests] download slow by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/7798
* [INFER][LLM] Support qwen in fined grained dybatch v1 by DanGuge in https://github.com/PaddlePaddle/PaddleNLP/pull/7644
* Add CE for Distributed Hybrid Parallel by iosmers in https://github.com/PaddlePaddle/PaddleNLP/pull/7782
* add MP2-SP2-pp4-vpp2-SD2-stage1-mbs2-acc8 ce by tianhaodongbd in https://github.com/PaddlePaddle/PaddleNLP/pull/7774
* [Pretrain] Fix eval during pretrain by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7806
* pipeline parallel benchmark by zhangting2020 in https://github.com/PaddlePaddle/PaddleNLP/pull/7759
* [Bug fixes] fix br gradio by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7788
* delete useless code for write_cache_kv.cu by yuanlehome in https://github.com/PaddlePaddle/PaddleNLP/pull/7812
* [llm]support qlora pp by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/7801
* Trainer support simultaneously parse JSON files and cmd arguments. by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/7768
* [LLM] Support block_attention/cachekv quant for llama by RichardWooSJTU in https://github.com/PaddlePaddle/PaddleNLP/pull/7649
* [Bug Fix] fix paddle multipy_fwd_func warning message by BeingGod in https://github.com/PaddlePaddle/PaddleNLP/pull/7818
* [llm]fix lora by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/7824
* fused rms spmd by liuzhenhai93 in https://github.com/PaddlePaddle/PaddleNLP/pull/7830
* [Pretrain] Fix eval during pretrain by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7827
* [neural search][fix bug of evaluate.py] by ZeyuTeng96 in https://github.com/PaddlePaddle/PaddleNLP/pull/7832
* [neural search] fix the bug of reading files when calculating the recall scores by shenghwa in https://github.com/PaddlePaddle/PaddleNLP/pull/7836
* [Bug fixes] update chatglm tokenizer by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7797
* [semantic_indexing] fix bug of evaluate.py by ZeyuTeng96 in https://github.com/PaddlePaddle/PaddleNLP/pull/7843
* [faq] fix bug of evaluate.py by ZeyuTeng96 in https://github.com/PaddlePaddle/PaddleNLP/pull/7840
* [text_classification_retrieval_based] fix bug of evaluate.py by ZeyuTeng96 in https://github.com/PaddlePaddle/PaddleNLP/pull/7844
* [LLM] add Qwen-7B-Chat to PaddleNLP unit test by ziangqin-baidu in https://github.com/PaddlePaddle/PaddleNLP/pull/7823
* Support 5.2 bloom by zhoutianzi666 in https://github.com/PaddlePaddle/PaddleNLP/pull/7846
* [unified checkpoint] Fix last checkpoint save by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/7854
* [unified checkpoint] fix checkpoint names by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/7795
* [New Features]add ranks testing for test_predictor by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7800
* [Auto Parallel] Support dynamic semi-auto training in Llama2 model by haohongxiang in https://github.com/PaddlePaddle/PaddleNLP/pull/7851
* [CI] add ci approval pipelines by zjjlivein in https://github.com/PaddlePaddle/PaddleNLP/pull/7859
* [fix] fix a bug of trainer/argparser.py by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/7860
* [Improvement] fix ops improting in utils by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7865
* [Add CE] Add CE for Hybrid Parallism by iosmers in https://github.com/PaddlePaddle/PaddleNLP/pull/7817
* [Unified Checkpoint] Cherry pick empty cache. by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7868
* Add PPO training. by guoshengCS in https://github.com/PaddlePaddle/PaddleNLP/pull/7305
* Update reward_main.py by wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/7880
* Update ppo_main.py by wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/7881
* [LLM] revert benchmark codes by RichardWooSJTU in https://github.com/PaddlePaddle/PaddleNLP/pull/7871
* [LLM]support QWenVL second part by DanGuge in https://github.com/PaddlePaddle/PaddleNLP/pull/7808
* [Bug Fixes] update chatglm1 tokenizer by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7870
* 【AutoParallel】Support 'master_grad' in Llama in static auto-parallelism by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/7658
* [Bug Fix] fix slice bug in LlamaRotaryEmbedding by MarioLulab in https://github.com/PaddlePaddle/PaddleNLP/pull/7882
* 【AutoParallel】Support bf16 loss in static by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/7874
* [Bug Fix] fix allreduce tensor dtype by BeingGod in https://github.com/PaddlePaddle/PaddleNLP/pull/7876
* [CE] Add Qwen into CE process by ziangqin-baidu in https://github.com/PaddlePaddle/PaddleNLP/pull/7887
* [Hackathon 5th No.73] ToT by ErnestinaQiu in https://github.com/PaddlePaddle/PaddleNLP/pull/7660
* [CustomDevice] fix loading rng state on custom devices by SylarTiaNII in https://github.com/PaddlePaddle/PaddleNLP/pull/7894
* [LLM] fix llama precision on custom devices by SylarTiaNII in https://github.com/PaddlePaddle/PaddleNLP/pull/7895
* [AutoConfig]add benchmark scripts by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/7897
* [RELEASE] Update README.md by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7834
* add qwen benchmark by wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/7758
* [Trainer] Refactor by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7909
* [CE]add gpt sharding_v2 case by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/7914
* [Improvement] fix logger level by KB-Ding in https://github.com/PaddlePaddle/PaddleNLP/pull/7903
* RuntimeTimer for the toolkit by KB-Ding in https://github.com/PaddlePaddle/PaddleNLP/pull/7913
* [New Features] Trainer add Wandb and Tensorboard by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/7863
* [Bug Fix] Fix timer device by KB-Ding in https://github.com/PaddlePaddle/PaddleNLP/pull/7939
* [Auto Parallel] Support semi-auto trainer and fit Llama2 training by haohongxiang in https://github.com/PaddlePaddle/PaddleNLP/pull/7885
* gqa fuse attention qkv by FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/7890
* rename files and add readme for llama auto_parallel by zhiqiu in https://github.com/PaddlePaddle/PaddleNLP/pull/7944
* [Trainer] Skip some trainer test. by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/7949
* [Unified checkpoint] Turn off unified checkpoint when using sharding stage3 by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7969
* [Text Matching] Update text matching by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/7973
* Fix NPU AICPU operator issues by NINGBENZHE in https://github.com/PaddlePaddle/PaddleNLP/pull/7976
* [Unified Checkpoint] Fix multi-node output share-folder by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7977
* Add SwiGLU operator by sneaxiy in https://github.com/PaddlePaddle/PaddleNLP/pull/7967
* [model_zoo/gpt-3] Fix bugs from PR-61236 which cleared `paddle.jit.dy2static.utils_helper` by haohongxiang in https://github.com/PaddlePaddle/PaddleNLP/pull/7989
* 【AutoParallel】Add semi autoparallel amp by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/7985
* [Trainer] ignore_save_lr_and_optim by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/7978
* [Gradio] fix llm gradio multi-turn dialogue bug by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/7992
* support GQA by zhangting2020 in https://github.com/PaddlePaddle/PaddleNLP/pull/7906
* [AutoConfig]add N1C8_resume by Difers in https://github.com/PaddlePaddle/PaddleNLP/pull/7950
* [AutoConfig]add N2C16 by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/7915
* [Unified Checkpoint] Add document by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7961
* Add SearchApi integration by SebastjanPrachovskij in https://github.com/PaddlePaddle/PaddleNLP/pull/7936
* add autotuner buffer check ce case by Difers in https://github.com/PaddlePaddle/PaddleNLP/pull/7993
* [Unified Checkpoint] Support peft model by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7691
* [DATA] Remove repeated chars during preprocessing by DrownFish19 in https://github.com/PaddlePaddle/PaddleNLP/pull/7739
* 【AutoParalle】construct model using float32 in "amp-o2" by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8033
* support the loss mask for the pretrain by wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8034
* [Mixtral] Add mixtral moe by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/7803
* [CI] fix test ptuning by zjjlivein in https://github.com/PaddlePaddle/PaddleNLP/pull/8040
* Add SwiGLU for auto Llama by From00 in https://github.com/PaddlePaddle/PaddleNLP/pull/8038
* Fix _cache_founf_inf by co63oc in https://github.com/PaddlePaddle/PaddleNLP/pull/7997
* 【AutoParallelism】fix dataloader bug and add ci for static by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8014
* fix the index_dataset with old data format by wawltor in https://github.com/PaddlePaddle/PaddleNLP/pull/8049
* Fit sharding optimization for auto parallel llama by From00 in https://github.com/PaddlePaddle/PaddleNLP/pull/8021
* Optimize the log and enable to print the number of tokens each second. by Xreki in https://github.com/PaddlePaddle/PaddleNLP/pull/7853
* 【fix】 fix TestWandbCallback by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/8056
* Fit pir flag in predictor by cyber-pioneer in https://github.com/PaddlePaddle/PaddleNLP/pull/8048
* update pp by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8059
* Revert "Fit pir flag in predictor" by zjjlivein in https://github.com/PaddlePaddle/PaddleNLP/pull/8065
* [CI]fix ci scripts for distribute by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/8063
* unify_criterion_inputs_dynamic_and_static by liuzhenhai93 in https://github.com/PaddlePaddle/PaddleNLP/pull/8053
* [Unified Checkpoint] Fix lora unittest by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8070
* fit cinn and pir flag in predictor by cyber-pioneer in https://github.com/PaddlePaddle/PaddleNLP/pull/8071
* Support hybrid_parallel_topo_order for auto parallel Llama by From00 in https://github.com/PaddlePaddle/PaddleNLP/pull/8011
* Download refactor by LOVE-YOURSELF-1 in https://github.com/PaddlePaddle/PaddleNLP/pull/8020
* [Distributed] Add dp_gradient_sync_after_accumulate by AndSonder in https://github.com/PaddlePaddle/PaddleNLP/pull/8045
* [Distributed]Add distributed config for pipeline parallel by ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/8051
* [UC] Ignore optimizer when UC by gongel in https://github.com/PaddlePaddle/PaddleNLP/pull/8058
* 【fix】fix TestTensorboardCallback by greycooker in https://github.com/PaddlePaddle/PaddleNLP/pull/8066
* [BugFix]Rm overlap limit in dp & pp by ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/8089
* dist dataloader: add cuda compilation check by PeiyuLau in https://github.com/PaddlePaddle/PaddleNLP/pull/8099
* Download----fix new bug by LOVE-YOURSELF-1 in https://github.com/PaddlePaddle/PaddleNLP/pull/8088
* [Bug fixes] convert min_new_token -> min_new_tokens by wj-Mcat in https://github.com/PaddlePaddle/PaddleNLP/pull/7883
* [CI]update llm_gpt loss_base for Paddle62500 by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/8107
* [dist benchmark]add llama2 with autotuner by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/8108
* [Trainer] Change num_train_epochs default value by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8113
* [BugFix] shutil.rmtree ignore_errors for shared disks between train nodes. by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8117
* qwen init bug fix by wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/8120
* 【AutoParallel】Add strategy with more options by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8114
* [AutoParallel] unify llama model by deepllz in https://github.com/PaddlePaddle/PaddleNLP/pull/8127
* [benchmark]add skip_memory_metrics for ce_gpt by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/8132
* [Distributed]Fix comm_overlap config bug by ForFishes in https://github.com/PaddlePaddle/PaddleNLP/pull/8128
* Commented out autonlp test by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8110
* add rslora & lora+ by wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/8111
* adapter new type promotion rule for Paddle 2.6 by zxcd in https://github.com/PaddlePaddle/PaddleNLP/pull/8079
* [benchmark]add auto_pir case by Liujie0926 in https://github.com/PaddlePaddle/PaddleNLP/pull/8144
* [Unified Checkpoint] Fix tie_weights save and load by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8137
* [BugFix] fix test_sample_generate bug by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8157
* support mc2 for mp lora. by wuhuachaocoding in https://github.com/PaddlePaddle/PaddleNLP/pull/8161
* Replace Sequence Parallel to Paddle Sequence Parallel by iosmers in https://github.com/PaddlePaddle/PaddleNLP/pull/7966
* Trainer json args-parser supports raise error by gongel in https://github.com/PaddlePaddle/PaddleNLP/pull/8163
* [Paddle-pipelines] Add pytorch retrieval model tutorials by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/8159
* [sharding] Add arg of disabling sharding reduce_avg for accuracy verification by haohongxiang in https://github.com/PaddlePaddle/PaddleNLP/pull/8168
* [LoRA] add quick_lora by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8106
* fix read-data timer when ignore_data_skip=False and skip_profile_timer=False by GuoxiaWang in https://github.com/PaddlePaddle/PaddleNLP/pull/8177
* Fix FusedLinearWithGradAdd bug by MarioLulab in https://github.com/PaddlePaddle/PaddleNLP/pull/8178
* adapt to npu FA. by wuhuachaocoding in https://github.com/PaddlePaddle/PaddleNLP/pull/8171
* add long sequence strategies by WAI-clear in https://github.com/PaddlePaddle/PaddleNLP/pull/8076
* [Trainer] Saving rng state not seed. by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8185
* 【AutoParallel】Change llama in auto-parallel by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8151
* [CI] Disable unit tests that download from HF Hub and AI Studio by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8198
* 【AutoParallel】Change the `dtype` of initializing the model by heavyrain-lzy in https://github.com/PaddlePaddle/PaddleNLP/pull/8199
* [Paddle-Pipelines] Add matryoshka representation learning by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/8165
* update for npu. by wuhuachaocoding in https://github.com/PaddlePaddle/PaddleNLP/pull/8210
* [Paddle-pipelines] remove ._static_mode for static model by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/8214
* Support sharding for auto_trainer by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/8164
* [Cherry-pick] [Distributed] Support pp non batch comm (8097) by SylarTiaNII in https://github.com/PaddlePaddle/PaddleNLP/pull/8222
* add finetune fused & add mc2 by NINGBENZHE in https://github.com/PaddlePaddle/PaddleNLP/pull/8139
* Add checkpoint_done by gongel in https://github.com/PaddlePaddle/PaddleNLP/pull/8223
* Support GQA for auto parallel by zhangbo9674 in https://github.com/PaddlePaddle/PaddleNLP/pull/8234
* bug fix for pure sharding with [fp16 + main_grad] by FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8238
* [BugFix][NPU] fix llama fa bug by tianhaodongbd in https://github.com/PaddlePaddle/PaddleNLP/pull/8237
* [AutoParallel] support GPT for auto_parallel by liym27 in https://github.com/PaddlePaddle/PaddleNLP/pull/8160
* [Cherry-pick] [LLM] add decay steps option for finetuning by SylarTiaNII in https://github.com/PaddlePaddle/PaddleNLP/pull/8251
* Pissa by wtmlon in https://github.com/PaddlePaddle/PaddleNLP/pull/8250
* Optimize llm/GPT3 performance by MarioLulab in https://github.com/PaddlePaddle/PaddleNLP/pull/8172
* [BUG] fix to_static by JunnYu in https://github.com/PaddlePaddle/PaddleNLP/pull/8194
* Add DeBERTa model by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/8227
* [GPT bugs]Fix gpt download bug by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/8253
* Fix timer for NPU&XPU by KB-Ding in https://github.com/PaddlePaddle/PaddleNLP/pull/8261
* [lora]cherry-pick add scaling by lugimzzz in https://github.com/PaddlePaddle/PaddleNLP/pull/8264
* Upgrade paddlenlp to 2.8.0 by w5688414 in https://github.com/PaddlePaddle/PaddleNLP/pull/8266
* [BugFix] Try except sequence parallel utils (8189) by DesmonDay in https://github.com/PaddlePaddle/PaddleNLP/pull/8274
* Support Llama3 by ZHUI in https://github.com/PaddlePaddle/PaddleNLP/pull/8315
* bug fixer (8314) by FeixLiu in https://github.com/PaddlePaddle/PaddleNLP/pull/8318
New Contributors
* DanGuge made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7644
* greycooker made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7768
* ZeyuTeng96 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7832
* shenghwa made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7836
* ziangqin-baidu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7823
* MarioLulab made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7882
* ErnestinaQiu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7660
* Difers made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7950
* SebastjanPrachovskij made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/7936
* LOVE-YOURSELF-1 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8020
* PeiyuLau made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8099
* deepllz made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8127
* liym27 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/8160
**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.7.2...v2.8.0