Paddlenlp

Latest version: v2.8.1

2.6

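Our toolchain fully supports mainstream large language models, including LLaMA 1/2, BLOOM, ChatGLM 1/2, GLM, and OPT. This lets users try a variety of large models at low cost with a single set of tools.

To support this large-model toolchain, we made extensive upgrades to the low-level and core framework:

- We upgraded the Trainer API to a 4D-parallel distributed Trainer, which makes model training more efficient.
- We implemented the parameter-efficient fine-tuning algorithms LoRA and Prefix Tuning, so that a single machine can fine-tune models at the hundred-billion-parameter scale.
- Building on PaddleSlim's in-house quantization algorithms, we also implemented lossless quantization across all supported large models.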
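
To make the LoRA idea concrete, here is a minimal pure-Python sketch. It is not PaddleNLP's implementation; it assumes the standard LoRA formulation, in which the pretrained weight `W` stays frozen and only two small matrices `A` and `B` of rank `r` are trained. The helper names (`matmul`, `lora_linear`) and all sizes are hypothetical and chosen only for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_linear(x, w, lora_a, lora_b, alpha, rank):
    """Compute x @ W + (alpha / rank) * (x @ A @ B).

    W is the frozen pretrained weight; only A and B would receive gradients.
    """
    base = matmul(x, w)
    update = matmul(matmul(x, lora_a), lora_b)
    scale = alpha / rank
    return [[b + scale * u for b, u in zip(b_row, u_row)]
            for b_row, u_row in zip(base, update)]

random.seed(0)
d_in, d_out, rank = 8, 8, 2  # hypothetical sizes; `rank` plays the role of a LoRA rank setting

w = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]         # frozen weight
lora_a = [[random.gauss(0, 0.01) for _ in range(rank)] for _ in range(d_in)]  # trainable, small random init
lora_b = [[0.0] * d_out for _ in range(rank)]                                 # trainable, zero init

x = [[1.0] * d_in]
y = lora_linear(x, w, lora_a, lora_b, alpha=4, rank=rank)

full_params = d_in * d_out           # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)  # parameters touched by LoRA
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen layer exactly. In this toy case the adapter trains 32 values instead of 64; the saving grows with layer width, which is what makes fine-tuning very large models on one machine plausible.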

2.6.0 2.6.0rc

Core Changes

- Full support for training and inference examples of the mainstream open-source large models [bloom](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/bloom), [chatglm](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/chatglm), [glm](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/glm), [llama](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/llama), and [opt](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/language_model/opt)
- Adds the cross-modal models [minigpt4](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/paddlenlp/transformers/minigpt4) and [speecht5](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/paddlenlp/transformers/speecht5)
- Trainer adds model parallelism; models that support it can enable tensor parallel training with a single switch
- Adds parameter-efficient fine-tuning via [peft](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/paddlenlp/peft), including single-GPU and distributed LoRA and Prefix Tuning strategies, to help bring large-model applications into production
- Pipelines adds experimental large-model applications and Agents features, including [single-turn and multi-turn document question answering](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/chatbot) and [ReACT-based agents](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines/examples/agents)

2.5.2

New Features

PPDiffusers

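- Adds a FastDeploy-based CycleDiffusionPipeline and a dynamic-graph CycleDiffusionPipeline, plus a Gradio interface for the dynamic-graph version #4945 #4830
- Updates LoRA to support a custom lora_rank #4894 #4925
- Adds ControlNet with inference and training support #5009 #5090
- Upgrades clip_guided_stable_diffusion, interpolate_stable_diffusion, lpw_stable_diffusion, and stable_diffusion_mega in the community directory #4920 #4947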

AutoNLP
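- AutoNLP text classification supports inference deployment with Taskflow #4896
- Supports the full train, evaluate, compress, and infer workflow for text classification models with both finetune and prompt tune #4967 #4963
- Supports VisualDL and distributes training logs to each trial #4990 #5021

Core Infrastructure
- Completes the transformers model upgrade for MegatronBERT, MobileBert, Reformer, Roformerv2, and skep
- Adds 14 Chinese BART models #4636
- Adds 3 Chinese text summarization Taskflow models #4933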

FastGeneration
- Adds a CodeGen-16B example #4895
- Adds FusedAttention optimization to BART FastGeneration #5111

Bug Fix
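- Fixes incorrect inference results in BART FastGeneration #5111
- Fixes zero-shot extraction issues in the UIE-M model series #5108
- Fixes DocParser image reading and PDF scaling issues #4975
- Fixes the project dim in CLIP and ChineseCLIP so that text_config and vision_config stay consistent with earlier behavior #5074
- Fixes a bug between GroupNorm and the framework's PyLayer API when the Trainer uses Sharding Stage3 #4930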

2.5.1

New Features

PPDiffusers

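- PPDiffusers supports loading models from and uploading models to the HF Hub #4640 #4625
- Adds a training workflow for AutoEncoder #4137
- Adds LoRA, with support for LoRA training of dreambooth and text_to_image; the corresponding training scripts are updated as well #4768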

AutoNLP

- AutoNLP classification supports English models #4704
- AutoNLP classification upgrades all Chinese models to the Ernie 3.0 Tiny v2 series #4827
- AutoNLP improves log control #4844

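Core Infrastructure

- ERNIE-Layout supports re-compute #4490
- The Roberta and T5 architectures add AutoConverter, which can load torch models directly
- Unifies all activation functions in PaddleNLP under `paddlenlp.transformers.activations` #4589
- The Nezha and GauAlpha model architectures complete the unified transformers experience upgrade
- Adds AutoModel support for the Chineseclip model #4585
- Adds a model-zoo testing template #4398
- Adds the BLIP 1.0 model, supporting CLIP Interrogator image-to-text #4676
- Removes the overridden from_pretrained_v2 methods in CLIP, ErnieVil, and ChineseCLIP #4797
- Adds a polynomial learning-rate schedule and the DataCollatorForLanguageModeling and DataCollatorForWholeWordMask APIs #4826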
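
A polynomial learning-rate schedule can be sketched as follows. This is a generic formulation with optional linear warmup, not necessarily the exact schedule PaddleNLP ships; the function name `polynomial_lr` and all of its parameters are assumptions made for illustration.

```python
def polynomial_lr(step, total_steps, lr_init, lr_end=0.0, power=1.0, warmup_steps=0):
    """Return the learning rate at `step` under polynomial decay.

    The rate rises linearly during warmup, then decays from lr_init to lr_end
    following (1 - progress) ** power. power=1.0 gives plain linear decay.
    """
    if warmup_steps > 0 and step < warmup_steps:
        return lr_init * step / warmup_steps
    if step >= total_steps:
        return lr_end
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return (lr_init - lr_end) * (1.0 - progress) ** power + lr_end

# Hypothetical run: 100 steps, peak rate 1e-3, quadratic decay after 10 warmup steps.
schedule = [polynomial_lr(s, 100, 1e-3, power=2.0, warmup_steps=10) for s in range(101)]
```

A larger `power` drops the rate faster early in training and flattens it near the end; `lr_end` sets the floor the schedule settles on.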

UTC

- Adds the utc-xbase, utc-base, utc-medium, utc-mini, utc-micro, utc-nano, and utc-pico versions; the default model changes from utc-large to utc-base #4716 #4825
- Adds English documentation for UTC #4476

Pipelines

- Adds an end-to-end cross-modal retrieval solution, with a complete service deployment for text-to-image search #4516

Bug Fix

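- Fixes offset prediction results for special characters in UIE-X #4687
- Fixes local model loading failure for the zero_shot_text_classification task in Taskflow #4505
- Fixes unexpected results when gathering cls_positions within a batch in the UTC model #4785
- Fixes the tqdm display in notebooks during bos model downloads #4603
- Removes the redundant protobuf dependency #4600
- Fixes an error in ernie-m's automatically generated attention_mask #4494
- Fixes download and installation of pre-release versions #4661
- Fixes nondeterminism in the AutoConverter precision comparison #4568
- Fixes download errors for non-community model weights in multi-machine or multi-GPU settings #4491
- Fixes the argument type of the is_shuffle parameter in information_extraction, unified_sentiment_analysis, and model_zoo/uie #4460
- Fixes incorrect T5 FastGeneration sampling results #4624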

2.5

New Features
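[PPDiffusers diffusion model toolkit released](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/ppdiffusers)

The hugely popular AI-painting diffusion models are here 🔥

<picture>
<img src="https://user-images.githubusercontent.com/14105589/212271202-f8d81442-4865-43a9-a491-cc104129f0be.png" height="200" width="400">
</picture>

PPDiffusers is a PaddlePaddle-based diffusion model toolbox that provides multimodal diffusion models, aiming to help developers quickly use and develop diffusion models for text-to-image, text-to-video, and text-to-text tasks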

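Collection of SOTA diffusion model Pipelines
- With pipelines, a few lines of code are enough to paint with Stable Diffusion, and FastDeploy can provide high-performance acceleration. There are [30+](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/ppdiffusers/ppdiffusers/pipelines/README.md) such model application pipelines, including the latest Chinese text-to-image models IDEA/Taiyi-Stable-Diffusion, BAAI/AltDiffusion, and MindDiffusion/wukonghuahua.

Rich set of Noise Schedulers and model components
- Provides a rich set of noise schedulers: not only the widely used DDPM, DDIM, and PNDM, but also the latest DPMSolver, with 14+ schedulers for trading off speed against quality. Integrates multiple diffusion model components, such as UNet1d, UNet2d, and UNet2d Conditional, making it easy to build your own diffusion model.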
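
As a rough illustration of what a noise scheduler computes, the sketch below builds a DDPM-style linear beta schedule and applies the forward noising step `x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise`. It is plain Python, not the PPDiffusers API. The function names and the default values (1000 steps, betas from 1e-4 to 0.02, as in the original DDPM setup) are assumptions that do not come from these release notes.

```python
import math
import random

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced from beta_start to beta_end."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t; the signal fraction kept at step t."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars

def add_noise(x0, noise, alpha_bar_t):
    """Forward diffusion: blend the clean sample with Gaussian noise."""
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    return [signal * x + sigma * n for x, n in zip(x0, noise)]

betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)

random.seed(0)
x0 = [0.5, -0.25, 1.0]                        # toy "image" with three values
noise = [random.gauss(0, 1) for _ in x0]
x_mid = add_noise(x0, noise, alpha_bars[499])  # halfway through: a mix of signal and noise
x_end = add_noise(x0, noise, alpha_bars[-1])   # final step: almost pure noise
```

Schedulers such as DDIM or DPMSolver mostly differ in how they walk this schedule backwards during sampling, often using far fewer steps, which is where the speed and quality trade-off comes from.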

Comprehensive training and inference tutorials
- Provides training tutorials for multiple scenarios, covering [training from scratch](https://github.com/PaddlePaddle/PaddleNLP/tree/v2.5.0/ppdiffusers/examples/text_to_image_laion400m), [domain fine-tuning](https://github.com/PaddlePaddle/PaddleNLP/tree/v2.5.0/ppdiffusers/examples/text_to_image), and [few-shot customization](https://github.com/PaddlePaddle/PaddleNLP/tree/v2.5.0/ppdiffusers/examples/dreambooth). After training, your own model can also be accelerated by following the [FastDeploy inference tutorial](https://github.com/PaddlePaddle/PaddleNLP/tree/v2.5.0/ppdiffusers/deploy).

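[On-device semantic understanding compression solution](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-tiny)
Releases an on-device semantic understanding compression solution based on the ERNIE 3.0 Tiny model, helping developers quickly deploy pretrained models on edge devices

ERNIE 3.0 Tiny V2 lightweight model released
- Building on V1, ERNIE 3.0 Tiny V2 uses strategies such as downstream knowledge injection and multi-task learning, significantly improving results on out-domain and low-resourced data

Full quantization compression solution based on PaddleSlim released
- First release of a PaddleSlim-based full quantization acceleration solution, which also supports vocabulary quantization to reduce deployment memory usage; prediction speed improves substantially with essentially no loss in accuracy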
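
To show why quantization shrinks memory while keeping accuracy close, here is a minimal sketch of symmetric per-tensor int8 quantization. It is a generic textbook scheme, not PaddleSlim's algorithm; the function names and the sample weights are made up for illustration.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # one float scale shared by the whole tensor
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 1.0]  # made-up weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each value is stored in one byte instead of four (for float32), and the rounding error per value is bounded by half of `scale`. Applying the same idea to an embedding table is one plausible reading of what vocabulary quantization saves at deployment time.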

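FastDeploy deployment for all scenarios
- FastDeploy is an AI inference deployment tool that covers all scenarios and is easy to use, flexible, and highly efficient, greatly reducing the difficulty of deploying on edge devices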

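[Industry example library upgrade](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications)
[Document intelligence information extraction UIE-X application](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/information_extraction/document)
- **Comprehensive scenarios**: covers the mainstream document information extraction tasks and supports multiple languages, meeting developers' varied needs for putting information extraction into production
- **Leading results**: uses UIE-X, a model with outstanding results in multimodal information extraction, as the training base, with broad and mature practical applicability
- **Easy to use**: with Taskflow, three lines of code are enough for quick calls without any labeled data, one command starts information extraction training, and deployment is straightforward, lowering the barrier to adopting information extraction
- **Efficient tuning**: developers need no machine learning background to get started with data labeling and model training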

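[Unified text classification UTC application](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/zero_shot_text_classification)
- **SOTA results**: UTC is a SOTA model built on a unified semantic matching framework, setting new top results on both the FewCLUE and ZeroCLUE leaderboards
- **Unified modeling**: a single model supports multiple tasks, including multi-class, multi-label, and hierarchical classification
- **Fast transfer**: strong zero-shot classification and few-shot transfer, plus a Label Studio annotation workflow for rapid tuning and development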

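[Unified sentiment analysis UIE-Senta application](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/sentiment_analysis)
- **Comprehensive applications**: adds the uie-senta model series with substantially improved results, supporting common sentiment analysis capabilities such as sentence-level sentiment classification, aspect extraction, and opinion extraction
- **Efficient tuning**: provides a Label Studio annotation workflow; with simple data labeling, developers can quickly train and tune models
- **Validated in real scenarios**: an application tool refined in real-world use, addressing practical problems such as implicit sentiment aspect extraction and sentiment aspect aggregation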

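[Unsupervised question answering application](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/question_answering/unsupervised_qa)
- **Innovative application**: an unsupervised retrieval-based QA system (that is, retrieval-based QA built on automatically generated question-answer pairs). It combines question generation, UIE answer extraction, and retrieval-based QA to generate QA pairs automatically from unstructured text used as context; the generated pairs can then be used to build a retrieval-based QA system without supervision.
- **Simple to apply**: PaddleNLP Pipelines provides a complete end-to-end intelligent QA system, including QA corpus generation, index construction, model service deployment, and a WebUI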

[Core framework upgrade](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/docs)

[PretrainedConfig](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/paddlenlp/transformers/configuration_utils.py)
- Model configuration is formalized, making model parameters easier to configure; models including GPT/T5/Ernie/ErnieM/ErnieLayout/Bart/MBart/Unified_Transformer/Unimo/CodeGen are upgraded to use PretrainedConfig

[Trainer API](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/trainer.md)
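- Adds basic training capabilities: mixed-precision bf16 training in both O1 and O2 modes #3352
- Adds distributed training capabilities: recompute and sharding training #3352
- Adds `Seq2SeqTrainer` for training seq2seq models #3352
- Adds `Memory Tracer` for monitoring host memory and GPU memory #4181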

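[Model compression API](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/compression.md)
- The model compression API integrates quantization training, vocabulary compression, and other features, and supports combining strategies #3271 #4159 #4011
- The model compression API supports ERNIE, UIE, BERT, TinyBERT, ELECTRA, ERNIE-M, RoBERTa, PP-MiniLM, and others #3234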

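[Data augmentation API](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/dataaug.md)
- Adds character-level and sentence-level data augmentation strategies, adds vocabularies based on antonyms and on word-embedding synonyms, and supports file-in, file-out data augmentation #4194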
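
Synonym-based augmentation of the kind described above can be sketched in a few lines. This is an illustrative stand-in, not the `paddlenlp.dataaug` API; the function name `synonym_substitute`, its parameters, and the tiny synonym table are all hypothetical.

```python
import random

def synonym_substitute(tokens, synonyms, n_replace=1, seed=None):
    """Return a copy of `tokens` with up to `n_replace` tokens swapped for a synonym.

    `synonyms` maps a token to a list of replacement candidates; tokens without
    an entry are never changed.
    """
    rng = random.Random(seed)
    candidates = [i for i, tok in enumerate(tokens) if tok in synonyms]
    chosen = rng.sample(candidates, min(n_replace, len(candidates)))
    augmented = list(tokens)
    for i in chosen:
        augmented[i] = rng.choice(synonyms[tokens[i]])
    return augmented

tokens = ["the", "quick", "fox"]
synonyms = {"quick": ["fast", "rapid"]}  # made-up synonym table
augmented = synonym_substitute(tokens, synonyms, n_replace=1, seed=0)
```

An antonym table would plug into the same function, although swapping in antonyms usually flips the label and so suits contrastive or robustness data rather than plain augmentation.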

[Prompt API](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/advanced_guide/prompt.md)
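- Template API adds support for Prefix-Tuning and UniMC

[FastGeneration](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/advanced_guide/fastgeneration/fastgeneration.rst)
- Adds T5 generation acceleration, with dynamic-to-static conversion and inference library support #3763
- The `model.generate()` interface changes: the `use_faster` parameter is renamed to `use_fast` #4213
- Transformer generation acceleration removes the restriction that the FFN intermediate hidden size must be 4x the hidden size #3592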
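
For code that still passes the old keyword, a small migration shim is one way to adapt call sites. The helper below is hypothetical and not part of PaddleNLP; it only rewrites a keyword-argument dictionary and assumes nothing about `model.generate()` beyond the rename stated above.

```python
import warnings

def migrate_generate_kwargs(kwargs):
    """Return a copy of `kwargs` with the legacy `use_faster` key renamed to `use_fast`.

    If both keys are present, the explicit `use_fast` value wins.
    """
    migrated = dict(kwargs)
    if "use_faster" in migrated:
        legacy_value = migrated.pop("use_faster")
        warnings.warn(
            "`use_faster` was renamed to `use_fast`",
            DeprecationWarning,
            stacklevel=2,
        )
        migrated.setdefault("use_fast", legacy_value)
    return migrated

old_kwargs = {"max_length": 20, "use_faster": True}  # arguments written for the old interface
new_kwargs = migrate_generate_kwargs(old_kwargs)
```

A call site could then use something like `model.generate(**migrate_generate_kwargs(old_kwargs))` until the old keyword has been removed everywhere.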

[FastTokenizer](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/fast_tokenizer)
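- Updates to [FastTokenizer 1.0.1](https://pypi.org/project/fast-tokenizer-python/#description), fixing the incorrect get_vocab_size keyword argument in PretrainedFastTokenizer #4339
- Fixes the FastTokenizer AddToken interface being unable to accept the AddedToken data structure #4380
- Fixes FastTokenizer still creating threads during single-threaded tokenization #4441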

[SimpleServing](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/server.md)
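- Adds the SimpleServing service deployment option. SimpleServing is a wrapper around FastAPI that lets Transformers models and Taskflow be deployed as services in a few lines of code, lowering the difficulty of service deployment #2845

[Joint work with the Huggingface ecosystem](https://twitter.com/PaddlePaddle_/status/1608777581033328646)
PaddleNLP integrates with the Huggingface ecosystem for the first time: all Model and Tokenizer classes can download from and upload to the Huggingface Hub directly, so developers can try pretrained models straight from Huggingface

- All Model and Tokenizer classes support downloading from and uploading to the Huggingface Hub directly
- The Text Summarization, Fill Mask, and Dialogue Taskflows support loading directly from the Huggingface Hub and connect to the HuggingFace Inference API
- Adds ConversionMixin; `from_pretrained` for the bert and gpt models supports loading models with torch weights directly from the Huggingface Hub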

Bugs
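- Fixes special cases in load_torch #4383
- Fixes a tokenization issue in the SKEP-based sentiment analysis tokenizer #4357
- Fixes FastGeneration producing ids outside the vocabulary under FP16 #3936
- Fixes FastGeneration being unusable with FP16 in eager mode on newer PaddlePaddle versions #3936
- Fixes UnifiedTransformer and UNIMOText issues when using the native generation API #3936
- Fixes generation errors in BART, MBART, and T5 with a 4D AttentionMask #3936
- Fixes downloading of ecosystem models on Windows #3640 #3670
- Fixes `from_pretrained_v2` being unable to load fp16 models #3902
- Fixes an error when saving models under Trainer sharding #4220
- Fixes an error when training Pegasus text summarization on CPU under Windows #4431
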
Others
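- Adds data download and a complete data preprocessing workflow, plus a custom dataset interface and documentation #3269
- T5 adds the prepare_decoder_input_ids_from_labels method #4331
- Refactors the CLIP and ERNIE VIL models and adds the ChineseCLIP model #4270
- Adds the CMSIM_LOCK model #4388
- Pipelines supports batch prediction and adds ERNIE Vilg text-to-image generation, RocketQAv2, and ERNIE-Search English semantic retrieval #3432 #3512 #3718 #3906; Pipelines also adds two-way recall combining keyword and semantic retrieval, a Docker image build workflow, and the Milvus 2.1 vector retrieval tool #3864 #3315 #3283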

New Contributors
* JamesLim-sy made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3089
* bruce0210 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3209
* wuhuachaocoding made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3211
* kztao made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3182
* paopjian made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3221
* 0x45f made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3277
* HexToString made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3309
* Septilliony made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3375
* Elvisambition made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/1799
* YanhuiDua made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3377
* Yam0214 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3370
* alkaideemo made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3424
* ShawnNew made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3431
* qipengh made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3434
* sijunhe made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3411
* iamWHTWD made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3527
* USTCKAY made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3521
* feifei-111 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3585
* Wang-ck123 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3409
* chenxiangzhen made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3602
* ymyjl made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3641
* sserdoubleh made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3662
* ChenBinfighting1 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3677
* firestonelib made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3755
* co63oc made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3955
* zjjlivein made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3969
* DefTruth made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3999
* christineaa made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3977
* shentanyue made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/4042
* LazyFyh made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/4102
* pangyoki made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3954
* GGBond8488 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/4186
* chncaption made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/4040
* SylarTiaNII made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/4228
