Paddlenlp

Latest version: v2.8.1



2.4.3

New Features

Prompt API
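* Template strings now support the keywords `prefix` and `options`, and gain 7 new attributes, including `position`, `token_type`, `length`, `encoder`, and `hidden_size` 3724
* Added support for PrefixTemplate
* Removed the restrictions that `InputExample` and `InputFeatures` placed on input data keys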

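Question Answering
* Added unsupervised question answering pipelines, with pipeline run examples and documentation 3605
* Added the nodes QAFilter, AnswerExtractor, QuestionGenerator, AnswerExtractorPreprocessor, and QAFilterPostprocessor
* Added the QAGenerationPipeline pipeline
* Added FastAPI backend code that serves the ElasticSearch ANN retrieval store, QAGenerationPipeline, and SemanticSearchPipeline
* Added a web visualization system for unsupervised question answering with the following features: QA retrieval, online QA-pair generation, online index updates, file upload with automatic QA-pair generation and loading, optional filtering of generated QA pairs, and configurable numbers of returned answers and maximum retrieved results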

Trainer
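* Added sharding support; sharding stage1 and stage2 are currently supported. 3352
* Added bf16 training support for both single-GPU and multi-GPU training, and improved pure_fp16 training support.
* Added IterableDataset support, so iterable datasets can be passed in.
* Added Seq2SeqTrainer for training seq2seq tasks.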
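Sharding stage1 and stage2 partition training state across data-parallel ranks in the ZeRO style: stage1 splits the optimizer states, and stage2 additionally splits the gradients. The back-of-the-envelope sketch below estimates per-GPU model-state memory, assuming mixed-precision Adam with 2 + 2 + 12 bytes per parameter (the accounting used in the ZeRO paper). The function name is hypothetical, and activations, buffers, and Paddle-specific details are ignored.

```python
def model_state_bytes_per_gpu(num_params: int, world_size: int, stage: int,
                              optimizer_bytes: int = 12) -> float:
    """Estimate per-GPU memory (bytes) for model states under ZeRO-style sharding.

    Assumes mixed-precision Adam: 2 bytes for fp16/bf16 parameters, 2 bytes for
    fp16/bf16 gradients, and `optimizer_bytes` for fp32 master weights, momentum,
    and variance per parameter. Activations and temporary buffers are ignored.
    """
    params = 2 * num_params
    grads = 2 * num_params
    optim = optimizer_bytes * num_params
    if stage == 0:  # plain data parallelism: every rank holds everything
        return params + grads + optim
    if stage == 1:  # optimizer states split across ranks
        return params + grads + optim / world_size
    if stage == 2:  # optimizer states and gradients split across ranks
        return params + (grads + optim) / world_size
    raise ValueError(f"unsupported sharding stage: {stage}")


if __name__ == "__main__":
    n = 7_500_000_000  # a 7.5B-parameter model
    for stage in (0, 1, 2):
        gb = model_state_bytes_per_gpu(n, world_size=64, stage=stage) / 1e9
        print(f"stage {stage}: {gb:.1f} GB of model states per GPU")
```

For a 7.5B-parameter model on 64 ranks this gives about 120.0, 31.4, and 16.6 GB per GPU for stages 0, 1, and 2, which is why stage2 helps most when optimizer state dominates.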

FasterGeneration
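* Removed the restriction that the intermediate hidden dimension of the Transformer FFN must be 4 times `d_model`, and added loading a model by importing its `model_state` 3592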

FastTokenizer
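* Added a `use_fast` parameter to AutoTokenizer that uses `fast_tokenizer` for high-performance tokenization. The option is currently available for `ERNIE`, `BERT`, `TinyBert`, and `ERNIE-M`. 3746
* Released FastTokenizer 1.0.0, the stable release of the high-performance tokenization tool, including prebuilt C++ packages and a Python package 3762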

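Base Models
* UNIMO: added an option to return intermediate outputs, and support for passing labels to compute the loss automatically 3450
* CodeGen: added an option to return intermediate outputs, and support for passing labels to compute the loss automatically 3465
* UnifiedTransformer: added an option to return intermediate outputs, and support for passing labels to compute the loss automatically 3459
* BART: added an option to return intermediate outputs, and support for passing labels to compute the loss automatically 3436
* MBART: added an option to return intermediate outputs, and support for passing labels to compute the loss automatically 3436
* T5: supports passing encoder and decoder embedding outputs directly as input 3668
* Added the paddlenlp CLI tool 3538
* Added unit tests for 7 P1-level models 3462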

UIE

* Added quantization training and deployment for UIE 3496
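Quantized models store weights as low-bit integers plus a scale. A minimal sketch of the arithmetic, assuming symmetric per-tensor int8 quantization; the helper names are hypothetical, and this is not the PaddleSlim algorithm used for UIE, only the round trip such schemes rely on (quantization during training simulates this round trip so the model learns to tolerate it).

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor int8 quantization: q = clip(round(x / scale), -127, 127)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize_int8(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point values."""
    return [x * scale for x in q]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.003, 1.0]
    q, scale = quantize_int8(weights)
    restored = dequantize_int8(q, scale)
    print("int8 codes:", q, "scale:", scale)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The rounding error per weight is at most half a quantization step (scale / 2), so a single large outlier, which inflates the scale, costs precision for every other weight in the tensor.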

Neural Search
* Added Gradient Cache and Recompute, which enable training with very large batch sizes on a single GPU. 3697
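Gradient Cache makes a large contrastive batch fit in memory by splitting the backward pass in two: first encode the whole batch without keeping activations, compute the loss, and cache its gradient with respect to each representation; then revisit the batch chunk by chunk and push the cached gradients through the encoder. The sketch below shows that bookkeeping with a scalar toy encoder (rep = w * x) and an in-batch softmax loss. Both are assumptions chosen so the result can be checked against finite differences; this is not PaddleNLP's implementation, and Recompute (activation checkpointing) is not shown.

```python
import math
from typing import List, Tuple


def encode(w: float, xs: List[float]) -> List[float]:
    """Toy encoder with a single scalar weight: rep = w * x."""
    return [w * x for x in xs]


def loss_and_rep_grads(qs: List[float], ps: List[float]) -> Tuple[float, List[float], List[float]]:
    """In-batch softmax contrastive loss where passage i is the positive for query i.

    Returns the loss and its gradient with respect to every query and passage representation.
    """
    b = len(qs)
    loss, gq, gp = 0.0, [0.0] * b, [0.0] * b
    for i in range(b):
        scores = [qs[i] * p for p in ps]  # s_ij = q_i * p_j
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        loss -= math.log(probs[i]) / b
        for j in range(b):
            coef = (probs[j] - (1.0 if i == j else 0.0)) / b  # dL/ds_ij
            gq[i] += coef * ps[j]
            gp[j] += coef * qs[i]
    return loss, gq, gp


def full_batch_loss(wq: float, wp: float, xs: List[float], ys: List[float]) -> float:
    """Reference: encode the whole batch at once (used to check gradients numerically)."""
    return loss_and_rep_grads(encode(wq, xs), encode(wp, ys))[0]


def grad_cache_step(wq: float, wp: float, xs: List[float], ys: List[float],
                    chunk_size: int) -> Tuple[float, float, float]:
    """Full-batch loss and weight gradients, touching the encoder one chunk at a time."""
    # 1) Encode every chunk without keeping activations (no autograd graph).
    qs: List[float] = []
    ps: List[float] = []
    for start in range(0, len(xs), chunk_size):
        qs += encode(wq, xs[start:start + chunk_size])
        ps += encode(wp, ys[start:start + chunk_size])
    # 2) Loss over the whole batch; cache gradients w.r.t. the representations.
    loss, gq, gp = loss_and_rep_grads(qs, ps)
    # 3) Revisit each chunk and chain the cached gradients into the weights.
    #    With autograd this is "re-encode the chunk, then backward(cached_grad)";
    #    for the toy encoder d(w * x)/dw = x, so no recomputation is needed.
    dwq = dwp = 0.0
    for start in range(0, len(xs), chunk_size):
        end = start + chunk_size
        dwq += sum(g * x for g, x in zip(gq[start:end], xs[start:end]))
        dwp += sum(g * y for g, y in zip(gp[start:end], ys[start:end]))
    return loss, dwq, dwp


if __name__ == "__main__":
    xs = [0.3, -1.2, 0.8, 0.5, -0.1, 1.1]
    ys = [0.4, -0.9, 1.0, 0.2, -0.3, 0.7]
    loss, dwq, dwp = grad_cache_step(1.5, 0.7, xs, ys, chunk_size=2)
    eps = 1e-6
    numeric = (full_batch_loss(1.5 + eps, 0.7, xs, ys) - full_batch_loss(1.5 - eps, 0.7, xs, ys)) / (2 * eps)
    print(f"loss={loss:.6f}  dL/dwq: grad cache={dwq:.6f}, finite difference={numeric:.6f}")
```

The gradients match the full-batch result for any chunk size; only peak memory changes, because at most one chunk's activations would be alive at a time in a real encoder.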

Text Classification
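* Added multi-label text classification based on semantic indexing. 3656
* Added word-level and sentence-level interpretability analysis 3385
* Fixed issues with text classification deployment 3765
* Updated the multi-class implementation to use the Trainer API 3679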

PPDiffusers
* Renamed diffusers_paddle to ppdiffusers. 3601
* Fixed a bug to support Chinese Stable Diffusion, and released ppdiffusers 0.6.1. 3663
* Released ppdiffusers 0.6.2 3737
* Added a laion400m text-to-image training script. 3693 3772
* Added support for EulerAncestralDiscreteScheduler and DPMSolverMultistepScheduler 3708 3764
* Added FID computation code. 3685
* Added an LDM super-resolution pipeline. 3710
* Added usage code for ppdiffusers inference pipelines. 3759
* Added a ppdiffusers CD workflow 3604
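For reference, FID (Fréchet Inception Distance) compares Gaussians fitted to Inception features of real images (mean $\mu_r$, covariance $\Sigma_r$) and of generated images ($\mu_g$, $\Sigma_g$); lower is better. The standard definition is below; how the ppdiffusers script extracts features and estimates these statistics is not described in this changelog.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```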

Bug Fix
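* Fixed abnormal prediction results in FasterEncoder 3606
* Fixed GPU memory allocation for PrefixLM-style models in FasterGeneration under beam search decoding 3662
* Fixed failures when downloading community models on Windows 3670 3640
* Pipelines: fixed duplicate file uploads. 3568
* Pipelines: fixed Word document parsing errors. 3645
* Pipelines: fixed batch prediction errors. 3712
* Fixed bugs related to question generation templates. 3646
* TIPC: dynamic-to-static conversion for GPT. 3586
* Added CLIPText and CLIPVision to auto/modeling so they can be loaded with AutoModel, and changed CLIP's default negative-infinity value (NEG INF) to -1e4 so that fp16 O2 no longer produces abnormal results. 3789
* Fixed the configuration of the automated PyPI release workflow 3626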
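The -1e4 mask value follows from the fp16 range: the largest finite half-precision magnitude is 65504, so a larger mask value such as -1e9 cannot be stored. Tensor frameworks typically turn it into -inf, which can produce NaN, for example in a softmax over a fully masked row, while -1e4 is representable and exp(-1e4) still underflows to zero, so masking keeps working. The stdlib sketch below checks representability with Python's half-precision `struct` format (`'e'`), which raises OverflowError instead of saturating; the helper names are hypothetical and this is an illustration, not CLIP's code.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision (struct format 'e')."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fits_in_fp16(x: float) -> bool:
    """Return False when a finite x overflows fp16 (struct raises OverflowError for it)."""
    try:
        to_fp16(x)
    except OverflowError:
        return False
    return True


if __name__ == "__main__":
    for mask_value in (-1e4, -1e9):
        print(f"{mask_value:g}: representable in fp16 -> {fits_in_fp16(mask_value)}")
```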

2.4.2

New Features

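Text Summarization Application

- Added a Pegasus-based Chinese text summarization application with one-line Taskflow invocation and high-performance FasterGeneration inference, covering the full training, inference, and deployment workflow. 3275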

Question generation

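- Added a question generation solution that provides general-purpose question generation pretrained models based on UNIMO-Text and T5, with one-line Taskflow invocation and high-performance FasterGeneration inference, covering the full training, inference, and deployment workflow. 3410 3438 3560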

Machine Translation
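- FasterMBart supports dynamic-to-static export 3367 3356
- Upgraded and refactored the MBart tokenizers to support all features of the latest tokenizers 3323
- Split `MBartTokenizer` and `MBart50Tokenizer`; `MBart50Tokenizer` supports `AutoTokenizer`, and both `MBartTokenizer` and `MBart50Tokenizer` support custom SentencePiece parameters 3323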

Pipelines

- Added a DocPrompt example 3542 3534
- Added ERNIE-ViLG text-to-image generation. 3512

Taskflow

- Improved the experience of using custom models in Taskflow by adding an update-check mechanism for model parameter files. 3506

Bug Fix

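- Fixed an issue where MBart restricted the translation languages of the model itself 3356
- Fixed CodeGen not using token type ids during generation 3348
- Fixed an incorrect adaptively generated attention mask in CodeGen 3348
- Fixed T5 decoding errors when `use_cache=False` 3115
- Fixed the text summarization Taskflow being unable to load custom models 3533
- Fixed a bug in question generation prediction 3524
- Fixed an undefined `result` variable in utils.py in the UIE training code 3490
- FAQ Finance: fixed a Paddle Serving bug on Windows. 3491
- Fixed Pipelines docx parsing when text and images appear in the same paragraph. 3546
- Fixed the data description for semantic-index-based text classification. 3551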

Others
- Added gated-silu support to T5 3115
- Upgraded T5Tokenizer to support the latest PaddleNLP features 3115
- Added 4D attention mask support to T5 3115
- T5 can now return outputs as a dict 3370
- FasterGeneration can be compiled with PaddlePaddle 2.4.0-rc0 and later 3545
- UnifiedTransformer supports adaptively generating `position_ids`, `token_type_ids`, `attention mask`, and similar inputs 3177
- UNIMO-Text supports adaptively generating `position_ids`, `token_type_ids`, `attention mask`, and similar inputs 3349
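T5 v1.1 replaced the ReLU feed-forward block with a gated variant (gated-GELU); gated-silu uses SiLU as the gate activation instead. Below is a dependency-free sketch of the block for a single token vector. The weight names wi_0, wi_1, and wo follow the common T5 convention and are illustrative, not necessarily PaddleNLP's exact parameter names.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: one row per output unit


def silu(x: float) -> float:
    """SiLU / swish: x * sigmoid(x), written to avoid overflow in exp."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense layer without bias: w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def gated_silu_ffn(x: Vector, wi_0: Matrix, wi_1: Matrix, wo: Matrix) -> Vector:
    """Gated FFN: wo @ (silu(wi_0 @ x) * (wi_1 @ x)); no biases, as in T5."""
    gate = [silu(v) for v in matvec(wi_0, x)]
    linear = matvec(wi_1, x)
    hidden = [g * h for g, h in zip(gate, linear)]
    return matvec(wo, hidden)


if __name__ == "__main__":
    x = [1.0, 2.0]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(gated_silu_ffn(x, eye, eye, [[1.0, 1.0]]))  # silu(1)*1 + silu(2)*2
```

Compared with a plain FFN, the gated form uses two input projections, so for the same intermediate width it has one extra d_model × d_ff matrix.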

2.4.1

New Features

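ERNIE-Layout: Large Model for Document Intelligence

- Added ERNIE-Layout, a multilingual cross-modal document pretraining model, together with benchmarks and fine-tuning and deployment examples for various ERNIE-Layout-based downstream tasks. 3183
- Added DocPrompt, a document extraction question answering model, with one-line Taskflow invocation. 3183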

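Pipelines Updates

- Added a Docker CUDA 11.2 image and a Docker build tutorial. 3315
- Added batch data processing to Pipelines. 3432
- Added FAQs based on user feedback and improved the README documentation. 3237
- Added Milvus 2.1 support. 3283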

Question Generation
- Added question generation examples covering Chinese and English scenarios. 3410
- Added a question generation Taskflow. 3438

Compression API
- The compression API supports NLU models such as ERNIE, ERNIE-M, BERT, TinyBERT, and ELECTRA. 3234 3324
- The DynaBERT width-adaptive pruning strategy supports distributed training. 3361
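DynaBERT's width adaptation ranks attention heads (and FFN neurons) by importance and keeps only the most important fraction for a given width multiplier. A simplified sketch for heads only, assuming importance scores have already been computed; the function name and rounding rule are assumptions, and the distributed-training support added in this release is not shown.

```python
from typing import List


def heads_to_keep(importance: List[float], width_mult: float) -> List[int]:
    """Indices of the attention heads kept at a given width multiplier.

    Rank heads by importance (most important first) and keep the top
    round(width_mult * num_heads), always at least one.
    """
    if not 0.0 < width_mult <= 1.0:
        raise ValueError("width_mult must be in (0, 1]")
    num_keep = max(1, round(width_mult * len(importance)))
    ranked = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:num_keep])


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.7]  # made-up importance scores for 4 heads
    for m in (1.0, 0.75, 0.5, 0.25):
        print(m, heads_to_keep(scores, m))
```

Because the most important heads are always retained first, narrower sub-networks are nested inside wider ones, which is what lets one trained model serve several widths.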

Prompt API

- Added documentation for the Prompt API. 3362

Bug Fix

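- Fixed broken links in few-shot text classification and a data type issue during inference on Windows. 3339 3426
- Upgraded Milvus in FAQ Finance to version 2.1 and improved the documentation. 3267 3430
- Simplified the code and improved the README for retrieval-based text classification. 3322
- Improved the Neural Search documentation. 3350
- Fixed a possible out-of-memory issue when UIE's DataLoader loads data. 3381
- Fixed an import error in the DuEE sequence labeling code. https://github.com/PaddlePaddle/PaddleNLP/pull/2853
- Fixed Pillow warnings. https://github.com/PaddlePaddle/PaddleNLP/pull/3404 and https://github.com/PaddlePaddle/PaddleNLP/pull/3457
- Updated the activation function of the artist model and fixed a warning in dallebart. https://github.com/PaddlePaddle/PaddleNLP/pull/3106
- Fixed a missing model name type in the ERNIE tokenizer https://github.com/PaddlePaddle/PaddleNLP/pull/3423
- Fixed a BERT unit-test bug that CI had not detected https://github.com/PaddlePaddle/PaddleNLP/pull/3422
- Fixed missing support for the OrderedDict data type during dynamic-to-static conversion https://github.com/PaddlePaddle/PaddleNLP/pull/3364
- Fixed random hangs in bigru_crf inference. https://github.com/PaddlePaddle/PaddleNLP/pull/3418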

Others
- Added the Stable Diffusion license https://github.com/PaddlePaddle/PaddleNLP/pull/3210
- Updated the WeChat group QR code in the documentation. https://github.com/PaddlePaddle/PaddleNLP/pull/3284
- Processor and FeatureExtractor support from_pretrained and save_pretrained https://github.com/PaddlePaddle/PaddleNLP/pull/3453
- Added unit tests for T5EncoderModel https://github.com/PaddlePaddle/PaddleNLP/pull/3376
- Added multi-input/multi-output support and unit test code for 9 models https://github.com/PaddlePaddle/PaddleNLP/pull/3305

2.4

2917 2968 2988 3040 3072 3118 3198

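A wide range of trending text-to-image models
- Supports models such as DALL-E-mini, CLIP + Disco Diffusion, CLIP + Stable Diffusion, and ERNIE-ViL + Disco Diffusion

Easy to use
- Supports one-line Taskflow invocation of text-to-image generation models
- Supports high-performance inference with FasterGeneration, removing the performance bottleneck of text-to-image generation

[Text Summarization](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_summarization)
Text summarization is one of the most frequently used NLP scenarios. This release adds a Chinese text summarization application that supports customized training 2971
- Added a text summarization application that supports customized training, high-performance inference and deployment, and one-line Taskflow invocation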

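Framework Upgrades
[Automatic Model Compression: Compression API](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/compression.md)
- Added a model compression API that supports PaddleSlim-based pruning and static offline quantization, quickly accelerating text classification, semantic matching, sequence labeling, and reading comprehension models 2777
- The model compression API applies pruning and quantization with a few calls, greatly reducing the cost of model compression

[Few-Shot Learning: Prompt API](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_classification/multi_class/few-shot)
- Added a Prompt Learning training framework that supports quick implementations of classic methods such as PET, P-Tuning, and RGL 2894
- In text classification, the Prompt Learning framework quickly improves few-shot training results 2894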
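PET-style prompt learning recasts classification as filling a [MASK] slot in a cloze template, and a verbalizer maps each label to a label word whose score at [MASK] decides the prediction. A stdlib sketch of those two pieces; the template, label words, and scores are made-up assumptions, the template uses plain `str.format` rather than PaddleNLP's template syntax, and the masked language model that would produce the scores is not included.

```python
from typing import Dict

TEMPLATE = "{text} It was [MASK]."  # hypothetical cloze template


def build_prompt(text: str, template: str = TEMPLATE) -> str:
    """Wrap the raw input in a cloze-style prompt containing a [MASK] slot."""
    return template.format(text=text)


def predict_label(mask_logits: Dict[str, float], verbalizer: Dict[str, str]) -> str:
    """PET-style decision: compare only the label words' scores at the [MASK] position."""
    return max(verbalizer, key=lambda label: mask_logits[verbalizer[label]])


if __name__ == "__main__":
    print(build_prompt("The battery lasts all day."))
    # Scores a masked language model might assign at [MASK] (made-up numbers).
    logits = {"great": 3.2, "terrible": 0.4, "okay": 5.0}
    verbalizer = {"positive": "great", "negative": "terrible"}
    print(predict_label(logits, verbalizer))  # "okay" is ignored: it is not a label word
```

Reusing the pretrained masked-LM head this way is what lets few-shot training start from a sensible classifier instead of a randomly initialized one.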

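[Transformers Pretrained Models](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/index.html)
Basic API
- Model interfaces such as BERT, ERNIE, and RoBERTa can now return attention scores and the outputs of all intermediate layers, which makes needs such as distillation easy to meet 2665
- Model interfaces such as BERT, ERNIE, and RoBERTa now accept `past_key_values` as input, which enables prefix-tuning 2801
- Model interfaces such as BERT, ERNIE, and RoBERTa now return the loss when labels are passed in, which simplifies usage: there is no need to split off labels or define a separate loss function 3013
- Model interfaces such as BERT, ERNIE, and RoBERTa can return outputs as a dict, so the needed outputs can be retrieved more clearly 2665
- Systematically improved unit tests for pretrained model interfaces to ensure functional stability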
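The "pass labels, get a loss" and dict-output conventions can be pictured with a toy classification head. This is a hypothetical stand-in written without Paddle: the real model classes work on tensors, and their exact argument names and loss functions are not specified in this changelog.

```python
import math
from typing import Dict, List, Optional


def cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable -log softmax(logits)[label]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def classification_head(logits_batch: List[List[float]],
                        labels: Optional[List[int]] = None) -> Dict[str, object]:
    """Return outputs as a dict, adding a mean 'loss' entry only when labels are given."""
    outputs: Dict[str, object] = {"logits": logits_batch}
    if labels is not None:
        losses = [cross_entropy(lg, y) for lg, y in zip(logits_batch, labels)]
        outputs["loss"] = sum(losses) / len(losses)
    return outputs


if __name__ == "__main__":
    logits = [[2.0, 0.5], [0.1, 1.3]]
    print(classification_head(logits))                  # inference: logits only
    print(classification_head(logits, labels=[0, 1]))   # training: logits and loss
```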

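Model Weights
- Added the XLM model https://github.com/PaddlePaddle/PaddleNLP/pull/2080
- Converted the Langboat/mengzi-t5-base-mt weights and added a zero-shot usage example https://github.com/PaddlePaddle/PaddleNLP/pull/3116
- Added Roformer-sim, which supports paraphrase generation and can generate similar sentences for data augmentation https://github.com/PaddlePaddle/PaddleNLP/pull/3049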

Bug Fix
- Added a `model_max_input_size` configuration field to models in bulk 3127
- Fixed core dumps during sampling-based decoding for some FasterGeneration models. 2561
- Fixed generation errors in UNIMOText when acceleration features are not used 2877
- Fixed unstable FasterGeneration performance with sampling-based decoding strategies https://github.com/PaddlePaddle/PaddleNLP/pull/2910
- Fixed an error when getting `bos_token_id` from the BART tokenizer https://github.com/PaddlePaddle/PaddleNLP/pull/3058
- Fixed being unable to set `model_max_length` on the BART tokenizer https://github.com/PaddlePaddle/PaddleNLP/pull/3018
- Fixed Taskflow text similarity prediction failures on Windows caused by dtype https://github.com/PaddlePaddle/PaddleNLP/pull/3188

Others
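- FasterGPT supports untied (non-shared) weights for word_embeddings and lm_head.decoder_weight 2953
- Refactored RoFormer and added the RoFormerForCausalLM class to support roformer-sim similar-sentence generation 3049
- Updated the ERNIE model: type_vocab_size=0 means token_type_id is not used 3075
- Added a benchmark for the ERNIE-Tiny model 3100
- Updated the mixed-precision configuration for BERT pretraining, changing the AMP level to O2 3080
- FasterBART supports dynamic-to-static conversion and high-performance inference. https://github.com/PaddlePaddle/PaddleNLP/pull/2519
- The FasterGeneration inference library build supports pulling in ONNX dependencies 3158
- The Generation API supports `logits_processor` and `get_decoder_start_token_id()` 3018
- BART models support the `get_input_embeddings()` and `set_input_embeddings()` methods for getting and setting embeddings 3133
- GPT models add interface features such as `get_vocab()`, 0/1 attention masks, and adding a bos token 2463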
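A `logits_processor` hook lets callers rewrite next-token scores before a token is chosen. A framework-free sketch of the idea with two processors and greedy selection; the factory names, the list-based interface, and the CTRL-style repetition penalty rule are assumptions for illustration, not PaddleNLP's Generation API.

```python
from typing import Callable, List, Sequence

LogitsProcessor = Callable[[List[int], List[float]], List[float]]


def min_length_processor(min_length: int, eos_token_id: int) -> LogitsProcessor:
    """Forbid EOS until the sequence holds at least min_length tokens."""
    def process(input_ids: List[int], logits: List[float]) -> List[float]:
        if len(input_ids) < min_length:
            logits = list(logits)
            logits[eos_token_id] = float("-inf")
        return logits
    return process


def repetition_penalty_processor(penalty: float) -> LogitsProcessor:
    """CTRL-style penalty: shrink the scores of tokens that were already generated."""
    def process(input_ids: List[int], logits: List[float]) -> List[float]:
        logits = list(logits)
        for tok in set(input_ids):
            logits[tok] = logits[tok] / penalty if logits[tok] > 0 else logits[tok] * penalty
        return logits
    return process


def greedy_next_token(processors: Sequence[LogitsProcessor],
                      input_ids: List[int], logits: List[float]) -> int:
    """Apply each processor in order, then pick the highest-scoring token."""
    for proc in processors:
        logits = proc(input_ids, logits)
    return max(range(len(logits)), key=lambda i: logits[i])


if __name__ == "__main__":
    raw = [5.0, 1.0, 4.0, 3.0]  # token 0 is EOS in this toy vocabulary
    procs = [min_length_processor(3, eos_token_id=0), repetition_penalty_processor(2.0)]
    print(greedy_next_token([], [2], raw))     # no processors: EOS wins
    print(greedy_next_token(procs, [2], raw))  # EOS blocked, token 2 penalized -> token 3
```

Keeping each constraint in its own processor means decoding strategies (greedy, sampling, beam search) can share them without special-casing.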

New Contributors
* Spico197 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2170
* sandyhouse made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2190
* qingqing01 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2188
* RicardoL1u made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2299
* Intsigstephon made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2285
* sljlp made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2398
* zche4846 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/1845
* tianberg made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2461
* lidanqing-intel made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2468
* fightfat made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2499
* LiYuRio made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2504
* FeixLiu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2523
* ArtificialZeng made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2537
* freeliuzc made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2543
* taixiurong made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2556
* westfish made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2423
* sneaxiy made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2660
* lastrei made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2671
* WenmuZhou made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2695
* littletomatodonkey made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2732
* piotrekobi made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2730
* Liujie0926 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2829
* buchongyu2 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2817
* GuoxiaWang made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2846
* zhiyongLiu1114 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2875
* veyron95 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2879
* BasicCoder made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2977
* dongfangshenzhu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3046
* Haibarayu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2694

Full Changelog: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.0...v2.4.0

2.4.0

New Features

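[NLP Pipelines Tool](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/pipelines)
PaddleNLP Pipelines aims to make putting NLP models into production more efficient by abstracting the common modules of complex NLP systems into standard components, so that complex NLP applications can be assembled quickly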

3003 3160 3135 3092 3186

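Pluggable component design
- Flexible configuration of document store nodes, with support for the high-performance vector search engines Faiss and Milvus
- Configurable document-level preprocessing nodes, with support for extracting document information from PDFs and images

Fast chaining of PaddlePaddle SOTA models
- Supports PaddlePaddle's Chinese SOTA pretrained models, with the lightweight ERNIE 3.0 series quickly integrated into Pipelines
- Supports the RocketQA semantic indexing models to quickly improve semantic indexing and FAQ system quality

Low-barrier one-click deployment
- The RocketQA DuReader semantic extraction models can be invoked with one line, so general scenarios need no semantic model training
- One-click deployment via either Docker or Docker Compose, reducing environment setup cost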
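Faiss and Milvus answer "which stored embeddings are closest to this query" at scale, usually with approximate indexes. The exact baseline they approximate is a brute-force scan, sketched below with cosine similarity; the in-memory dictionary store and function names are hypothetical and do not reflect the Pipelines document-store API.

```python
import heapq
import math
from typing import Dict, List, Tuple


def normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def top_k(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exact (brute-force) cosine-similarity search over every stored embedding."""
    q = normalize(query)
    scored = ((doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
              for doc_id, vec in docs.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


if __name__ == "__main__":
    store = {"d1": [1.0, 0.0], "d2": [0.7, 0.7], "d3": [0.0, 1.0]}
    for doc_id, score in top_k([1.0, 0.1], store, k=2):
        print(doc_id, round(score, 3))
```

The scan costs one dot product per stored document per query, which is exactly the cost that approximate indexes trade a little recall to avoid.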

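Industrial Application Examples Upgraded

[Text Classification](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/text_classification)
An end-to-end text classification application supporting pretrained-model, few-shot, and semantic indexing approaches, with fast model tuning through TrustAI
3087 3184 3104 3180 2956 3011
Full coverage of text classification approaches
- Supports multi-class, multi-label, and hierarchical classification
- Supports pretrained-model fine-tuning, few-shot prompt tuning, and a semantic indexing classification approach
- The backbone supports the full ERNIE 3.0 model series to fit different use cases

Efficient model tuning
- The TrustAI model interpretability tool quickly locates sparse-data and dirty-data problems to further improve model quality
- Integrated data augmentation tools offer multiple augmentation methods for quickly augmenting sparse data

Industrial-grade end-to-end solution
- Supports the full workflow of data annotation, model training, model compression, and model inference and deployment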
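The practical difference between the multi-class and multi-label settings listed above shows up when logits are turned into predictions: multi-class picks exactly one label, while multi-label applies an independent sigmoid to each label and keeps every label above a threshold. The 0.5 threshold, label names, and function names below are assumptions for illustration; hierarchical classification is not covered.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_multi_class(logits: List[float], labels: List[str]) -> str:
    """Multi-class: exactly one label, the argmax (softmax does not change the ranking)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return labels[best]


def predict_multi_label(logits: List[float], labels: List[str], threshold: float = 0.5) -> List[str]:
    """Multi-label: an independent sigmoid per label; keep every label above the threshold."""
    return [label for label, z in zip(labels, logits) if sigmoid(z) > threshold]


if __name__ == "__main__":
    labels = ["sports", "finance", "tech"]
    logits = [2.0, -1.0, 0.5]
    print(predict_multi_class(logits, labels))  # one label
    print(predict_multi_label(logits, labels))  # zero or more labels
```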

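[Information Extraction](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie)
- Added the multilingual models UIE-M-Base and UIE-M-Large, supporting mixed Chinese-English extraction and one-line Taskflow invocation. 3192
- Added a UIE data distillation solution based on the closed-domain model GlobalPointer, with one-click Taskflow deployment. 3136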

[Semantic Indexing](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/neural_search)
- Added RocketQA's CrossEncoder model, which can be loaded into Pipelines. 3196
- Switched the Neural Search recall model to the ERNIE 3.0-based RocketQA model, which can be loaded into Pipelines. 3172

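AIGC Content Generation

[CodeGen Code Generation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/code_generation/codegen)
PaddleNLP 2.4 releases the full series of SOTA CodeGen code generation models, which can be invoked with one line
2641 2754 3017

Leading quality
- Integrates CodeGen, a SOTA code generation model
- Integrates 12 CodeGen code generation models of different sizes, supporting code generation in multiple programming languages

Easy to use
- Supports calling the models through GitHub Copilot, as well as one-line Taskflow invocation
- Supports high-performance inference with FasterGeneration, with millisecond-level response

[Text-to-Image Generation](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/model_zoo/taskflow.md#%E6%96%87%E5%9B%BE%E7%94%9F%E6%88%90)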

2.3.7

New Features

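+ Added ERNIE 3.0-based RocketQA recall models: five semantic retrieval recall models, rocketqa-zh-base (12-layer, 768-hidden), rocketqa-zh-medium (6-layer, 768-hidden), rocketqa-zh-mini (6-layer, 384-hidden), rocketqa-zh-micro (4-layer, 384-hidden), and rocketqa-zh-nano (4-layer, 312-hidden), which achieve the best Chinese results on the DuReader Retrieval dataset. 3033
+ Added ERNIE 3.0-based RocketQA ranking models: five semantic retrieval ranking models, rocketqa-base (12-layer, 768-hidden), rocketqa-medium (6-layer, 768-hidden), rocketqa-mini (6-layer, 384-hidden), rocketqa-micro (4-layer, 384-hidden), and rocketqa-nano (4-layer, 312-hidden), which achieve the best Chinese results on the DuReader Retrieval dataset. 3019
+ Added VI-LayoutXLM, a multimodal document model whose inference speed and accuracy exceed those of LayoutXLM. 2935
+ Added lightweight RocketQA models to the Pipelines NLP pipeline system, significantly improving end-to-end response speed. 3078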
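The layer and hidden sizes listed for the RocketQA models give a rough sense of their relative cost. A back-of-the-envelope count, assuming a standard Transformer encoder with an FFN width of 4 times the hidden size and ignoring embeddings, biases, and LayerNorm; the function is hypothetical and does not report official parameter counts.

```python
def encoder_params(num_layers: int, hidden: int, ffn_mult: int = 4) -> int:
    """Rough Transformer-encoder weight count, ignoring embeddings, biases, and LayerNorm.

    Per layer: 4 * H^2 for the Q, K, V, and output projections, plus
    2 * ffn_mult * H^2 for the two feed-forward matrices (12 * H^2 when ffn_mult == 4).
    """
    return num_layers * (4 + 2 * ffn_mult) * hidden * hidden


if __name__ == "__main__":
    configs = {"base": (12, 768), "medium": (6, 768), "mini": (6, 384),
               "micro": (4, 384), "nano": (4, 312)}
    for name, (layers, hidden) in configs.items():
        print(f"{name:>6}: ~{encoder_params(layers, hidden) / 1e6:.1f}M encoder weights")
```

Under these assumptions the encoders come to roughly 84.9M, 42.5M, 10.6M, 7.1M, and 4.7M weights; the token embedding table (vocab_size × H) comes on top and is a large share of the total for the smaller models.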

Unit Test

+ Added unit tests for the Ernie-Gram model https://github.com/PaddlePaddle/PaddleNLP/pull/3059
+ Added unit tests for the TinyBert model https://github.com/PaddlePaddle/PaddleNLP/pull/2992
+ Added unit tests for the Roformer model https://github.com/PaddlePaddle/PaddleNLP/pull/2991
+ Added unit tests for the ERNIE-M model https://github.com/PaddlePaddle/PaddleNLP/pull/2964
+ Added unit tests for the Skep model https://github.com/PaddlePaddle/PaddleNLP/pull/2941
+ Added unit tests for the Electra and XLNet models https://github.com/PaddlePaddle/PaddleNLP/pull/3031
+ Added unit tests for the RoBERTa, ALBERT, and ERNIE models https://github.com/PaddlePaddle/PaddleNLP/pull/2972

Bug Fix
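+ Fixed an error when getting `bos_token_id` from the BART tokenizer 3058
+ Fixed being unable to set `model_max_length` on the BART tokenizer 3018
+ Fixed errors from the random question generation button in Pipelines, and searches reverting to the previous search result. 2954
+ Fixed issues in Pipelines when extracting vectors with FAISS on Python 3.7. 2965
+ Fixed a Tokenizer `resize-token-embeddings` error 2763
+ Fixed the OPT example code 3064
+ pointer_summarizer supports XPU and multi-device training 2963 3004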

New Contributors
* veyron95 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2879
* BasicCoder made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2977
* dongfangshenzhu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/3046
* Haibarayu made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2694

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.5...v2.3.7
