Paddlenlp

Latest version: v2.8.1


Page 6 of 9

2.3.5

New Features

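Code Generation
- CodeGen can now be called in one line through Taskflow. 2754
- Added usage documentation for CodeGen. 2791

UIE

- Added an English version of UIE, callable in one line through Taskflow. 2855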
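As a rough illustration of the one-line Taskflow call for English UIE, the sketch below wraps it in a helper. The helper name `extract`, the schema labels, and the model name `uie-base-en` are assumptions that do not appear in these notes; check them against the PaddleNLP documentation for your installed version.

```python
# Example entity types to extract; these labels are assumptions for illustration.
SCHEMA = ["Person", "Organization"]


def extract(text, schema=SCHEMA, model="uie-base-en"):
    """Run UIE information extraction on `text` through Taskflow.

    `extract` is a hypothetical helper, not part of PaddleNLP.
    The import is done lazily so this file can be loaded without PaddleNLP;
    the first call needs PaddleNLP installed and downloads the model weights.
    """
    from paddlenlp import Taskflow

    ie = Taskflow("information_extraction", schema=schema, model=model)
    return ie(text)
```

Calling `extract("In 1997, Steve was excited to become the CEO of Apple.")` would run the extractor once; in real use, build the `Taskflow` object a single time and reuse it, since constructing it loads the model.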

Neural Search

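- Added C++ and pipeline deployment for the ranking model. 2721
- Added evaluation during training for in-batch negatives. 2663

Few-Shot Learning

- Added an implementation of the few-shot model RGL. 2651

Text Classification

- Added multi-class and multi-label applications. 2661 2675
- Added data augmentation strategies. 2805

Text Matching
- Added the unsupervised semantic embedding model DiffCSE. 2643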

Bug Fix

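- Fixed pipelines not passing max_seq_len. 2736
- Fixed the faiss-cpu dependency in pipelines and added an FAQ entry on handling garbled text. 2709
- Fixed inconsistent neural search prediction results caused by dropout at prediction time, and added an FAQ entry on ANN indexes. 2710
- Fixed the get_offset_mapping error in the ERNIE tokenizer. 2857 2897
- Fixed native UNIMOText generation failing because of intermediate model outputs. 2877

Others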

New Contributors
- lastrei Add pet static model export script and inference code 2875
- zhiyongLiu1114 Add the get_speical_token_mask for the ernie tokenizer [#2671](https://github.com/PaddlePaddle/PaddleNLP/pull/2671) [#2690](https://github.com/PaddlePaddle/PaddleNLP/pull/2690)

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.4...v2.3.5

2.3.4

New Features

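Taskflow
- Added three small UIE models: UIE-Mini (6-layer, 384-hidden), UIE-Micro (4-layer, 384-hidden), and UIE-Nano (4-layer, 312-hidden). 2604
- Added WordTag-IE, an information extraction tool based on Chinese word-class knowledge. 2540

More Pretrained Models
- Open-sourced the ERNIE Tiny pretrained models, which outperform the open-source Chinese models of comparable size from HFL, UER, and Huawei-Noah in quality and accuracy.
- Added the CodeGen code generation model. 2641

Basic Usability Improvements
- Trainer supports three learning rate scheduling strategies: constant, cosine, and linear. 2511
- FasterBART supports dynamic-to-static conversion and inference. 2519
- FasterGeneration supports compiling against an inference library built with ONNX. 2463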
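To make the three schedule names concrete, here is a minimal sketch of what constant, linear, and cosine decay usually mean. It is not the Trainer's implementation: the function name `lr_at_step` is hypothetical, and the sketch assumes no warmup phase and decay to zero at the final step.

```python
import math


def lr_at_step(schedule, step, total_steps, base_lr):
    """Return the learning rate at `step` for a simple, warmup-free schedule.

    Assumptions for illustration: decay runs from `base_lr` at step 0
    down to 0 at `total_steps`, and steps past the end are clamped.
    """
    progress = min(step, total_steps) / total_steps  # fraction of training done, 0..1
    if schedule == "constant":
        return base_lr  # never changes
    if schedule == "linear":
        return base_lr * (1.0 - progress)  # straight line down to zero
    if schedule == "cosine":
        # Half a cosine wave: slow start, faster middle, slow finish.
        return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown schedule: {schedule!r}")
```

Halfway through training, the linear and cosine variants both sit at half the base rate; they differ in how quickly they approach and leave that midpoint.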

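CLUE Benchmark
- Supports training, evaluation, and prediction on 10 CLUE tasks, lets users produce prediction results for submission to the CLUE leaderboard, and provides a Grid Search tool for one-command training that reports the best evaluation result.

Text Classification
- Added multi-label hierarchical classification. 2501
- Added a prediction deployment workflow for the ERNIE-DOC model on classification tasks. 1845

Ecosystem Models
- Added the XLM model. 2080

Bug Fix

- Fixed evaluation of nested spans of the same category in UIE. 2558
- Fixed overlapping offsets between the prompt and the text when the UIE prompt is in English. 2453
- Fixed an error when calling get_offset_mapping on the BERT Tokenizer. 2508
- Fixed core dumps during Sampling decoding in some FasterGeneration models. 2561
- Fixed potential issues in from_pretrained of PretrainedTokenizer and PretrainedModel. 2521 2578 2424
- Fixed a save-time error caused by a missing field in LukeTokenizer. 2631
- Fixed an expected-parameter error in ChineseBertTokenizer caused by the Tokenizer mechanism update. 2625
- Fixed PretrainedTokenizer special token settings being overwritten or omitted. 2534 2629
- Fixed the missing ALBERT pad token id. 2495
- Fixed a checkpoint loading error when pretraining ERNIE-1.0 with AMP O2. 2479
- Removed the is_init_py attribute from RandomGenerator. 2658

Others
- BERT supports fusion with fused_ffn and fused_attention. 2523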

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.3...v2.3.4

2.3.3

Bug Fix

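- Fixed an `AutoModel` model selection bug that caused loading models such as `ernie-1.0` from a local directory to fail. 2426
- Fixed tokenizer loading from a local directory failing because of a file check bug. 2424
- Fixed the output type of Taskflow dependency parsing. 2422
- Fixed the split check in the UIE doccano annotation conversion script, and improved error reporting when `Task` uses ONNX for prediction. 2417
- Fixed the misspelling of "data" in the code. 2410
- Fixed the UIE link in PaddleNLP/README. 2419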

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.2...v2.3.3

2.3.2

New Features

Faster Inference Deployment
- UIE inference acceleration: high-performance UIE inference on CPU and GPU devices, significantly improving UIE inference speed.
- ERNIE 3.0 models support serving deployment with Triton Inference Server.

More Pretrained Models
- Added 4 [**Wenxin ERNIE 3.0 series Chinese models**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0): 3 small models, [ERNIE 3.0-Mini](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_mini_zh.pdparams) (6-layer, 384-hidden), [ERNIE 3.0-Micro](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_micro_zh.pdparams) (4-layer, 384-hidden), and [ERNIE 3.0-Nano](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_nano_zh.pdparams) (4-layer, 312-hidden), plus one 20-layer model, [ERNIE 3.0-XBase](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_3.0/ernie_3.0_xbase_zh.pdparams) (20-layer, 1024-hidden).
- Open-sourced the ERNIE 2.0 Chinese models: [ERNIE 2.0-Base](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_2.0/ernie_2.0_base_zh.pdparams) (12-layer, 768-hidden) and [ERNIE 2.0-Large](https://bj.bcebos.com/paddlenlp/models/transformers/ernie_2.0/ernie_2.0_large_zh.pdparams) (24-layer, 1024-hidden).

Basic Usability Improvements
- ERNIE-M supports multiple-choice reading comprehension tasks.
- Added dynamic-to-static conversion support for XLNet.
- Improved BART `Tokenizer` compatibility.

Ecosystem Models
- Added the GAU-alpha ecosystem model.

Bug Fix
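- Fixed `ElectraTokenizer` missing the `do_lower_case` attribute. 2263
- Fixed a logging bug when evaluating the CHID task in CLUE Benchmark. 2298
- Fixed data type errors in the semantic search Application and FAQ System on Windows. 2381
- Fixed an error when loading ERNIE models with `AutoTokenizer`. 2315
- Fixed a `dict_keys` error raised by the `load_dataset` function. 2364
- Fixed data type errors in the text generation example on Windows. 2351
- Fixed an ERNIE 3.0 ONNX Runtime inference bug. 2386
- Fixed DDParser padding for 1-D arrays. 2333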

New Contributors
* RicardoL1u made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2299
* Intsigstephon made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2285
* sljlp made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2398

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.1...v2.3.2

2.3.1

Improvements

* GPT-3 supports generation inference under static-graph hybrid parallelism. https://github.com/PaddlePaddle/PaddleNLP/pull/2188 https://github.com/PaddlePaddle/PaddleNLP/pull/2245

BugFix
* Added an example that runs the semantic search system in one step using the FAISS ANN engine. https://github.com/PaddlePaddle/PaddleNLP/pull/2180
* Fixed a CPU runtime error in the PaddleNLP intelligent text pipeline example. https://github.com/PaddlePaddle/PaddleNLP/pull/2201
* Fixed a GPT compilation error. https://github.com/PaddlePaddle/PaddleNLP/pull/2191
* Fixed the GPT pretraining data pipeline not passing the `max_seq_len` parameter. https://github.com/PaddlePaddle/PaddleNLP/pull/2192
* Fixed pretraining errors under GPT-3 static-graph hybrid parallelism. https://github.com/PaddlePaddle/PaddleNLP/pull/2190 https://github.com/PaddlePaddle/PaddleNLP/pull/2223 https://github.com/PaddlePaddle/PaddleNLP/pull/2195
* Fixed NPTag decoding errors caused by an incompatible tokenizer upgrade. https://github.com/PaddlePaddle/PaddleNLP/pull/2199
* Fixed the Taskflow UIE Schema being built repeatedly. https://github.com/PaddlePaddle/PaddleNLP/pull/2170
* Added data conversion that is compatible with multiple doccano export formats for NER annotation tasks. https://github.com/PaddlePaddle/PaddleNLP/pull/2187
* Fixed an NPTag decoding issue. https://github.com/PaddlePaddle/PaddleNLP/pull/2233
* Fixed the DuUIE `max_seq_len` error. https://github.com/PaddlePaddle/PaddleNLP/pull/2207
* Fixed encoding errors on Windows when the system default encoding is not UTF-8. https://github.com/PaddlePaddle/PaddleNLP/pull/2209
* Fixed the AlbertForQuestionAnswering import error. https://github.com/PaddlePaddle/PaddleNLP/pull/2216
* Fixed the CLUE Benchmark prediction result format. https://github.com/PaddlePaddle/PaddleNLP/pull/2215
* Fixed dead links. https://github.com/PaddlePaddle/PaddleNLP/pull/2231 https://github.com/PaddlePaddle/PaddleNLP/pull/2230 https://github.com/PaddlePaddle/PaddleNLP/pull/2235 https://github.com/PaddlePaddle/PaddleNLP/pull/2240 https://github.com/PaddlePaddle/PaddleNLP/pull/2241

New Contributors
* Spico197 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2170
* sandyhouse made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2190
* lugimzzz made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2196
* qingqing01 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2188

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.3.0...v2.3.1

2.3.0

New Features

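Universal Information Extraction (UIE)

- Added [**UIE (Universal Information Extraction)**](https://arxiv.org/abs/2203.12277), a general open-domain information extraction framework based on unified structure generation. A single model supports named entity recognition, relation extraction, event extraction, sentiment analysis, and other tasks. It is available in base and tiny sizes to suit different business scenarios, and both support one-line prediction through Taskflow.
- Added **UIE-Medical**, an information extraction model for the medical domain. It supports two tasks, medical named entity recognition and medical relation extraction, supports few-shot learning, and offers industry-leading prediction accuracy.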

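Wenxin NLP Large Model Upgrades

- Added [lightweight versions of the Wenxin large model **ERNIE 3.0**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-3.0), comprising two Chinese models, **ERNIE 3.0-Base** (12 layers) and **ERNIE 3.0-Medium** (6 layers), which achieve the best results among Chinese models of the same size on CLUE Benchmark.
- Added [**ERNIE-Health**](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/ernie-health), a Chinese pretrained model for the medical domain. It supports five categories of tasks: medical text information extraction (entity recognition, relation extraction), medical term normalization, medical text classification, medical sentence relation judgment, and medical question answering. A [CBLUE benchmark](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/exmaples/benchmark/cblue) usage example is provided.
- Added [**PLATO-XL**](https://github.com/PaddlePaddle/Knover/tree/develop/projects/PLATO-XL) (11B), the world's first pretrained dialogue generation model with more than ten billion parameters. FasterGeneration provides high-performance GPU acceleration for it, with a 2.7x inference speedup over the previous version. For usage details, see [PLATO-XL with FasterGeneration](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/plato-xl).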

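FasterGeneration High-Performance Generation Acceleration

FasterGeneration received the following upgrades in this release. For usage details, see the [FasterGeneration documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/faster_generation).

Faster
- Finer-grained fusion acceleration: Context computation in the UnifiedTransformer and UNIMOText models is now accelerated, running 20% to 110% faster than the previous version.
- Broader model support: the supported range of `size_per_head` was extended, enabling generation acceleration for large models such as CPM-Large (2.6B) and PLATO-XL (11B).
- Faster large-model inference: Tensor-parallel and Pipeline-parallel inference are supported. On CPM-Large, 4-GPU Tensor parallelism is 40% faster than single-GPU high-performance generation, and PLATO-XL on 4 GPUs reaches a 2x speedup over a single GPU.

Less GPU Memory
- Reduced GPU memory use during model loading and conversion. FP16 models can be used directly, and the original unfused QKV weight parameters can be removed.

Easier Deployment
- Added a parameter for using Encoder acceleration directly, connecting Encoder acceleration with Decoding acceleration.
- More accelerated models, such as UnifiedTransformer and UNIMOText, can be exported to static graphs and deployed with high performance on Paddle Inference.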

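More Industry Examples and Application Scenarios
- Added the [car manual intelligent question answering](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/doc_vqa) application example. Built on Baidu's leading open-domain question answering technology [RocketQA](https://arxiv.org/abs/2010.08191) and the multimodal, multilingual pretrained model [LayoutXLM](https://arxiv.org/abs/2104.08836), it provides an application example and best practices for multimodal document question answering.
- Added the [intelligent voice command parsing](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/speech_cmd_analysis) application example. It applies to scenarios such as voice form filling, voice interaction, voice search, and mobile app voice wake-up, improving the efficiency of human-computer interaction.
- Added an end-to-end [intelligent question answering](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/examples/question-answering) system application example for building a visual question answering system quickly and at low cost.
- Added an end-to-end [semantic search](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/applications/experimental/pipelines/examples/semantic-search) system application example for building a semantic search system quickly and at low cost.
- Added the [NLP model interpretability](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/model_interpretation) application example #1752. Thanks to [binlinquge](https://github.com/binlinquge) for the contribution.
- Added the [CLUE Benchmark evaluation scripts](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/benchmark/clue), which give a fuller picture of how PaddleNLP's Chinese pretrained models perform and help developers choose a Chinese model.
- Added Graphcore IPU support for BERT static-graph training 1793. For details, see [BERT IPU](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/bert/static_ipu). Thanks to [gglin001](https://github.com/gglin001) for the contribution.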

More Pretrained Models
- Added 300+ important model weights covering architectures such as [BERT](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/BERT/contents.html), [GPT](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/GPT/contents.html), and [T5](https://paddlenlp.readthedocs.io/zh/latest/model_zoo/transformers/T5/contents.html). PaddleNLP now offers 500+ curated pretrained models.
- Added the FNet model 1499. Thanks to [HJHGJGHHG](https://github.com/HJHGJGHHG) for the contribution.
- Added the ProphetNet model 1698. Thanks to [d294270681](https://github.com/d294270681) for the contribution.
- Added the Megatron-LM model 1678. Thanks to [Beacontownfc](https://github.com/Beacontownfc) for the contribution.
- Added the LUKE model 1677. Thanks to [Beacontownfc](https://github.com/Beacontownfc) for the contribution.
- Added the RemBERT model 1701. Thanks to [Beacontownfc](https://github.com/Beacontownfc) for the contribution.

Trainer API

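- Added the Trainer API, which simplifies model training code, standardizes training configuration, supports VisualDL visualization of training logs, and improves experiment reproducibility. https://github.com/PaddlePaddle/PaddleNLP/pull/1761 For a quick start with the Trainer API, see the [tutorial](https://github.com/PaddlePaddle/PaddleNLP/blob/develop/docs/trainer.md).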

Data API

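- Compatible with [HuggingFace Datasets](https://huggingface.co/datasets): datasets returned by its `load_dataset` can be used directly (it is recommended to `import paddlenlp` first and import datasets afterwards).
- Added Data Collators for common tasks, such as `DataCollatorWithPadding` and `DataCollatorForTokenClassification`, to simplify data processing.
- Tokenizer additions and changes:
  - Supports saving and loading custom special tokens.
  - Provides more padding modes, including fixed-length Pad, Longest Pad, and Pad to a multiple of a given value.
  - Supports getting the maximum input length for single sentences and for sentence pairs.
  - Supports returning Paddle Tensor data.
  - **IMPORTANT NOTE** For batch input, the default output format changes from a list of dicts to a dict of lists (the dict is a `BatchEncoding` object). This can be controlled with `return_dict`.
  - **IMPORTANT NOTE** The format of the content saved by `save_pretrained` has changed (compatibility is preserved, so previously saved content still works).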
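The three Tokenizer padding modes listed above (fixed length, longest in batch, and padding to a multiple) can be illustrated with a small standalone sketch. This is not PaddleNLP code: the function `pad_batch` and its parameter names are hypothetical, and truncating over-long sequences in fixed-length mode is a simplifying assumption.

```python
def pad_batch(sequences, pad_id=0, strategy="longest",
              max_length=None, pad_to_multiple_of=None):
    """Pad lists of token ids to a common length (illustrative only).

    strategy="longest":    pad to the longest sequence in the batch.
    strategy="max_length": pad to the fixed `max_length`; longer sequences
                           are truncated here for simplicity (an assumption).
    pad_to_multiple_of:    optionally round the target length up to a multiple.
    """
    if strategy == "longest":
        target = max(len(seq) for seq in sequences)
    elif strategy == "max_length":
        if max_length is None:
            raise ValueError("max_length is required for strategy='max_length'")
        target = max_length
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")

    if pad_to_multiple_of:
        # Ceiling division, then scale back up to the next multiple.
        target = -(-target // pad_to_multiple_of) * pad_to_multiple_of

    # Slicing is a no-op for sequences already within the target length.
    return [list(seq[:target]) + [pad_id] * (target - len(seq[:target]))
            for seq in sequences]
```

Padding to a multiple (for example 8) is commonly used so that tensor shapes line up with hardware-friendly sizes.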

BugFix

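- Fixed a Taskflow NPTag decoding issue. 2023
- Fixed an error in recall model training for the semantic search Application when output_emb_size = 0. 2090

Breaking Changes

- When a Tokenizer is called on batch input, the default output format changes from a list of dicts to a dict of lists (the dict is a `BatchEncoding` object). This can be controlled with `return_dict`.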
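To show what the change from a list of dicts to a dict of lists means for calling code, the sketch below converts between the two shapes using plain Python containers. The helper names are hypothetical and plain `dict` objects stand in for `BatchEncoding`; code written against the old format could adapt in a similar way.

```python
def to_dict_of_lists(records):
    """Old shape -> new shape: one dict whose values are per-example lists."""
    if not records:
        return {}
    return {key: [record[key] for record in records] for key in records[0]}


def to_list_of_dicts(batch):
    """New shape -> old shape: one dict per example."""
    keys = list(batch)
    # zip(*columns) walks the columns in parallel, one example at a time.
    return [dict(zip(keys, values)) for values in zip(*(batch[k] for k in keys))]


# Old-style output: a list with one dict per input example.
old_style = [
    {"input_ids": [1, 2], "token_type_ids": [0, 0]},
    {"input_ids": [3], "token_type_ids": [0]},
]
# New-style output: a single dict keyed by field name.
new_style = to_dict_of_lists(old_style)
```

With the new shape, `new_style["input_ids"]` holds the ids for every example in the batch, which is the layout that data collators and batched tensor conversion typically expect.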

New Contributors
* mmglove made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/1974
* luyaojie made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2012
* wjj19950828 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2061
* kev123456 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2070
* heliqi made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2073
* yeliang2258 made their first contribution in https://github.com/PaddlePaddle/PaddleNLP/pull/2077

**Full Changelog**: https://github.com/PaddlePaddle/PaddleNLP/compare/v2.2.6...v2.3.0
