Modelscope

Latest version: v1.20.1

1.5.0

Chinese Version

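Recommended New Models
| No. | Model Name & Quick Link |
| --- | --- |
| 1 | [ResNet50 Pedestrian Structured Attribute Recognition Model](https://modelscope.cn/models/damo/cv_resnet50_pedestrian-attribute-recognition_image/summary) |
| 2 | [DamoFD Face Detection and Keypoint Model-0.5G](https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd/summary) |
| 3 | [CAM++ Speaker Verification-English-VoxCeleb-16k](https://modelscope.cn/models/damo/speech_campplus_sv_en_voxceleb_16k/summary) |
| 4 | [Machine Translation with Self-Evaluation-Chinese-to-English-General Domain-large](https://modelscope.cn/models/damo/nlp_canmt_translation_zh2en_large/summary) |

Highlights

- Support LoRA-based efficient tuning of generative diffusion models
- Add the LLaMA model
- Support pushing to the hub
- Support the chat task for ChatGLM-6B-style models
- Add CLI usage examples for common models and tasks

Feature List

- Support splitting and merging checkpoints saved by Megatron tensor-parallel models
- Support LoRA-based efficient tuning of generative diffusion models
- Add the pedestrian attribute recognition model
- Add the DamoFD model series
- Add the LLaMA model
- Support pushing to the hub
- Add the speaker CAM++ model
- Add head support for the XlmRoberta model
- Add the CANMT translation model
- Support the chat task for ChatGLM-6B-style models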
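Splitting or merging a tensor-parallel checkpoint boils down to slicing each partitioned weight along its parallel dimension, or concatenating the shards back together. The sketch below illustrates only that slicing step in pure Python, with nested lists standing in for tensors. `split_weight` and `merge_weights` are hypothetical helpers, not the ModelScope implementation, which works on real Megatron state dicts and must know which parameters are partitioned and along which axis.

```python
from typing import List

Matrix = List[List[float]]


def split_weight(weight: Matrix, num_parts: int, axis: int) -> List[Matrix]:
    """Split a 2-D weight into equal shards along rows (axis=0) or columns (axis=1)."""
    size = len(weight) if axis == 0 else len(weight[0])
    if size % num_parts != 0:
        raise ValueError(f"dimension {size} is not divisible by {num_parts}")
    step = size // num_parts
    if axis == 0:
        return [weight[i * step:(i + 1) * step] for i in range(num_parts)]
    return [[row[i * step:(i + 1) * step] for row in weight] for i in range(num_parts)]


def merge_weights(shards: List[Matrix], axis: int) -> Matrix:
    """Concatenate shards produced by split_weight back into one 2-D weight."""
    if axis == 0:
        return [row for shard in shards for row in shard]
    num_rows = len(shards[0])
    return [[x for shard in shards for x in shard[r]] for r in range(num_rows)]


# Re-shard a weight from tensor-parallel size 2 to size 4 along its columns.
full = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
tp2 = split_weight(full, 2, axis=1)
tp4 = split_weight(merge_weights(tp2, axis=1), 4, axis=1)
print(tp4[0])  # [[1.0], [5.0]]
```

Re-sharding to a different parallel size is a merge followed by a split, as in the last lines.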

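Improvements

- Update funasr to version 0.4.0, which supports running on Mac
- Plugins now support the trainer
- Add a 3.7B model to fid_dialouge_pipeline
- Add a training example for the MGeo model on the token classification task
- Add a training example for the PALM model on the text generation task
- Add a training example for the CLIP model on the multi-modal embedding task
- Add a gradient accumulation config for speech KWS nearfield training
- Refactor and optimize the face reconstruction model code
- Update the image colorization metric
- Update the GitHub issue templates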

BugFix

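- Fix errors in model generate for text generation tasks
- Fix errors in the face reconstruction model pipeline
- Fix the pipeline repeatedly printing warnings
- Fix errors raised when a plugin fails to import a package
- Fix errors in speech KWS nearfield multi-GPU training
- Fix missing spaces in English output from generative models
- Fix jsonplus not supporting ndarray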

English Version

New Model List and Quick Access

| No | Model Name & Link |
| --- | --- |
| 1 | [ResNet50 pedestrian-attribute-recognition image](https://modelscope.cn/models/damo/cv_resnet50_pedestrian-attribute-recognition_image/summary) |
| 2 | [DamoFD face-detection 0.5G](https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd/summary) |
| 3 | [Speech cam++ English-VoxCeleb-16k](https://modelscope.cn/models/damo/speech_campplus_sv_en_voxceleb_16k/summary) |
| 4 | [CANMT translation with self-evaluation zh2en-large](https://modelscope.cn/models/damo/nlp_canmt_translation_zh2en_large/summary) |

Highlight

- Add efficient tuner modules
- Add LLaMA to the ModelScope library, ported from Hugging Face
- Support pushing to the hub
- Add the chat task for chat models such as ChatGLM-6B
- Add CLI usage examples for common models and tasks

Breaking changes

Feature
- Support splitting and merging checkpoints of megatron_base models
- Add efficient tuner modules
- Add the pedestrian attribute recognition model
- Add the DamoFD model
- Add LLaMA to the ModelScope library, ported from Hugging Face
- Support pushing to the hub
- Add the speaker model CAM++ for the speaker verification task
- Add head support for the XlmRoberta model
- Add the CANMT translation model
- Add the chat task for chat models such as ChatGLM-6B
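LoRA-style efficient tuning freezes a pretrained weight `W` and trains only two low-rank factors, so the effective weight becomes `W + (alpha / r) * B @ A`, with rank `r` much smaller than the layer size. The sketch below shows that merge step in pure Python, assuming the common convention that `B` has shape `d_out x r` and `A` has shape `r x d_in`. The helper names are hypothetical; this is not the API of ModelScope's efficient tuner modules.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, where r is the LoRA rank (rows of A)."""
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in the two LoRA factors (the frozen W is excluded)."""
    return rank * (d_in + d_out)


w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2 x 2 weight
lora_a = [[1.0, 2.0]]         # r x d_in with r = 1
lora_b = [[0.5], [0.0]]       # d_out x r
print(lora_merge(w, lora_a, lora_b, alpha=2.0))  # [[2.0, 2.0], [0.0, 1.0]]
print(lora_param_count(4096, 4096, 8), 4096 * 4096)  # 65536 16777216
```

Because `B` is typically initialized to zeros, the merged weight starts out equal to `W`, and training updates only the `r * (d_in + d_out)` factor entries instead of the full matrix.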

Improvements

- Update funasr to 0.4.0 with support for running on Mac
- Plugins now support the trainer
- Add a 3.7B model for fid_dialouge_pipeline
- Add token classification example for MGeo
- Add PALM finetune example
- Add multi-modal embedding example for CLIP
- Add a gradient accumulation config for speech KWS nearfield training
- Update face reconstruction to HRN(CVPR2023)
- Update image colorization metric
- Update issue templates
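Gradient accumulation sums scaled gradients over several micro-batches and applies a single optimizer step, emulating a larger batch without the extra memory. A minimal sketch on a one-parameter least-squares model; the function names and the `accum_steps` parameter are hypothetical and do not reflect the actual config key used by KWS nearfield training.

```python
from typing import List, Tuple

Batch = List[Tuple[float, float]]


def grad(w: float, batch: Batch) -> float:
    """Gradient of the mean squared error of y ~ w * x over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_with_accumulation(w: float, micro_batches: List[Batch], lr: float,
                            accum_steps: int) -> float:
    """Run SGD, stepping once every accum_steps micro-batches."""
    acc = 0.0
    for step, batch in enumerate(micro_batches, start=1):
        acc += grad(w, batch) / accum_steps  # divide so acc is an average
        if step % accum_steps == 0:
            w -= lr * acc
            acc = 0.0
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
w_accum = train_with_accumulation(0.0, [data[:2], data[2:]], lr=0.01, accum_steps=2)
w_full = 0.0 - 0.01 * grad(0.0, data)
print(w_accum == w_full)  # True
```

The equality holds here because the micro-batches have equal size; with uneven micro-batches, each gradient should be weighted by its batch size instead.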

BugFix

- Fix generate for ModelForTextGeneration
- Fix issues in the face reconstruction pipeline
- Fix the pipeline repeatedly printing warnings
- Fix an error raised when a plugin fails to import a package
- Fix speech KWS nearfield training with multiple GPUs
- Fix missing spaces between English words in generated output
- Fix jsonplus to support ndarray
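Missing spaces in English output typically come from concatenating decoded tokens directly, which is correct for Chinese but not for English. The hypothetical `join_tokens` helper below is a simplified heuristic, not the actual ModelScope fix: it inserts a space before a token that starts with an ASCII letter or digit when the previous token ends with one (or with common punctuation), and leaves CJK characters joined.

```python
import re
from typing import List

_STARTS_WORD = re.compile(r"[A-Za-z0-9]")
_ENDS_WORD_OR_PUNCT = re.compile(r"[A-Za-z0-9,.!?;:]$")


def join_tokens(tokens: List[str]) -> str:
    """Join decoded tokens, adding spaces only between English words."""
    pieces: List[str] = []
    for tok in tokens:
        if pieces and _STARTS_WORD.match(tok) and _ENDS_WORD_OR_PUNCT.search(pieces[-1]):
            pieces.append(" ")
        pieces.append(tok)
    return "".join(pieces)


print(join_tokens(["Hello", ",", "world", "你", "好"]))  # Hello, world你好
```

Real tokenizers usually handle this in their decode step (for example via subword markers), and a rule like this one would wrongly split a decimal such as `3.14` if `.` and `14` arrive as separate tokens.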

1.4.1

Chinese Version

Recommended New Models
| No. | Model Name & Quick Link | Contributing Organization | Finetune Supported |
| --- | --- | --- | --- |
| 1 | [ChatGLM-Chinese-English Dialogue LLM-6B](https://modelscope.cn/models/ZhipuAI/ChatGLM-6B/summary) | Zhipu.AI | |
| 2 | [GLM130B-Chinese-English LLM](https://modelscope.cn/models/ZhipuAI/GLM130B/summary) | Zhipu.AI | |
| 3 | [unidiffuser-v1](https://modelscope.cn/models/thu-ml/unidiffuser-v1/summary) | Tsinghua TSAIL | |
| 4 | ChatYuan Functional Dialogue LLM v2 | YuanYu Intelligence | |
| 5 | [PanGu-α 2.6B](https://www.modelscope.cn/models/OpenICommunity/pangu_2_6B/summary) | Peng Cheng Laboratory | |
| 6 | [openjourney](https://modelscope.cn/models/dienstag/openjourney/summary) | Individual developer dienstag | |
| 7 | [Rwkv-4-pile-14b](https://modelscope.cn/models/Blink_DL/rwkv-4-pile-14b/summary) | Individual developer Blink\_DL | |
| 8 | [SiameseUIE General Information Extraction-Chinese-base](https://modelscope.cn/models/damo/nlp_structbert_siamese-uie_chinese-base/summary) | | |
| 9 | [SiameseUniNLU Zero-Shot General Natural Language Understanding-Chinese-base](https://modelscope.cn/models/damo/nlp_structbert_siamese-uninlu_chinese-base/summary) | | |

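Highlights

- External repos can work together with the modelscope library as plugins
- Add ONNX export for the SCRFD model
- Add ONNX export for the damoyolo model
- Support ONNX/TorchScript export for sequence labeling models
- Support pb file export for the cartoon model
- Refactor the taskdataset module so users can now customize their own dataset logic
- Add examples for the text-generation task, also applicable to GPT3
- Add finetune support for the Siamese UIE model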

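Feature List
- Support torch 2.0 compile in inference and training; because testing is not yet thorough, some models may hit errors
- Add trainer support for the adadet library
- Support training for ddcolor image colorization
- Add video_instance_segmentation inference
- Add plugin support to the CLI tool
- Add the human reconstruction task
- Add the vidt model
- Add the speech_timestamp task
- Add the disco guided diffusion model
- Support training for ocr_reco_crnn
- Add training for action detection
- Add training for ocr_detection_db
- Add the LORE lineless table recognition task
- Add the PEER model
- Add a damoyolo-based smoke detection model
- Add the RLEG model
- Add the ProContEXT model for video instance tracking
- Add the video perception model longshortnet
- Add the dingding denoising model
- Support vision efficient tuning
- Support the text-to-video-synthesis task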

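Improvements

- text-generation inference supports args input
- Support soonet in video temporal grounding
- The trainer supports DDPHook
- KWS supports continuing training
- Support correct DDIM sampling on GPU
- Add more CLI tools
- Modify the inputs and outputs of speech inference
- Optimize the KWS configuration
- ImagePaintbyexamplePipeline supports demoservice
- Support load_from for the easycv trainer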


BugFix

- Fix errors when installing detectron2
- Fix errors in the generate_scp_from_url method
- Fix speaker_verification_pipeline and speaker_diarization_pipeline
- Fix failing data-related test cases
- Fix AST scanning failures
- Fix a bug in the Word alignment preprocessor

English Version

New Model List and Quick Access

| No | Model Name & Link | Org | Finetune supported |
| --- | --- | --- | --- |
| 1 | [ChatGLM-English&Chinese-6B](https://modelscope.cn/models/ZhipuAI/ChatGLM-6B/summary) | ZhiPu.AI | |
| 2 | [GLM130B-LLM English&Chinese](https://modelscope.cn/models/ZhipuAI/GLM130B/summary) | ZhiPu.AI | |
| 3 | [unidiffuser-v1](https://modelscope.cn/models/thu-ml/unidiffuser-v1/summary) | Tsinghua TSAIL | |
| 4 | ChatYuan-large-v2 | YuanYu | |
| 5 | [OpenICommunity/pangu_2_6B](https://www.modelscope.cn/models/OpenICommunity/pangu_2_6B/summary) | PengCheng Lab | |
| 6 | [openjourney](https://modelscope.cn/models/dienstag/openjourney/summary) | personal-dienstag | |
| 7 | [Rwkv-4-pile-14b](https://modelscope.cn/models/Blink_DL/rwkv-4-pile-14b/summary) | personal-Blink\_DL | |
| 8 | [SiameseUIE information extraction-Chinese-base](https://modelscope.cn/models/damo/nlp_structbert_siamese-uie_chinese-base/summary) | | |
| 9 | [SiameseUniNLU zero-shot NLU Chinese base model](https://modelscope.cn/models/damo/nlp_structbert_siamese-uninlu_chinese-base/summary) | | |

Highlight

- Allow external repos to work with the modelscope library via plugins
- Support onnx export for SCRFD model
- Add onnx exporter for damoyolo
- Add onnx/torchscript exporter for token classification models
- Add frozen graph def exporter for cartoon model
- Refactor the taskdataset module so users can now write datasets with custom logic
- Add example for text-generation finetuning, also available for GPT3
- Add finetune support for Siamese UIE
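Plugin support of this kind generally relies on a registry: importing an external module runs decorators that register its classes, after which the library can build them by name. The sketch below shows that pattern with a hypothetical `Registry` class and `load_plugins` helper; it is not ModelScope's actual registry or plugin API.

```python
import importlib
from typing import Any, Callable, Dict, Iterable


class Registry:
    """Maps string names to classes so they can be built from config."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self, module_name: str) -> Callable[[type], type]:
        def decorator(cls: type) -> type:
            if module_name in self._modules:
                raise KeyError(f"{module_name} is already registered in {self.name}")
            self._modules[module_name] = cls
            return cls
        return decorator

    def build(self, module_name: str, **kwargs: Any) -> Any:
        if module_name not in self._modules:
            raise KeyError(f"{module_name} is not registered in {self.name}")
        return self._modules[module_name](**kwargs)


MODELS = Registry("models")


def load_plugins(module_names: Iterable[str]) -> None:
    """Import plugin modules so their register_module decorators run."""
    for name in module_names:
        importlib.import_module(name)


# In a real plugin this class would live in an external package passed to load_plugins.
@MODELS.register_module("my-plugin-model")
class MyPluginModel:
    def __init__(self, scale: int = 1) -> None:
        self.scale = scale

    def __call__(self, x: int) -> int:
        return x * self.scale


print(MODELS.build("my-plugin-model", scale=3)(2))  # 6
```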


Breaking changes


Feature
- Support torch 2.0 compile in inference and training; this feature is not yet stable on all models
- Add ADADET support and a thirdparty arg for the damoyolo trainer
- Add finetune for ddcolor image colorization
- Add video_instance_segmentation pipeline
- Add plugin support to the CLI tool
- Add human reconstruction task
- Add vidt model
- Add task: speech_timestamp
- Add disco guided diffusion
- Add training support for ocr_reco_crnn
- Add action detection finetune
- Add ocr_detection_db training module
- Add LORE lineless table recognition
- Add PEER model
- Add smoke and fire detection model using damoyolo
- Add generative multimodal embedding model RLEG
- Add vop_se for text-video retrieval
- Add ProContEXT model for video single object tracking
- Add video streaming perception model longshortnet
- Add dingding denoise model
- Support vision efficient tuning finetune
- Add text-to-video-synthesis
- Add MAN for image-quality-assessment



Improvements

- Support running the text generation pipeline with args
- Add soonet for video temporal grounding
- Trainer supports the parallel_groups setting and a DDP hook
- KWS supports continuing training from a checkpoint
- Correct DDIM sampling on GPU
- Add more cli tools
- Modify audio input types and punctuation postprocessing
- Optimize kws pipeline and training conf
- Support ImagePaintbyexamplePipeline demo service
- Support load_from for easycv trainer
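For reference, the standard DDIM update (Song et al., "Denoising Diffusion Implicit Models") is shown below, written with the cumulative noise-schedule product $\bar\alpha_t$ and the predicted noise $\epsilon_\theta(x_t, t)$. The release note does not say what was wrong on GPU, so this is only the textbook form of the sampler, not a description of the fix.

$$
x_{t-1} = \sqrt{\bar\alpha_{t-1}}\left(\frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar\alpha_t}}\right) + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t, t) + \sigma_t z, \qquad z \sim \mathcal{N}(0, I)
$$

Setting $\sigma_t = 0$ for all $t$ gives the deterministic DDIM sampler; the first term rescales the predicted clean sample and the second points back along the predicted noise direction.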


BugFix

- Fix a bug when installing detectron2
- Fix a bug in the function generate_scp_from_url
- Fix a bug in speaker_verification_pipeline and speaker_diarization_pipeline: rewrite the default config with configure.json
- Fix failing data-related test cases
- Fix a bug in AST scanning of function definitions
- Fix the word alignment preprocessor

1.3.2

Chinese Version

New Model List and Quick Access

This minor version adds six new models, two of which support finetuning.

| No. | Model Name & Link | Finetune Supported |
| --- | --- | --- |
| 1 | [ControlNet Controllable Image Generation](https://modelscope.cn/models/dienstag/cv_controlnet_controllable-image-generation_nine-annotators/summary) | |
| 2 | [Landing Cervical Cell AI-Assisted Diagnosis Model](https://modelscope.cn/models/landingAI/LD_CytoBrainCerv/summary) | |
| 3 | [Reading Light-Text Detection-DB Line Detection Model-Chinese and English-General Domain](https://modelscope.cn/models/damo/cv_resnet18_ocr-detection-db-line-level_damo/summary) | |
| 4 | [SOND Speaker Diarization-Chinese-alimeeting-16k-Offline-pytorch](https://www.modelscope.cn/models/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch/summary) | |
| 5 | [NeRF Fast 3D Reconstruction Model](https://www.modelscope.cn/models/damo/cv_nerf-3d-reconstruction-accelerate_damo/summary) | **√** |
| 6 | [DCT-Net Portrait Cartoonization](https://modelscope.cn/models/damo/cv_unet_person-image-cartoon_compound-models/summary) | **√** |


Feature
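* GPT3 finetuning improvements: support for DDP + tensor parallel, and an optimized handoff from the finetuning flow to inference
* Optimized checkpoint saving so that periodically saved checkpoints and best-model checkpoints can be used directly for inference
* Refactored the hooks design to decouple individual functional hooks and support interaction between hooks
* Support ImagePaintbyExamplePipeline demo service
* Support multiple audio types
* Support Petr3D CPU inference, compatible with the new version of mmcv
* Update the deberta v2 preprocessor
* Support initializing NLP downstream task models by loading only the backbone pretrained weights
* Update librosa.resample() arguments to support the latest version
* Add usage tracking for downstream toolbox calls

Breaking Changes
* Checkpoint saving now stores model parameters and training-state parameters separately; model parameters saved by older versions must be converted before loading

Bug Fixes:

* Fix asr vad/lm/punc input processing
* Fix gpt moe finetune checkpoint path error
* Fix invalid args lm_train_conf
* Fix CI test errors when deleting existing files
* Fix OCR recognition bugs
* Remove image resolution restrictions in the preprocessing stage
* Fix output wav files being 32-bit float instead of the expected 16-bit int
* Set num_workers=0 to avoid creating subprocesses in demo-service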



English Version

New Model List and Quick Access
This minor version adds a total of six new models, including two models with finetuning capability.

| **No.** | **Model Name & Link** | **Finetuning Supported** |
| --- | --- | --- |
| 1 | [ControlNet Controllable Image Generation](https://modelscope.cn/models/dienstag/cv_controlnet_controllable-image-generation_nine-annotators/summary) | |
| 2 | [Landing AI Cervical Cell AI-assisted Diagnosis Model](https://modelscope.cn/models/landingAI/LD_CytoBrainCerv/summary) | |
| 3 | [Reading Light - Text Detection - DB Line Detection Model - Chinese and English - General Domain](https://modelscope.cn/models/damo/cv_resnet18_ocr-detection-db-line-level_damo/summary) | |
| 4 | [SOND Speaker Diarization - Chinese - Alimeeting-16k - Offline - PyTorch](https://www.modelscope.cn/models/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch/summary) | |
| 5 | [NeRF Fast 3D Reconstruction Model](https://www.modelscope.cn/models/damo/cv_nerf-3d-reconstruction-accelerate_damo/summary) | **√** |
| 6 | [DCT-Net Person Image Cartoonization](https://modelscope.cn/models/damo/cv_unet_person-image-cartoon_compound-models/summary) | **√** |

Features
- GPT-3 finetune has been improved to support DDP+tensor parallel
- Checkpoint saving logic has been optimized to ensure that files saved periodically and those saved as the best can be used directly by pipeline
- The Hooks scheme has been refactored to decouple various functional hooks and support interaction between hooks.
- Supports ImagePaintbyExamplePipeline demo service
- Supports multi-machine data and tensor parallel finetuning for cartoon task
- Supports various audio types
- Supports Petr3D CPU inference with compatibility for the latest version of mmcv
- Updates deberta v2 preprocessor
- Supports initialization of downstream NLP task models with only backbone pre-training weights loaded
- Updates librosa.resample() parameter support to the latest version
- Adds downstream toolbox call tracking function
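One common way to decouple training hooks while letting them cooperate is to run them in priority order and let them communicate only through shared trainer state. The sketch below illustrates that idea with hypothetical `Hook`, `Trainer`, `EvaluationHook`, and `CheckpointHook` classes, not ModelScope's real ones: the evaluation hook writes a metric that the checkpoint hook later reads to track the best iteration, which is also the basis of "best" checkpoint saving.

```python
from typing import Dict, List, Optional


class Hook:
    priority = 50  # lower values run first (hypothetical convention)

    def after_iter(self, trainer: "Trainer") -> None:
        pass


class Trainer:
    def __init__(self, hooks: List[Hook]) -> None:
        self.hooks = sorted(hooks, key=lambda h: h.priority)
        self.iter = 0
        self.log_buffer: Dict[str, float] = {}

    def call_hook(self, fn_name: str) -> None:
        for hook in self.hooks:
            getattr(hook, fn_name)(self)

    def train(self, num_iters: int) -> None:
        for _ in range(num_iters):
            self.iter += 1
            self.log_buffer["loss"] = 1.0 / self.iter  # stand-in for a real loss
            self.call_hook("after_iter")


class EvaluationHook(Hook):
    priority = 10

    def after_iter(self, trainer: Trainer) -> None:
        trainer.log_buffer["metric"] = -trainer.log_buffer["loss"]


class CheckpointHook(Hook):
    priority = 90

    def __init__(self) -> None:
        self.best: Optional[float] = None
        self.best_iter: Optional[int] = None

    def after_iter(self, trainer: Trainer) -> None:
        metric = trainer.log_buffer.get("metric")
        if metric is not None and (self.best is None or metric > self.best):
            self.best, self.best_iter = metric, trainer.iter  # a real hook would save here


ckpt = CheckpointHook()
trainer = Trainer([ckpt, EvaluationHook()])
trainer.train(3)
print(ckpt.best_iter)  # 3
```

Because the checkpoint hook runs after the evaluation hook, it always sees the metric from the current iteration, and neither hook imports or calls the other.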

Breaking changes
- Model parameters and training state are now saved separately, so previously trained checkpoints must be converted before resuming training
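Converting an old combined checkpoint amounts to separating the model weights from the optimizer, scheduler, and progress state. The sketch assumes a hypothetical old layout (a `state_dict` entry plus training-state keys such as `optimizer`, `lr_scheduler`, and `epoch`); check the actual keys in your checkpoint and the ModelScope documentation before relying on it.

```python
from typing import Any, Dict, Tuple

# Hypothetical names of training-state entries in an old combined checkpoint.
TRAINING_STATE_KEYS = ("optimizer", "lr_scheduler", "epoch", "iter", "meta")


def split_checkpoint(old_ckpt: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a combined checkpoint dict into (model_state, training_state)."""
    training_state = {k: v for k, v in old_ckpt.items() if k in TRAINING_STATE_KEYS}
    if "state_dict" in old_ckpt:
        model_state = dict(old_ckpt["state_dict"])
    else:
        # Assume every non-training entry is a model parameter.
        model_state = {k: v for k, v in old_ckpt.items() if k not in TRAINING_STATE_KEYS}
    return model_state, training_state


old = {"state_dict": {"linear.weight": [0.1, 0.2]}, "optimizer": {"lr": 0.01}, "epoch": 3}
model_state, training_state = split_checkpoint(old)
print(model_state)     # {'linear.weight': [0.1, 0.2]}
print(training_state)  # {'optimizer': {'lr': 0.01}, 'epoch': 3}
```

With PyTorch checkpoints you would load the old file with `torch.load` and write each dict back with `torch.save`; the output file names depend on what the trainer expects.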

Bug Fixes:

- Fixes asr vad/lm/punc input processing
- Fixes gpt moe finetune checkpoint path error
- Fixes invalid args lm_train_conf
- Fixes ci test errors when deleting existing files
- Fixes OCR recognition bugs
- Removes image resolution restrictions in preprocessing stage
- Fixes output wav file being 32-bit float instead of expected 16-bit int
- Sets num_workers=0 to prevent creating sub-processes in demo-service.
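The wav fix above concerns sample format: WAV output should contain 16-bit integer PCM rather than 32-bit floats. The sketch below shows the usual conversion with only the standard library, clipping float samples to [-1.0, 1.0] and scaling by 32767; `float_to_pcm16` and `write_wav_16bit` are hypothetical helpers, not ModelScope functions.

```python
import array
import io
import sys
import wave
from typing import Iterable


def float_to_pcm16(samples: Iterable[float]) -> array.array:
    """Clip float samples to [-1, 1] and scale them to signed 16-bit integers."""
    pcm = array.array("h")
    for s in samples:
        s = max(-1.0, min(1.0, s))
        pcm.append(int(round(s * 32767)))
    return pcm


def write_wav_16bit(samples: Iterable[float], sample_rate: int) -> bytes:
    """Encode mono float samples as a 16-bit PCM WAV file held in memory."""
    pcm = float_to_pcm16(samples)
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(pcm.tobytes())
    return buf.getvalue()


print(list(float_to_pcm16([0.0, 0.5, -1.0, 1.5])))  # [0, 16384, -32767, 32767]
```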

1.3.0

New Model List and Quick Access

| No. | Model Name & Link | Finetune Supported |
| --- | --- | --- |
| 1 | [NAFNet Image Deblurring](https://modelscope.cn/models/damo/cv_nafnet_image-deblur_gopro/summary) | **√** |
| 2 | [BEiTv2 Image Classification-General-base](https://modelscope.cn/models/damo/cv_beitv2-base_image-classification_patch16_224_pt1k_ft22k_in1k/summary) | **√** |
| 3 | [BEiTv2 Image Classification-General-large](https://modelscope.cn/models/damo/cv_beitv2-large_image-classification_patch16_224_pt1k_ft22k_in1k/summary) | **√** |
| 4 | [Real-Time Head Detection-General](https://www.modelscope.cn/models/damo/cv_tinynas_head-detection_damoyolo/summary) | **√** |
| 5 | [Real-Time Phone Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_phone/summary) | **√** |
| 6 | [NAFNet Image Deblurring-Compressed Images](https://modelscope.cn/models/damo/cv_nafnet_image-deblur_reds/summary) | **√** |
| 7 | [DINO-High-Precision Object Detection Model](https://modelscope.cn/models/damo/cv_swinl_image-object-detection_dino/summary) | **√** |
| 8 | [StructBERT Text Similarity-Chinese-E-commerce-base](https://www.modelscope.cn/models/damo/nlp_structbert_sentence-similarity_chinese-retail-base/summary) | **√** |
| 9 | [StructBERT Fact-Checking-Chinese-E-commerce-base](https://www.modelscope.cn/models/damo/nlp_structbert_fact-checking_chinese-base/summary) | **√** |
| 10 | [StructBERT FAQ Question Answering-Chinese-Finance-base](https://www.modelscope.cn/models/damo/nlp_structbert_faq-question-answering_chinese-finance-base/summary) | **√** |
| 11 | [StructBERT FAQ Question Answering-Chinese-Government Affairs-base](https://www.modelscope.cn/models/damo/nlp_structbert_faq-question-answering_chinese-gov-base/summary) | **√** |
| 12 | [IR Face Recognition Model FRIR](https://modelscope.cn/models/damo/cv_manual_face-recognition_frir/summary) | |
| 13 | [Masked Face Recognition Model FRFM-large](https://modelscope.cn/models/damo/cv_manual_face-recognition_frfm/summary) | |
| 14 | [Face Quality Assessment Model FQA](https://modelscope.cn/models/damo/cv_manual_face-quality-assessment_fqa/summary) | |
| 15 | [Silent Face Liveness Detection Model-Color Flash](https://modelscope.cn/models/damo/cv_manual_face-liveness_flxc/summary) | |
| 16 | [Motion Generation-Human Motion-English](https://www.modelscope.cn/models/damo/cv_mdm_motion-generation/summary) | |
| 17 | [M2FP Single-Person Human Parsing](https://www.modelscope.cn/models/damo/cv_resnet101_image-single-human-parsing/summary) | |
| 18 | [DeOldify Video Colorization](https://modelscope.cn/models/damo/cv_unet_video-colorization/summary) | |
| 19 | [Image Quality MOS Assessment](https://www.modelscope.cn/models/damo/cv_resnet_image-quality-assessment-mos_youtubeUGC/summary) | |
| 20 | [Bad Image Detection](https://www.modelscope.cn/models/damo/cv_mobilenet-v2_bad-image-detecting/summary) | |
| 21 | [YOLOPV2 Vehicle Detection and Lane Segmentation-Autonomous Driving](https://www.modelscope.cn/models/damo/cv_yolopv2_image-driving-perception_bdd100k/summary) | |
| 22 | [DCT-Net Portrait Cartoonization-Diffusion Model-Illustration](https://modelscope.cn/models/damo/cv_unet_person-image-cartoon-sd-design_compound-models/summary) | |
| 23 | [DCT-Net Portrait Cartoonization-Diffusion Model-Comic](https://modelscope.cn/models/damo/cv_unet_person-image-cartoon-sd-illustration_compound-models/summary) | |
| 24 | [Cartoon Text-to-Image Model](https://modelscope.cn/models/damo/cv_cartoon_stable_diffusion_design/summary) | |
| 25 | [Cartoon Text-to-Image Model-Comic Style](https://modelscope.cn/models/damo/cv_cartoon_stable_diffusion_illustration/summary) | |
| 26 | [Cartoon Text-to-Image Model-Watercolor Style](https://modelscope.cn/models/damo/cv_cartoon_stable_diffusion_watercolor/summary) | |
| 27 | [Cartoon Text-to-Image Model-Clip Art](https://modelscope.cn/models/damo/cv_cartoon_stable_diffusion_clipart/summary) | |
| 28 | [Cartoon Text-to-Image Model-Flat Style](https://modelscope.cn/models/damo/cv_cartoon_stable_diffusion_flat/summary) | |
| 29 | [Lightweight SRResNet Video Super-Resolution](https://www.modelscope.cn/models/damo/cv_msrresnet_video-super-resolution_lite/summary) | |
| 30 | [ECBSR On-Device Image Super-Resolution Model](https://www.modelscope.cn/models/damo/cv_ecbsr_image-super-resolution_mobile/summary) | |
| 31 | [Real-Time Traffic Sign Detection-Autonomous Driving](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_traffic_sign/summary) | |
| 32 | [Monocular Depth Estimation with Multi-Scale Local Planar Guidance](https://modelscope.cn/models/damo/cv_densenet161_image-depth-estimation_bts/summary) | |
| 33 | [UHDM Image Demoireing](https://modelscope.cn/models/damo/cv_uhdm_image-demoireing/summary) | |
| 34 | [M2FP Multi-Person Human Parsing](https://www.modelscope.cn/models/damo/cv_resnet101_image-multiple-human-parsing/summary) | |
| 35 | [VFI-RAFT Video Frame Interpolation-Practical](https://modelscope.cn/models/damo/cv_raft_video-frame-interpolation_practical/summary) | |
| 36 | [StableDiffusionV2 Image Inpainting](https://www.modelscope.cn/models/damo/cv_stable-diffusion-v2_image-inpainting_base/summary) | |
| 37 | [MT5 Open-Domain Multi-Turn Dialogue Rewriting-Chinese-General-base](https://www.modelscope.cn/models/damo/nlp_mt5_dialogue-rewriting_chinese-base/summary) | |
| 38 | [Foundation Vision Model Efficient Tuning-adapter](https://modelscope.cn/models/damo/cv_vitb16_classification_vision-efficient-tuning-adapter/summary) | |
| 39 | [Foundation Vision Model Efficient Tuning-prompt](https://modelscope.cn/models/damo/cv_vitb16_classification_vision-efficient-tuning-prompt/summary) | |
| 40 | [Foundation Vision Model Efficient Tuning-prefix](https://modelscope.cn/models/damo/cv_vitb16_classification_vision-efficient-tuning-prefix/summary) | |
| 41 | [Foundation Vision Model Efficient Tuning-lora](https://modelscope.cn/models/damo/cv_vitb16_classification_vision-efficient-tuning-lora/summary) | |
| 42 | [Video Panoptic Segmentation-VideoKNet-SwinB](https://www.modelscope.cn/models/damo/cv_swinb_video-panoptic-segmentation_vipseg/summary) | |
| 43 | [Face Reconstruction Model](https://modelscope.cn/models/damo/cv_resnet50_face-reconstruction/summary) | |
| 44 | [DDPM-Seg Diffusion-Based Semantic Segmentation](https://www.modelscope.cn/models/damo/cv_diffusion_image-segmentation/summary) | |
| 45 | [DeepLPF Image Color Enhancement](https://modelscope.cn/models/damo/cv_deeplpfnet_image-color-enhance-models/summary) | |
| 46 | [Video Deinterlacing](https://www.modelscope.cn/models/damo/cv_unet_video-deinterlace/summary) | |
| 47 | [Adaptive-Interval-3DLUT Image Color Enhancement](https://modelscope.cn/models/damo/cv_adaint_image-color-enhance-models/summary) | |
| 48 | [RealESRGAN Image Debanding](https://modelscope.cn/models/damo/cv_rrdb_image-debanding/summary) | |
| 49 | [Image Quality Degradation Analysis](https://www.modelscope.cn/models/damo/cv_resnet50_image-quality-assessment_degradation/summary) | |
| 50 | [Open-Vocabulary Object Detection via Vision and Language Knowledge Distillation](https://www.modelscope.cn/models/damo/cv_resnet152_open-vocabulary-detection_vild/summary) | |
| 51 | [High-Performance General Object Detection for Long-Tail and Small-Object Problems](https://modelscope.cn/models/damo/cv_resnet50_object-detection_maskscoring/summary) | |


Best Practice Tutorials

Finally, we have also released many task-level and model-level best-practice tutorials to help developers better understand and apply the models.

* [Introduction to Using Large NLG Models](https://modelscope.cn/docs/NLG%E5%A4%A7%E6%A8%A1%E5%9E%8B%E4%BD%BF%E7%94%A8%E4%BB%8B%E7%BB%8D)

* [Sequence Labeling Task Best Practices](https://modelscope.cn/docs/%E5%BA%8F%E5%88%97%E6%A0%87%E6%B3%A8%E4%BB%BB%E5%8A%A1)

* [Text Generation Task Best Practices](https://modelscope.cn/docs/%E6%96%87%E6%9C%AC%E7%94%9F%E6%88%90%E4%BB%BB%E5%8A%A1)

* [Text Classification Task Best Practices](https://modelscope.cn/docs/%E6%96%87%E6%9C%AC%E5%88%86%E7%B1%BB%E4%BB%BB%E5%8A%A1)

* [Image Segmentation Task Best Practices](https://modelscope.cn/docs/%E5%9B%BE%E5%83%8F%E5%88%86%E5%89%B2%E4%BB%BB%E5%8A%A1)


Follow our open-source community: [https://github.com/modelscope/modelscope](https://github.com/modelscope/modelscope)


English Version
Highlight

- Add vqa-degradation
- Add content check pipeline
- Add pipelines for en2zh-imt and zh2en-imt
- Add single and multiple human parsing models
- Add AdaInt model
- Add open vocabulary detection
- Support finetune for sentence-embedding
- Add bad image detection model and pipeline
- Support translation model exporting
- Add asr dataset for finetune
- Add ocr detection model and pipeline
- Add face quality assessment model
- Add video deinterlace model
- Add language model for audio task
- Add deeplpf for image color enhance and image debanding model
- Add ecbsr model for mobile image super-resolution
- Add msrresnetlite model for video super-resolution
- Support finetune and evaluation for image-fewshot-detection-defrcn
- Add yolopv2 model cv_yolopv2_image_driving_perception
- Add face liveness xc model
- Add paint-by-example model
- Add universal_matting pipeline
- Add multi-modal_gridvlp_classification_chinese-base-ecom-cate
- Add DINO detection with easycv
- Add speech speaker verification pipeline
- Add nerf-recon model
- Support finetune for real-time object detection with easycv
- Add single-camera depth estimation bts model
- Add MGIMN model
- Add fusion-in-decoder dialogue task
- Add vision_efficient_tuning models
- Add traffic-sign detection
- Add object_detection3d_depe model
- Add stable diffusion model for image inpainting
- Add head&phone detection models
- Add face_reconstruction model
- Add structured model probing pipeline for image classification
- Add video panoptic segmentation with VideoKNet-SwinB
- Add image quality assessment MOS (mean opinion score) model
- Add ddpm-segmentation pipeline
- Add plug mental model
- Add video-colorization pipeline
- Add image demoireing
- Add face recognition ir model
- Support batch inference for nlp_csanmt_translation_en2zh
- Add image_deblurring_dataset for REDS dataset
- Add new motion-generation model
- Add face recognition and face mask model


Breaking changes

- Adjust video_multi_target_tracking output
- Adjust video_human_matting output of video to support demo service


Feature

- Add default preprocessor for taskmodels
- Run CI cases based on the code diff to reduce CI test time
- Support demo code to return path of result video for video human matting
- Add en2ru and ru2en pipeline ut v4
- KWS pipeline returns Chinese characters by configuration
- AST scanning skips indexing function-level imports
- Compatible with diffusers 0.12.1
- Video depth estimation supports CPU mode
- Add an output_dir parameter to the asr pipeline
- Add RTS face recognition ood model
- Add image-defrcn-fewshot-detection

Improvements

- Remove requirements of mpi4py
- Remove pytorch-lightning version constraint
- Refine the cv_image_defrcn trainer to avoid failures
- Support trainer prediction
- Allow pass prompt in kwargs & reduce GPU usage for image_inpainting
- Improve video frame interpolation pipeline
- Use package LoadImage for image io in image_quality_assessment_mos
- Remove text2sql_lgesql from nlp requirements
- Remove tensorboard hook as default
- Add model_revision parameter to ImageDetectionDamoyoloTrainer
- Update mgeo finetune test case for rerank
- Add args for asr_infer_pipeline, punc_pipeline, sv_pipeline & modify funasr version
- Add model type check and give easy-to-understand error prompts
- Replace `import torchaudio` to avoid unnecessary requirements in framework
- Split training and evaluating code for nearfield kws trainer
- Update test image for image deblur
- Add output_dir for asr inference when called
- Support cpu mode for video depth estimation
- Add zhconv to nlp requirements
- Modify the resumable cache path for oss utils
- Support the form of '/to/path/abc.csv' in MsDataset.load() function
- Add UT cases
- Limit pyarrow version


BugFix

- Fix bug in speaker verification infer
- Fix a bug when adding use_fast for text_ranking
- Fix two checkpoint hooks saving to the same directory
- Fix the bug that image_color_enhance_pipeline cannot run in CPU environment
- Fix bugs in audio fs, asr & sv demo services
- Fix gpt3 unexpected spaces
- Fix loading checkpoint errors for palm
- Fix data parallel bug for mgeo evaluation
- Fix incomplete checkpoint saving with tensor model parallelism
- Fix _eval_iters_per_epoch None bug
- Fix damoyolo evaluator loading a mismatched checkpoint
- Fix video matting demo format (mp4v to h264)
- Fix delete model revision
- Fix datasets version incompatible issue
- Fix the compatibility issue of datasets
- Fix postprocessor bugs with batch inference
- Fix asr backward compatibility during inference with tensorflow
- Fix hand detection finetuning
- Fix typos

1.2.1

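Chinese Notes
* Split audio-domain dependencies into sub-domains to reduce dependency installation
* KWS can be configured to return Chinese results
* Upgrade funasr and add extra parameter configuration for speech recognition, speaker verification, and punctuation prediction
* Remove the base framework's dependency on torchaudio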

English

* separate audio requirements
* kws pipeline returns Chinese characters by configuration
* add args for asr_infer_pipeline, punc_pipeline, sv_pipeline & modify funasr version
* replace `import torchaudio` to avoid unnecessary requirements in framework

1.2.0

Chinese Version

This version adds 38 new models, 14 of which support finetuning.

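Model Feature Highlights

* **High-performance detection application series:** Built on [DAMOYOLO](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo/summary), an industry-oriented high-performance detection framework that surpasses the classic YOLO series in both accuracy and speed, this release **adds** the [real-time mask detection model](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_facemask/summary), [real-time safety helmet detection model](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_safety-helmet/summary), [real-time human detection model](https://modelscope.cn/models/damo/cv_tinynas_human-detection_damoyolo/summary), and [real-time cigarette detection model](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_cigarette/summary), offering an efficient out-of-the-box experience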

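* Speech recognition, speech synthesis, and keyword spotting models can be finetuned with the ModelScope Python SDK

* Speech synthesis: new dialect models for Sichuanese, Cantonese, and Shanghainese, plus new Russian and Korean models

  * SambertHifigan Speech Synthesis-Sichuanese-General Domain-16k-Speaker chuangirl, a Sichuanese female voice model

  * SambertHifigan Speech Synthesis-Cantonese-General Domain-16k-Speaker jiajia, a Cantonese female voice model

  * SambertHifigan Speech Synthesis-Shanghainese-General Domain-16k-Speaker xiaoda, a Shanghainese female voice model

  * SambertHifigan Speech Synthesis-Russian-General Domain-16k-Speaker masha, a Russian female voice model

  * SambertHifigan Speech Synthesis-Korean-General Domain-16k-Speaker kyong, a Korean female voice model

* Speech file post-processing

  * Add text normalization models for 11 languages: English, German, Filipino, Korean, Vietnamese, Japanese, Russian, Indonesian, Portuguese, French, and Spanish

* Image face fusion

  * Automatically extracts and aligns face regions and extracts facial features, with no extra preprocessing required.

  * Introduces a 3D reconstruction network to fit and transfer face shape, so the fused face shape is more similar to the source.

* Face and human body

  * GPEN portrait enhancement and restoration-high-resolution faces: 1024 and 2048 models based on the GPEN framework, trained on ultra-high-resolution face data.

* Visual editing

  * DDColor image colorization: substantially better color richness and semantic consistency than earlier methods such as Deoldify.

  * VFI-RAFT video frame interpolation: compared with other SOTA models, performs well in scenes with large motion and repetitive textures.

  * DUT-RAFT video stabilization: stable de-jittering across many kinds of video shake, with better preservation of video sharpness than the original DUT.

* Low-level vision

  * RealBasicVSR video super-resolution: works well on most real-world videos, but may perform poorly on a small fraction of severely degraded ones.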

Breaking Changes
* The text-to-image task now outputs multiple images
* The TTS task output field changed from output_pcm to output_wav

New Model List and Quick Access

| **Contributing Organization** | **Model Name** |
| --- | --- |
| **Bilibili** | [RealCUGAN Image Super-Resolution](https://modelscope.cn/models/bilibili/cv_bilibili_image-super-resolution/summary) |
| **YuanYu Intelligence** | [ChatYuan Functional Dialogue LLM](https://modelscope.cn/models/ClueAI/ChatYuan-large/summary) |
| **Fengshenbang** | [Wenzhong-GPT2-110M-Chinese-v2](https://modelscope.cn/models/Fengshenbang/Wenzhong-GPT2-110M-chinese-v2/summary) |
| **Fengshenbang** | [Erlangshen-RoBERTa-330M-Text Similarity](https://modelscope.cn/models/Fengshenbang/Erlangshen-RoBERTa-330M-Similarity/summary) |
| **Fengshenbang** | [Erlangshen-RoBERTa-110M-Natural Language Inference](https://modelscope.cn/models/Fengshenbang/Erlangshen-RoBERTa-110M-NLI/summary) |
| **Alibaba AAIG** | [Discrete Adversarial Training ViT-H/14-Robust Image Classification-imagenet1k](https://modelscope.cn/models/AAIG/easyrobust-models/summary) |
| **Alibaba Cloud Machine Learning Platform PAI** | [GPT-MoE Chinese 6.7B Poetry Generation Model](https://modelscope.cn/models/PAI/nlp_gpt3_text-generation_0.35B_MoE-64/summary) |
| **Alibaba Cloud Machine Learning Platform PAI** | [GPT-MoE Chinese 27B Essay Generation Model](https://modelscope.cn/models/PAI/nlp_gpt3_text-generation_1.3B_MoE-64/summary) |
| **DAMO Academy** | [Reading Light-Text Detection-Word Detection Model-English-VLPT Pretraining](https://modelscope.cn/models/damo/cv_resnet50_ocr-detection-vlpt/summary) |
| **DAMO Academy** | [Reading Light-Document Understanding-Multimodal Document Understanding Pretrained Model](https://modelscope.cn/models/damo/multi-modal_convnext-roberta-base_vldoc-embedding/summary) |
| **DAMO Academy** | [Chinese StableDiffusion-General Domain](https://modelscope.cn/models/damo/multi-modal_chinese_stable_diffusion_v1.0/summary) |
| **DAMO Academy** | [DDColor Image Colorization](https://modelscope.cn/models/damo/cv_ddcolor_image-colorization/summary) |
| **DAMO Academy** | [Video Multi-Object Tracking-Pedestrian](https://modelscope.cn/models/damo/cv_yolov5_video-multi-object-tracking_fairmot/summary) |
| **DAMO Academy** | [MaskDINO-SwinL Image Instance Segmentation](https://modelscope.cn/models/damo/cv_maskdino-swin-l_image-instance-segmentation_coco/summary) |
| **DAMO Academy** | [VFI-RAFT Video Frame Interpolation](https://modelscope.cn/models/damo/cv_raft_video-frame-interpolation/summary) |
| **DAMO Academy** | [DUT-RAFT Video Stabilization](https://modelscope.cn/models/damo/cv_dut-raft_video-stabilization_base/summary) |
| **DAMO Academy** | [RealBasicVSR Video Super-Resolution](https://modelscope.cn/models/damo/cv_realbasicvsr_video-super-resolution_videolq/summary) |
| **DAMO Academy** | [GPEN Portrait Enhancement and Restoration-High-Resolution Faces](https://modelscope.cn/models/damo/cv_gpen_image-portrait-enhancement-hires/summary) |
| **DAMO Academy** | [YOLOX-PAI Hand Detection Model](https://modelscope.cn/models/damo/cv_yolox-pai_hand-detection/summary) |
| **DAMO Academy** | [ConvNeXt Image Classification-Chinese-Garbage Classification](https://modelscope.cn/models/damo/cv_convnext-base_image-classification_garbage/summary) |
| **DAMO Academy** | [BNext Binarized Image Classification-English-General-small](https://modelscope.cn/models/damo/cv_bnext-small_image-classification_ImageNet-labels/summary) |
| **DAMO Academy** | [Real-Time Mask Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_facemask/summary) |
| **DAMO Academy** | [Real-Time Safety Helmet Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_safety-helmet/summary) |
| **DAMO Academy** | [Real-Time Cigarette Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_cigarette/summary) |
| **DAMO Academy** | [Face Liveness Detection Model](https://modelscope.cn/models/damo/cv_manual_face-liveness_flrgb/summary) |
| **DAMO Academy** | [Face Liveness Detection Model-IR](https://modelscope.cn/models/damo/cv_manual_face-liveness_flir/summary) |
| **DAMO Academy** | [MGeo Multi-Task Multimodal Address Pretrained Backbone-Chinese-base](https://modelscope.cn/models/damo/mgeo_backbone_chinese_base/summary) |
| **DAMO Academy** | [MaSTS Pretrained Model-Chinese-Search-CLUE Semantic Matching-large](https://modelscope.cn/models/damo/nlp_masts_backbone_clue_chinese-large/summary) |
| **DAMO Academy** | [MaSTS Text Similarity-Chinese-Search-CLUE Semantic Matching-large](https://modelscope.cn/models/damo/nlp_masts_sentence-similarity_clue_chinese-large/summary) |
| **DAMO Academy** | [NestedNER Named Entity Recognition-Chinese-Medical-base](https://modelscope.cn/models/damo/nlp_nested-ner_named-entity-recognition_chinese-base-med/summary) |
| **DAMO Academy** | [CoROM Text Embedding-Chinese-E-commerce-base](https://modelscope.cn/models/damo/nlp_corom_sentence-embedding_chinese-base-ecom/summary) |
| **DAMO Academy** | [CoROM Semantic Relevance-Chinese-E-commerce-base](https://modelscope.cn/models/damo/nlp_corom_passage-ranking_chinese-base-ecom/summary) |
| **DAMO Academy** | [All-Task Zero-Shot Learning-mT5 Classification-Enhanced-Chinese-base](https://www.modelscope.cn/models/damo/nlp_mt5_zero-shot-augment_chinese-base/summary) |
| **DAMO Academy** | [StructBERT Emotion Classification-Chinese-Seven Classes-base](https://www.modelscope.cn/models/damo/nlp_structbert_emotion-classification_chinese-base/summary) |
| **DAMO Academy** | [HiTransUSE User Satisfaction Estimation-Chinese-E-commerce-base](https://www.modelscope.cn/models/damo/nlp_user-satisfaction-estimation_chinese/summary) |
| **DAMO Academy** | [UniASR Speech Recognition-Chinese-General-8k-Real-Time-pytorch](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) |
| **DAMO Academy** | [Paraformer Speech Recognition-Chinese-General-16k-Offline-large-pytorch](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) |
| **DAMO Academy** | [Paraformer Speech Recognition-Chinese-General-16k-Offline-large-Long Audio](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) |
| **DAMO Academy** | [RaNER-chunking-English-large](https://modelscope.cn/models/damo/nlp_raner_chunking_english-large/summary) |
| **DAMO Academy** | [mPLUG-HiTeA-Video Question Answering Model-English-Base](https://www.modelscope.cn/models/damo/multi-modal_hitea_video-question-answering_base_en/summary) |
| **DAMO Academy** | [mPLUG-HiTeA-Video Captioning-English-Base](https://www.modelscope.cn/models/damo/multi-modal_hitea_video-captioning_base_en/summary) |
| **DAMO Academy** | [Mask2Former-R50 Panoptic Segmentation](https://modelscope.cn/models/damo/cv_r50_panoptic-segmentation_cocopan/summary) |
| **DAMO Academy** | [Image Face Fusion](https://modelscope.cn/models/damo/cv_unet-image-face-fusion_damo/summary) |
| **DAMO Academy** | [Spring Couplet Generation Model-Chinese-base](https://modelscope.cn/models/damo/spring_couplet_generation/summary) |
| **DAMO Academy** | [GPT-3 KuaKua Compliment Bot-Chinese-large](https://modelscope.cn/models/damo/nlp_gpt3_kuakua-robot_chinese-large/summary) |
| **DAMO Academy** | [BART Text Error Correction-Chinese-Legal-large](https://modelscope.cn/models/damo/nlp_bart_text-error-correction_chinese-law/summary) |

English Version
Highlight
- Add fine-tuning support for DAMO-YOLO
- Add new real-time detection models for masks, helmets, human bodies and cigarettes
- Add fine-tuning support for ASR, TTS and KWS models
- Batch inference support for NLP and OFA-based multi-modal tasks
- Add high-resolution GPEN model for face restoration
- Add DDColor model for image colorization
- Add VFI-RAFT model for video frame interpolation
- Add DUT-RAFT model for video stabilization
- Add RealBasicVSR model for video-super-resolution

Breaking changes

- Change the output of the text-to-image task to a list of images
- Change the output key of the TTS task from output_pcm to output_wav
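Code that consumes these two outputs may need to accept both the old and the new shapes while a project upgrades. The sketch below is a hypothetical compatibility helper, not part of the ModelScope API. It assumes the pipeline result is a plain dict in which only the TTS key names given above (output_pcm, output_wav) differ, and that a text-to-image output value may be either a single image object or a list.

```python
from typing import Any, List, Mapping


def get_tts_audio(result: Mapping[str, Any]) -> Any:
    """Return the audio from a TTS pipeline result dict.

    Hypothetical helper: prefers the new 'output_wav' key and falls back
    to the 'output_pcm' key used before this release.
    """
    for key in ("output_wav", "output_pcm"):
        if key in result:
            return result[key]
    raise KeyError("result has neither 'output_wav' nor 'output_pcm'")


def as_image_list(images: Any) -> List[Any]:
    """Normalize a text-to-image output value to a list of images.

    Hypothetical helper: after this release the task returns a list;
    older code paths may still see a single image object (for example
    one numpy array), which is wrapped in a one-element list.
    """
    if isinstance(images, (list, tuple)):
        return list(images)
    return [images]
```

Checking for list or tuple explicitly, rather than "anything iterable", avoids accidentally splitting a single numpy image into its rows.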


Feature

- Add easyrobust-models for image classification
- Video depth estimation: support CPU mode
- ASR pipeline: add output_dir parameter
- Add RTS face recognition ood model
- Add image-defrcn-fewshot-detection
- Add high-resolution GPEN model
- Add MGeo fine-tuning and pipeline
- Add ASR fine-tuning and update ASR inference
- Add quadtree image matching pipeline
- Add fine-tuning for DAMO-YOLO
- Add FLRGB Face Liveness RGB Model
- Add speech separation fine-tuning
- ASR inference: support new models, punctuation, VAD and speaker verification (SV)
- Add VoP retrieval
- Add NAFNet image deblurring pipeline and fine-tuning support
- Add Megatron-BERT
- Add panovit-layout-estimation-pipeline
- Add vision middleware
- Add panorama_depth_estimation
- Unify token classification model output
- FAQ: support fine-tuning and multilingual models
- Support Stable Diffusion and add the DAMO Chinese Stable Diffusion model
- Add cv-bnext-image-classification-pipeline
- Add VFI-RAFT model for video frame interpolation
- Add face changing pipeline
- Add DUT-RAFT model for video stabilization
- Update token_cls default sequence_length: 128 -> 512
- Add structured tasks for OFA: sudoku and text2sql
- Add new ASR model speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline and speech_UniASR_asr_2pass-pt-16k-com
- Add model for multiple object tracking in video
- Add ConvNeXt model
- Add perplexity (PPL) metric
- Add image colorization
- Add user satisfaction estimation pipeline
- OFA fine-tuning supports configuration files
- Add vldoc model
- Add SPACE-T trainer for fine-tuning
- Add speech separation pipeline
- Add cv_casmvs_multi-view-depth-esimation_general
- Add fine-tuning support for Mask2Former
- Add domain-specific object detection models
- Add MaskDINO model
- Add FLIR Face Liveness Model
- Add fine-tuning support for KWS nearfield
- Add cv_pointnet2_sceneflow-estimation_general
- Add HiTeA model for VideoQA and Caption
- Add GPT-2 model
- Add real-time human detection model
- Add video depth estimation pipeline
- Add ocr-detection-vlpt-pipeline
- Add RealBasicVSR model for video-super-resolution
- Add image skychange model
- Add support for cv_rdevos_video-object-segmentation
- Support KAN-TTS inference and fine-tuning
- Add GPT-MoE model
- Add hand detection
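The ppl metric added in this release is perplexity. For reference, the standard definition is the exponential of the average negative log-likelihood per token, PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)). The sketch below computes it from per-token natural-log probabilities. It is a generic illustration of the formula, not ModelScope's metric class, and the function name is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities.

    PPL = exp(-(1/N) * sum(log p_i)). Hypothetical helper for illustration.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# A model that gives each of 4 tokens probability 0.25 should score
# a perplexity of 4, up to floating-point rounding.
print(perplexity([math.log(0.25)] * 4))
```

Lower is better: a perplexity of k roughly means the model is as uncertain as a uniform choice among k tokens at each step.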

Improvements

- Text-error-correction support batch inference
- Add beam search and pair fine-tuning for GPT-3
- Optimize ast_index logic
- Refactor msdataset modules
- Save videos with the H.264 codec in video_super_resolution
- Standardize interfaces and refactor the card_detection, face detection, TinyNAS object detection and image classification pipelines
- Audio pipelines: support byte input and refine fp implementations
- Remove opencv-python from framework requirements and remove easynlp from nlp default requirements
- GPT-3 model supports batch input
- Batch inference for all OFA models
- Ship the AST scanner results prebuilt in the wheel to speed up imports
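Beam search, listed above as a GPT-3 improvement, keeps the k highest-scoring partial sequences at every decoding step instead of only the single best one. The sketch below is a generic, framework-free illustration over a toy next-token scorer. The names beam_search and toy_model are hypothetical, and this is not ModelScope's GPT-3 decoding code.

```python
import math
from typing import Callable, Dict, List, Tuple

Seq = Tuple[int, ...]
# A scorer maps a prefix of token ids to {next_token_id: log_prob}.
Scorer = Callable[[Seq], Dict[int, float]]


def beam_search(
    next_log_probs: Scorer, bos: int, eos: int, beam_size: int, max_len: int
) -> List[Tuple[Seq, float]]:
    """Return up to beam_size (sequence, total log-prob) pairs, best first."""
    beams: List[Tuple[Seq, float]] = [((bos,), 0.0)]
    finished: List[Tuple[Seq, float]] = []
    for _ in range(max_len):
        # Expand every live beam by every possible next token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            if seq[-1] == eos:
                finished.append((seq, score))  # complete; set it aside
            else:
                beams.append((seq, score))
        if not beams:
            break
    # Sequences still open when max_len is reached are returned as well.
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_size]


def toy_model(prefix: Seq) -> Dict[int, float]:
    """Toy scorer: BOS=0, EOS=9; any prefix not in the table ends with EOS."""
    table = {
        (0,): {1: 0.6, 2: 0.4},
        (0, 1): {5: 0.5, 6: 0.5},
        (0, 2): {7: 0.9, 8: 0.1},
    }
    probs = table.get(prefix, {9: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}


print(beam_search(toy_model, bos=0, eos=9, beam_size=2, max_len=5)[0][0])  # (0, 2, 7, 9)
```

With beam_size=1 (greedy decoding) the toy model yields (0, 1, 5, 9) with probability 0.30, while a beam of 2 recovers the better (0, 2, 7, 9) with probability 0.36. Real decoders typically also apply length normalization and score beams in batches on the accelerator; this sketch omits both.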

BugFix

- Fix the best-checkpoint saver not actually saving the best checkpoint
- Fix logger file handler problem
- Fix missing self. (61)
- Fix hub test suites not being able to run in parallel
- Fix loading custom cv data error (59)
- Fix saved GPT-3 checkpoints not running with the pipeline
- Fix video type check for cv2.VideoCapture and add unit tests
- Fix a bug in PLUG inference
- Fix memory leak during evaluation for movie scene segmentation
- Fix the user-agent string and the trainer's invoked_by value
- Fix statistics header not being set correctly
- Fix timeout issue for uni-fold list_oss_objects api
- Fix card-detection model unregistered error and a log warning
- Fix multimer input for science/protein_structure
- Fix demo service and copy license for cv/language_guided_video_summarization
- Fix directory creation error in DDP training
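The first fix above concerns the trainer's best-checkpoint saver. The intended behaviour is easy to state: track the best value of a monitored metric and write a checkpoint only when a new evaluation improves on it. The sketch below captures just that bookkeeping. BestCheckpointTracker and its save callback are hypothetical names, not ModelScope's hook API, and writing the checkpoint itself is left to the caller.

```python
from typing import Callable, Optional


class BestCheckpointTracker:
    """Call `save` only when the monitored metric improves.

    Hypothetical illustration; use mode="max" for metrics such as accuracy
    and mode="min" for metrics such as loss.
    """

    def __init__(self, save: Callable[[int, float], None], mode: str = "max") -> None:
        if mode not in ("max", "min"):
            raise ValueError("mode must be 'max' or 'min'")
        self.save = save
        self.mode = mode
        self.best: Optional[float] = None
        self.best_epoch: Optional[int] = None

    def _improved(self, value: float) -> bool:
        if self.best is None:  # the first evaluation always counts as best
            return True
        return value > self.best if self.mode == "max" else value < self.best

    def update(self, epoch: int, value: float) -> bool:
        """Record an evaluation result; return True if a checkpoint was saved."""
        if not self._improved(value):
            return False
        self.best, self.best_epoch = value, epoch
        self.save(epoch, value)
        return True


saved = []
tracker = BestCheckpointTracker(lambda epoch, acc: saved.append(epoch), mode="max")
for epoch, acc in enumerate([0.70, 0.75, 0.73, 0.80]):
    tracker.update(epoch, acc)
print(saved)  # [0, 1, 3]
```

Updating self.best before calling save keeps the tracker's state consistent with what was last written, so a later evaluation is always compared against the checkpoint actually on disk.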

