Chinese Version
This release adds 38 new models, 14 of which support finetuning.
Model features
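* **High-performance detection for popular applications.** Built on [DAMOYOLO](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo/summary), a high-performance detection framework aimed at industrial deployment that surpasses the classic YOLO series in both accuracy and speed, this release **adds** a [real-time face mask detection model](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_facemask/summary), a [real-time safety helmet detection model](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_safety-helmet/summary), a [real-time human detection model](https://modelscope.cn/models/damo/cv_tinynas_human-detection_damoyolo/summary), and a [real-time cigarette detection model](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_cigarette/summary), all usable out of the box.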
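* Speech recognition, speech synthesis, and keyword spotting models can be finetuned with the ModelScope Python SDK
* Speech synthesis: new dialect models for Sichuanese, Guangdong Cantonese, and Shanghainese, and new foreign-language models for Russian and Korean
  * SambertHifigan Speech Synthesis-Sichuanese-General Domain-16k-Speaker chuangirl: female voice model for the Sichuan dialect
  * SambertHifigan Speech Synthesis-Guangdong Cantonese-General Domain-16k-Speaker jiajia: female voice model for Cantonese
  * SambertHifigan Speech Synthesis-Shanghainese-General Domain-16k-Speaker xiaoda: female voice model for the Shanghai dialect
  * SambertHifigan Speech Synthesis-Russian-General Domain-16k-Speaker masha: Russian female voice model
  * SambertHifigan Speech Synthesis-Korean-General Domain-16k-Speaker kyong: Korean female voice model
* Speech file post-processing
  * New text normalization models for 11 languages: English, German, Filipino, Korean, Vietnamese, Japanese, Russian, Indonesian, Portuguese, French, and Spanish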
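* Image face fusion
  * Extracts and aligns the face region automatically and extracts facial features, so no extra preprocessing is needed.
  * Introduces a 3D reconstruction network to fit and transfer the face shape, which makes the fused face shape more similar to the target.
* Face and human body
  * GPEN portrait enhancement and restoration for high-resolution faces: 1024 and 2048 models built on the GPEN framework and trained on collected ultra-high-resolution face data.
* Visual editing
  * DDColor image colorization: a large improvement in color richness and semantic consistency over earlier methods such as Deoldify.
  * VFI-RAFT video frame interpolation: compared with other SOTA models, interpolates well in scenes with large motion and repeated textures.
  * DUT-RAFT video stabilization: removes many kinds of video shake reliably and preserves video sharpness better than the original DUT.
* Low-level vision
  * RealBasicVSR video super-resolution: works well on most real-world videos, but may perform poorly on the small share of inputs with very severe degradation.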
Breaking changes
* The text-to-image task now outputs multiple images
* The speech synthesis task output changed from output_pcm to output_wav
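Callers that have to run against SDK versions on both sides of the speech synthesis output change can hide the rename behind a small helper. The following is a minimal sketch, assuming the speech synthesis pipeline returns a dict-like result keyed by `output_wav` (new) or `output_pcm` (old); `get_tts_audio` is a hypothetical helper and not part of the SDK.

```python
from typing import Any, Mapping

# Key names taken from the breaking-change note; the dict-shaped
# result is an assumption made for this sketch.
NEW_KEY = "output_wav"
OLD_KEY = "output_pcm"


def get_tts_audio(result: Mapping[str, Any]) -> Any:
    """Return the synthesized audio from a TTS result (hypothetical helper).

    Prefers the new key and falls back to the old one, so the same
    calling code works before and after the rename.
    """
    if NEW_KEY in result:
        return result[NEW_KEY]
    if OLD_KEY in result:
        return result[OLD_KEY]
    raise KeyError(f"neither {NEW_KEY!r} nor {OLD_KEY!r} found in TTS result")
```

Note that the two keys may carry different payloads (for example raw PCM samples versus WAV-encoded data), so code that writes the audio to disk should check the format rather than rely on the key alone.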
New models and quick links
| **Contributor** | **Model** |
| --- | --- |
| **Bilibili** | [RealCUGAN Image Super-Resolution](https://modelscope.cn/models/bilibili/cv_bilibili_image-super-resolution/summary) |
| **ClueAI** | [Yuanyu Functional Dialogue Large Model](https://modelscope.cn/models/ClueAI/ChatYuan-large/summary) |
| **Fengshenbang** | [Wenzhong-GPT2-110M-Chinese-v2](https://modelscope.cn/models/Fengshenbang/Wenzhong-GPT2-110M-chinese-v2/summary) |
| **Fengshenbang** | [Erlangshen-RoBERTa-330M-Text Similarity](https://modelscope.cn/models/Fengshenbang/Erlangshen-RoBERTa-330M-Similarity/summary) |
| **Fengshenbang** | [Erlangshen-RoBERTa-110M-Natural Language Inference](https://modelscope.cn/models/Fengshenbang/Erlangshen-RoBERTa-110M-NLI/summary) |
| **Alibaba AAIG** | [Discrete Adversarial Training ViT-H/14-Robust Image Classification-imagenet1k](https://modelscope.cn/models/AAIG/easyrobust-models/summary) |
| **Alibaba Cloud Machine Learning Platform PAI** | [GPT-MoE Chinese 6.7B Poetry Generation Model](https://modelscope.cn/models/PAI/nlp_gpt3_text-generation_0.35B_MoE-64/summary) |
| **Alibaba Cloud Machine Learning Platform PAI** | [GPT-MoE Chinese 27B Essay Generation Model](https://modelscope.cn/models/PAI/nlp_gpt3_text-generation_1.3B_MoE-64/summary) |
| **DAMO Academy** | [Duguang-Text Detection-Word Detection Model-English-VLPT Pre-trained](https://modelscope.cn/models/damo/cv_resnet50_ocr-detection-vlpt/summary) |
| **DAMO Academy** | [Duguang-Document Understanding-Multimodal Pre-trained Model](https://modelscope.cn/models/damo/multi-modal_convnext-roberta-base_vldoc-embedding/summary) |
| **DAMO Academy** | [Chinese StableDiffusion-General Domain](https://modelscope.cn/models/damo/multi-modal_chinese_stable_diffusion_v1.0/summary) |
| **DAMO Academy** | [DDColor Image Colorization](https://modelscope.cn/models/damo/cv_ddcolor_image-colorization/summary) |
| **DAMO Academy** | [Video Multi-Object Tracking-Pedestrian](https://modelscope.cn/models/damo/cv_yolov5_video-multi-object-tracking_fairmot/summary) |
| **DAMO Academy** | [MaskDINO-SwinL Image Instance Segmentation](https://modelscope.cn/models/damo/cv_maskdino-swin-l_image-instance-segmentation_coco/summary) |
| **DAMO Academy** | [VFI-RAFT Video Frame Interpolation](https://modelscope.cn/models/damo/cv_raft_video-frame-interpolation/summary) |
| **DAMO Academy** | [DUT-RAFT Video Stabilization](https://modelscope.cn/models/damo/cv_dut-raft_video-stabilization_base/summary) |
| **DAMO Academy** | [RealBasicVSR Video Super-Resolution](https://modelscope.cn/models/damo/cv_realbasicvsr_video-super-resolution_videolq/summary) |
| **DAMO Academy** | [GPEN Portrait Enhancement and Restoration-High-Resolution Faces](https://modelscope.cn/models/damo/cv_gpen_image-portrait-enhancement-hires/summary) |
| **DAMO Academy** | [YOLOX-PAI Hand Detection Model](https://modelscope.cn/models/damo/cv_yolox-pai_hand-detection/summary) |
| **DAMO Academy** | [ConvNeXt Image Classification-Chinese-Garbage Classification](https://modelscope.cn/models/damo/cv_convnext-base_image-classification_garbage/summary) |
| **DAMO Academy** | [BNext Binarized Image Classification-English-General-small](https://modelscope.cn/models/damo/cv_bnext-small_image-classification_ImageNet-labels/summary) |
| **DAMO Academy** | [Real-time Face Mask Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_facemask/summary) |
| **DAMO Academy** | [Real-time Safety Helmet Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_safety-helmet/summary) |
| **DAMO Academy** | [Real-time Cigarette Detection-General](https://modelscope.cn/models/damo/cv_tinynas_object-detection_damoyolo_cigarette/summary) |
| **DAMO Academy** | [Face Liveness Detection Model](https://modelscope.cn/models/damo/cv_manual_face-liveness_flrgb/summary) |
| **DAMO Academy** | [Face Liveness Detection Model-IR](https://modelscope.cn/models/damo/cv_manual_face-liveness_flir/summary) |
| **DAMO Academy** | [MGeo Multi-task Multimodal Address Pre-trained Backbone-Chinese-base](https://modelscope.cn/models/damo/mgeo_backbone_chinese_base/summary) |
| **DAMO Academy** | [MaSTS Pre-trained Model-Chinese-Search-CLUE Semantic Matching-large](https://modelscope.cn/models/damo/nlp_masts_backbone_clue_chinese-large/summary) |
| **DAMO Academy** | [MaSTS Text Similarity-Chinese-Search-CLUE Semantic Matching-large](https://modelscope.cn/models/damo/nlp_masts_sentence-similarity_clue_chinese-large/summary) |
| **DAMO Academy** | [NestedNER Named Entity Recognition-Chinese-Medical Domain-base](https://modelscope.cn/models/damo/nlp_nested-ner_named-entity-recognition_chinese-base-med/summary) |
| **DAMO Academy** | [CoROM Text Embedding-Chinese-E-commerce Domain-base](https://modelscope.cn/models/damo/nlp_corom_sentence-embedding_chinese-base-ecom/summary) |
| **DAMO Academy** | [CoROM Semantic Relevance-Chinese-E-commerce Domain-base](https://modelscope.cn/models/damo/nlp_corom_passage-ranking_chinese-base-ecom/summary) |
| **DAMO Academy** | [All-task Zero-shot Learning-mT5 Classification-Augmented-Chinese-base](https://www.modelscope.cn/models/damo/nlp_mt5_zero-shot-augment_chinese-base/summary) |
| **DAMO Academy** | [StructBERT Emotion Classification-Chinese-Seven Classes-base](https://www.modelscope.cn/models/damo/nlp_structbert_emotion-classification_chinese-base/summary) |
| **DAMO Academy** | [HiTransUSE User Satisfaction Estimation-Chinese-E-commerce-base](https://www.modelscope.cn/models/damo/nlp_user-satisfaction-estimation_chinese/summary) |
| **DAMO Academy** | [UniASR Speech Recognition-Chinese-General-8k-Real-time-pytorch](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) |
| **DAMO Academy** | [Paraformer Speech Recognition-Chinese-General-16k-Offline-large-pytorch](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) |
| **DAMO Academy** | [Paraformer Speech Recognition-Chinese-General-16k-Offline-large-Long Audio Version](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) |
| **DAMO Academy** | [RaNER-chunking-English-large](https://modelscope.cn/models/damo/nlp_raner_chunking_english-large/summary) |
| **DAMO Academy** | [mPLUG-HiTeA-Video Question Answering Model-English-Base](https://www.modelscope.cn/models/damo/multi-modal_hitea_video-question-answering_base_en/summary) |
| **DAMO Academy** | [mPLUG-HiTeA-Video Captioning-English-Base](https://www.modelscope.cn/models/damo/multi-modal_hitea_video-captioning_base_en/summary) |
| **DAMO Academy** | [Mask2Former-R50 Panoptic Segmentation](https://modelscope.cn/models/damo/cv_r50_panoptic-segmentation_cocopan/summary) |
| **DAMO Academy** | [Image Face Fusion](https://modelscope.cn/models/damo/cv_unet-image-face-fusion_damo/summary) |
| **DAMO Academy** | [Spring Couplet Generation Model-Chinese-base](https://modelscope.cn/models/damo/spring_couplet_generation/summary) |
| **DAMO Academy** | [GPT-3 Kuakua (Praise) Chatbot-Chinese-large](https://modelscope.cn/models/damo/nlp_gpt3_kuakua-robot_chinese-large/summary) |
| **DAMO Academy** | [BART Text Error Correction-Chinese-Legal Domain-large](https://modelscope.cn/models/damo/nlp_bart_text-error-correction_chinese-law/summary) |
English Version
Highlight
- Add finetune support for DAMO-YOLO
- Add new real-time detection models: face mask, safety helmet, human body, and cigarette
- Add finetune support for ASR, TTS and KWS models
- Add batch inference support for NLP and OFA-based multi-modal tasks
- Add high-resolution GPEN model for face restoration
- Add DDColor model for image colorization
- Add VFI-RAFT model for video frame interpolation
- Add DUT-RAFT model for video stabilization
- Add RealBasicVSR model for video-super-resolution
Breaking changes
- Change the output of the text-to-image task to a list of images
- Change the output of the TTS task from output_pcm to output_wav
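Code written against the earlier single-image output may need a small adapter after the text-to-image change above. The following is a minimal sketch, assuming downstream code receives either one image object (old behavior) or a list of images (new behavior); `as_image_list` is a hypothetical helper and makes no assumption about the image type or the SDK's output key names.

```python
from typing import Any, List


def as_image_list(images: Any) -> List[Any]:
    """Normalize a text-to-image output to a list (hypothetical helper).

    A list or tuple is treated as the new multi-image output and is
    copied into a list; any other value is treated as a single image
    and wrapped.
    """
    if isinstance(images, (list, tuple)):
        return list(images)
    return [images]
```

With this adapter, consuming code can always iterate, for example `for img in as_image_list(output): ...`, regardless of which output shape it was given.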
Feature
- Add easyrobust-models for image classification
- Video depth estimation supports CPU mode
- ASR pipeline adds an output_dir parameter
- Add RTS face recognition ood model
- Add image-defrcn-fewshot-detection
- Add hires gpen model
- Add mgeo finetune and pipeline
- Add asr finetune & change inference
- Add quadtree image matching pipeline
- Add finetune for DAMO-YOLO
- Add FLRGB Face Liveness RGB Model
- Add speech separation finetune
- Asr inference: support new models, punctuation, vad, sv
- Add vop retrieval
- Add NAFNet Image Deblurring pipeline and finetune support
- Add megatron bert
- Add panovit-layout-estimation-pipeline
- Add vision middleware
- Add panorama_depth_estimation
- Unify token classification model output
- FAQ supports finetune and multilingual
- Support stable diffusion and add DAMO chinese stable diffusion model
- Add cv-bnext-image-classification-pipeline
- Add VFI-RAFT model for video frame interpolation
- Add face changing pipeline
- Add DUT-RAFT model for video stabilization
- Update token_cls default sequence_length: 128 -> 512
- Add structure tasks for ofa: sudoku & text2sql
- Add new ASR model speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline and speech_UniASR_asr_2pass-pt-16k-com
- Add model for multiple object tracking in video
- Add ConvNeXt model
- Add ppl metric
- Add image colorization
- Add User satisfaction estimation pipeline
- OFA finetune support configuration file
- Add vldoc model
- Add space-t trainer for finetune
- Add speech separation pipeline
- Add cv_casmvs_multi-view-depth-esimation_general
- Add finetune support for mask2former
- Add domain specific object detection models
- Add maskdino model
- Add FLIR Face Liveness Model
- Add finetune support for kws nearfield
- Add cv_pointnet2_sceneflow-estimation_general
- Add HiTeA model for VideoQA and Caption
- Add GPT-2 model
- Add real-time human detection model
- Add video depth estimation pipeline
- Add ocr-detection-vlpt-pipeline
- Add RealBasicVSR model for video-super-resolution
- Add image skychange model
- Add support for cv_rdevos_video-object-segmentation
- Support kantts infer and finetune
- Add gpt-moe model
- Add hand detection
Improvements
- Text-error-correction support batch inference
- Add beam search and pair finetune for GPT-3
- Optimize ast_index logic
- Refactor msdataset modules
- Save a video with h264 vcodec for video_super_resolution
- Enhance interface standard and refactor card_detection, face detection, tinynas object detection and image classification pipeline
- Audio pipeline supports byte input and refines fp implementations
- Remove opencv-python from framework requirements and remove easynlp from nlp default requirements
- GPT-3 model supports batch input
- Batch inference for all ofa models
- AST scanner prebuilt in whl to speed up import process
BugFix
- Fix best ckpt saver not actually saving the best ckpt
- Fix logger file handler problem
- Fix missing self. (61)
- Fix: hub test suites cannot run in parallel
- Fix loading custom cv data error (59)
- Fix saved checkpoint not running with pipeline for GPT-3
- Fix video type check for cv2.VideoCapture and add unittest
- Fix a bug for plug inference
- Fix memory leak bug in eval for movie scene segmentation
- Fix useragent string and a trainer invokedby
- Fix: statistics header not set correctly
- Fix timeout issue for uni-fold list_oss_objects api
- Fix card-detection model unregistered error and fix log warning
- Fix multimer input for science/protein_structure
- Fix demo service && copy license for cv/language_guided_video_summarization
- Fix file directory create error in ddp training