Funasr

Latest version: v1.1.17

2023.3.17

- New Features:
- Added support for a GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which allows easy export of Paraformer models from ModelScope and deployment as services. In benchmark tests on a single V100 GPU, it achieved an RTF of 0.0032 and a speedup of 300.
- Added support for a CPU runtime [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), which supports export of quantized ONNX and Libtorch models from ModelScope. In [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) tests on a CPU-8369B, RTF improved by about 50% (0.00438 -> 0.00226) and the speedup roughly doubled (228 -> 442).
- Added support for a C++ version of the gRPC service deployment solution. Combined with the C++ version of ONNXRuntime and the quantization solution, it is about twice as efficient as the Python runtime, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc).
- Added a streaming inference pipeline for the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) and the [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), with support for audio input streams (>= 10 ms), [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
- Improved the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), resulting in increased accuracy (F-score increased from 55.6 to 56.5).
- Added a real-time subtitle example based on the gRPC service, using 2-pass recognition. The [Paraformer streaming](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) model outputs text in real time, while the [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) corrects the recognition results, [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc).
- New Models:
- Added the [16k Paraformer streaming model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary), which supports real-time speech recognition on streaming audio input, [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/241). It can be deployed with the gRPC service to provide real-time subtitles.
- Added the [streaming punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary), which supports real-time punctuation in streaming speech recognition scenarios and is called at VAD points. It can be used together with real-time ASR models to produce readable real-time subtitles, [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/238).
- Added the [TP-Aligner timestamp model](https://www.modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary), which takes audio and the corresponding text as input and outputs word-level timestamps. Its performance is comparable to that of the Kaldi FA model (60.3 ms vs. 69.3 ms). It can be combined freely with ASR models, [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/246).
- Added a financial-domain model ([8k Paraformer-large-3445vocab](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-8k-finance-vocab3445/summary)), fine-tuned on 1000 hours of data. Recognition accuracy on the financial-domain test set improved by a relative 5%, and recall of domain keywords improved by a relative 7%.
- Added an audio-visual-domain model ([16k Paraformer-large-3445vocab](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-audio_and_video-vocab3445/summary)), fine-tuned on 10,000 hours of data. Recognition accuracy on the audio-visual-domain test set improved by a relative 8%.
- Added [8k speaker verification model](https://www.modelscope.cn/models/damo/speech_xvector_sv-en-us-callhome-8k-spk6135-pytorch/summary), which can be used for speaker embedding extraction.
- Added speaker diarization models, the [16k SOND Chinese model](https://www.modelscope.cn/models/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch/summary) and the [8k SOND English model](https://www.modelscope.cn/models/damo/speech_diarization_sond-en-us-callhome-8k-n16k4-pytorch/summary), which achieved the best performance on AliMeeting and Callhome with DERs of 4.46% and 11.13%, respectively.
- Added UniASR unified streaming/offline models, including [16k UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary), [16k UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary), [16k UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary), [8k UniASR Mandarin financial domain](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-finance-vocab3445-online/summary), and [16k UniASR Mandarin audio-visual domain](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-audio_and_video-vocab3445-online/summary).
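
The RTF and speedup figures quoted above can be related with a little arithmetic. The notes do not define either term, so the sketch below assumes the common definition of real-time factor (processing time divided by audio duration) and assumes that "speedup" is its reciprocal, which reproduces the 228 and 442 figures to the nearest integer. The function names are illustrative only.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent decoding / duration of the audio decoded (assumed definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def speedup(rtf: float) -> float:
    """Speedup over real time, taken here as the reciprocal of RTF (an assumption)."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return 1.0 / rtf


# RTF values quoted in the release notes.
gpu_rtf = 0.0032
cpu_rtf_before = 0.00438
cpu_rtf_after = 0.00226

gpu_speedup = speedup(gpu_rtf)                 # roughly 300
cpu_speedup_before = speedup(cpu_rtf_before)   # rounds to 228
cpu_speedup_after = speedup(cpu_rtf_after)     # rounds to 442

# Fraction by which quantization lowers the CPU RTF (a little under one half).
rtf_reduction = 1.0 - cpu_rtf_after / cpu_rtf_before
```

Under these assumptions, an RTF of 0.0032 means one second of compute handles roughly 300 seconds of audio, and the "50%" quantization gain corresponds to the RTF being cut almost in half.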

Latest updates:

- March 17, 2023: [funasr-0.3.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.4.1
- Feature improvements:
- Added a GPU runtime solution, [nv-triton](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu), which makes it easy to export Paraformer models from ModelScope and deploy them as a Triton service. Measured on a single V100 GPU, the RTF is 0.0032 and the throughput is 300, [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/triton_gpu#performance-benchmark).
- Added a CPU [runtime quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), which supports exporting quantized ONNX and Libtorch models from ModelScope. Measured on a CPU-8369B, quantization improves RTF by 50% (0.00438 -> 0.00226) and doubles throughput (228 -> 442), [benchmark](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python).
- [Added a C++ gRPC service deployment solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/grpc). Combined with the C++ [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/onnxruntime) and the [quantization solution](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export), it doubles performance compared with the Python runtime.
- Added streaming inference to the ModelScope pipeline for the [16k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) and the [8k VAD model](https://www.modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), with support for audio input streams as short as 10 ms, [usage](https://github.com/alibaba-damo-academy/FunASR/discussions/236).
- Optimized the [punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), improving perceived punctuation accuracy (absolute F-score gain, 55.6 -> 56.5).
- Added a real-time subtitle [demo](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc) based on the gRPC service, using 2-pass recognition: the [Paraformer streaming model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary) produces the on-screen text, and the [Paraformer-large offline model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) corrects the recognition results.
- New models:
- [16k Paraformer streaming model](https://www.modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8404-online/summary), which accepts streaming audio input for real-time speech recognition, [usage](https://github.com/alibaba-damo-academy/FunASR/discussions/241). It can be deployed with the gRPC service to provide real-time subtitles.
- [Streaming punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727/summary), which adds punctuation in streaming speech recognition scenarios and is called in streaming fashion at VAD points. It can be used together with real-time ASR models to produce readable real-time subtitles, [usage](https://github.com/alibaba-damo-academy/FunASR/discussions/238).
- [TP-Aligner timestamp model](https://www.modelscope.cn/models/damo/speech_timestamp_prediction-v1-16k-offline/summary), which takes audio and the corresponding text as input and outputs character-level timestamps. Its performance is comparable to that of the Kaldi FA model (60.3 ms vs. 69.3 ms), and it can be combined freely with ASR models, [usage](https://github.com/alibaba-damo-academy/FunASR/discussions/246).
- Financial-domain model, [8k Paraformer-large-3445vocab](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-8k-finance-vocab3445/summary), fine-tuned on 1000 hours of data. Recognition on the financial-domain test set improves by a relative 5%, and domain keyword recall improves by a relative 7%.
- Audio-visual-domain model, [16k Paraformer-large-3445vocab](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-audio_and_video-vocab3445/summary), fine-tuned on 10,000 hours of data. Recognition on the audio-visual-domain test set improves by a relative 8%.
- [8k speaker verification model](https://www.modelscope.cn/models/damo/speech_xvector_sv-en-us-callhome-8k-spk6135-pytorch/summary), an English speaker verification model trained on the CallHome dataset, which can also be used for voiceprint feature extraction.
- Speaker diarization models, the [16k SOND Chinese model](https://www.modelscope.cn/models/damo/speech_diarization_sond-zh-cn-alimeeting-16k-n16k4-pytorch/summary) and the [8k SOND English model](https://www.modelscope.cn/models/damo/speech_diarization_sond-en-us-callhome-8k-n16k4-pytorch/summary), which achieve the best performance on AliMeeting and Callhome, with DERs of 4.46% and 11.13%, respectively.
- UniASR unified streaming/offline models: [16k UniASR Burmese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-my-16k-common-vocab696-pytorch/summary), [16k UniASR Hebrew](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-he-16k-common-vocab1085-pytorch/summary), [16k UniASR Urdu](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ur-16k-common-vocab877-pytorch/summary), [8k UniASR Mandarin financial domain](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-finance-vocab3445-online/summary), and [16k UniASR Mandarin audio-visual domain](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-audio_and_video-vocab3445-online/summary).
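
The 10 ms minimum for streaming VAD input corresponds to a fixed number of samples per chunk: 160 samples at 16 kHz and 80 samples at 8 kHz. The following is a minimal sketch of cutting raw audio into such chunks before feeding them to a streaming model. It assumes 16-bit mono PCM, and the helper names are hypothetical rather than part of FunASR or ModelScope.

```python
from typing import Iterator


def samples_per_chunk(sample_rate_hz: int, chunk_ms: int = 10) -> int:
    """Number of samples in a chunk of chunk_ms milliseconds."""
    if chunk_ms < 10:
        raise ValueError("the release notes give 10 ms as the minimum chunk length")
    return sample_rate_hz * chunk_ms // 1000


def iter_chunks(pcm: bytes, sample_rate_hz: int, chunk_ms: int = 10,
                bytes_per_sample: int = 2) -> Iterator[bytes]:
    """Yield consecutive chunks of raw mono PCM; the final chunk may be shorter."""
    step = samples_per_chunk(sample_rate_hz, chunk_ms) * bytes_per_sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


# One second of 16-bit silence at 16 kHz, split into 10 ms chunks.
one_second = bytes(16000 * 2)
chunks = list(iter_chunks(one_second, sample_rate_hz=16000))
```

Each chunk would then be passed to the streaming pipeline in order. How the pipeline carries state between calls is described in the linked usage discussions and is not modeled here.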


New Contributors
* dingbig made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/147
* yuekaizhang made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/161
* zhuzizyf made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/180
* znsoftm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/185
* songtaoshi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/227

**Full Changelog**: https://github.com/alibaba-damo-academy/FunASR/compare/v0.2.0...v0.3.0

2023.2.17

- We support a new feature: exporting Paraformer models from ModelScope to [ONNX and TorchScript](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export). Locally fine-tuned models are also supported.
- We support a new feature, [onnxruntime](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python), which lets you deploy the runtime without ModelScope or FunASR. For the [paraformer-large](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, onnxruntime is about 3x faster on CPU (RTF 0.110 -> 0.038), [details](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/onnxruntime/paraformer/rapid_paraformer#speed).
- We support a new feature, [grpc](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc), which lets you build an ASR service with gRPC by deploying either the ModelScope pipeline or onnxruntime.
- We release a new model, [paraformer-large-contextual](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary), which supports hotword customization based on incentive enhancement and improves the recall and precision of hotwords.
- We optimize the timestamp alignment of [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary). Timestamp prediction accuracy is much improved, reaching an accumulated average shift (AAS) of 74.7 ms, [details](https://arxiv.org/abs/2301.12343).
- We release a new model, [8k VAD model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silent speech. It can be freely combined with any ASR model in [modelscope](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
- We release a new model, [MFCCA](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary), a multi-channel multi-speaker model that is independent of the number and geometry of microphones and supports Mandarin meeting transcription.
- We release several new UniASR models: [Southern Fujian Dialect model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825/summary), [French model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary), [German model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary), [Vietnamese model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary), [Persian model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary).
- We release a new model, [paraformer-data2vec model](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k/summary), an unsupervised pretraining model trained on AISHELL-2, which is used to initialize the Paraformer model and is then fine-tuned on AISHELL-1.
- We release a new feature: the `VAD`, `ASR` and `PUNC` models can be combined freely, whether they are models from [modelscope](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) or locally fine-tuned models, [demo](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
- We optimize the [punctuation common model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), improving recall and precision and fixing bad cases of missing punctuation marks.
- The modelscope inference pipeline now supports more audio input formats, including mp3, flac, ogg, and opus.
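
The accumulated average shift (AAS) quoted for the timestamp optimization can be read as the mean absolute offset between predicted and reference token boundaries. The sketch below assumes that reading: the linked paper holds the authoritative definition, and the function name, the `Span` alias, and the sample values are hypothetical.

```python
from typing import Sequence, Tuple

Span = Tuple[float, float]  # (start_ms, end_ms) of one token


def accumulated_average_shift(predicted: Sequence[Span],
                              reference: Sequence[Span]) -> float:
    """Mean absolute offset, in ms, over every start and end boundary.

    Assumes the two sequences are already aligned token by token.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not predicted:
        raise ValueError("at least one token is required")
    total = 0.0
    for (p_start, p_end), (r_start, r_end) in zip(predicted, reference):
        total += abs(p_start - r_start) + abs(p_end - r_end)
    # Two boundaries (start and end) per token.
    return total / (2 * len(predicted))


# Made-up boundaries for two tokens, in milliseconds.
predicted = [(0.0, 100.0), (120.0, 300.0)]
reference = [(10.0, 90.0), (100.0, 320.0)]
aas_ms = accumulated_average_shift(predicted, reference)
```

A lower value means the predicted boundaries sit closer to the reference alignment, which is the sense in which 74.7 ms is reported as an improvement.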

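Latest updates:

- February 2023 (released February 17): [funasr-0.2.0](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.3.0
- Feature improvements:
- Added model export: all Paraformer models on ModelScope, as well as locally fine-tuned models, can be exported in one step to [ONNX and TorchScript formats](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/export) for deployment.
- Added [onnxruntime deployment](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python) for Paraformer models, which requires neither ModelScope nor FunASR to be installed. Measured on CPU, onnxruntime inference is nearly 3x faster (RTF: 0.110 -> 0.038).
- Added a [gRPC service](https://github.com/alibaba-damo-academy/FunASR/tree/main/funasr/runtime/python/grpc), which can serve either the ModelScope inference pipeline or onnxruntime.
- Optimized the timestamps of the [Paraformer-large long-audio model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), substantially improving timestamp prediction accuracy on bad cases; the average start/end timestamp shift is 74.7 ms, [see the paper](https://arxiv.org/abs/2301.12343).
- Added free combination of VAD, ASR, and punctuation models: any models on ModelScope, as well as locally fine-tuned models, can be combined for inference, [usage example](https://github.com/alibaba-damo-academy/FunASR/discussions/134).
- Optimized the [general punctuation model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), improving punctuation recall and precision and fixing issues such as missing punctuation.
- Added automatic sample-rate adaptation: audio at any input sample rate is automatically matched to the model's sample rate. Added support for more audio formats, such as mp3, flac, ogg, and opus.

- New models:
- [Paraformer-large hotword model](https://www.modelscope.cn/models/damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404/summary), which supports hotword customization: given a list of hotwords, it boosts them to improve the model's hotword recall.
- [MFCCA multi-channel multi-speaker recognition model](https://www.modelscope.cn/models/NPU-ASLP/speech_mfcca_asr-zh-cn-16k-alimeeting-vocab4950/summary), from a paper written in collaboration with the Audio, Speech and Language Processing Group at Northwestern Polytechnical University; a multi-channel speech recognition model based on multi-frame cross-channel attention.
- [8k voice activity detection (VAD) model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-8k-common/summary), which detects the start and end times of valid speech in long audio segments and supports streaming input, with audio input streams as short as 10 ms.
- UniASR unified streaming/offline models: [16k UniASR Southern Fujian dialect](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-minnan-16k-common-vocab3825/summary), [16k UniASR French](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fr-16k-common-vocab3472-tensorflow1-online/summary), [16k UniASR German](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-de-16k-common-vocab3690-tensorflow1-online/summary), [16k UniASR Vietnamese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-vi-16k-common-vocab1001-pytorch-online/summary), and [16k UniASR Persian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-fa-16k-common-vocab1257-pytorch-online/summary).
- [Paraformer model with unsupervised Data2vec pretraining](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-paraformer-zh-cn-aishell2-16k/summary), which uses an unsupervised Data2vec pretrained model for initialization and then fine-tunes the Paraformer model on AISHELL-1 data.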

New Contributors
* zjc6666 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/35
* lyblsgo made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/37
* lingyunfly made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/42
* fangd123 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/44
* dyyzhmm made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/48
* R1ckShi made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/50
* chenmengzheAAA made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/57
* ZhihaoDU made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/95
* SWHL made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/97
* yufan-aslp made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/105
* magicharry made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/119

**Full Changelog**: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.6...v0.2.0

2023.1.16

- We release a new model version, [Paraformer-large-long](https://modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrates the [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary) model, the [ASR](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary) model, the [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary) model, and timestamps. The model can take inputs several hours long.
- We release a new type of model, [VAD](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which predicts the duration of non-silent speech. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md).
- We release a new type of model, [Punctuation](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which predicts punctuation for the output of ASR models. It can be freely combined with any ASR model in the [Model Zoo](docs/modelscope_models.md).
- We release a new model, [Data2vec](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), an unsupervised pretraining model that can be fine-tuned on ASR and other downstream tasks.
- We release a new model, [Paraformer-Tiny](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), a lightweight Paraformer model that supports Mandarin command-word recognition.
- We release a new type of model, [SV](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which extracts speaker embeddings and performs speaker verification on paired utterances. Speaker diarization support is planned for a future version.
- We speed up inference in the modelscope pipeline by merging the model-building step into the pipeline-building step.
- The modelscope inference pipeline now supports more audio input types, including wav.scp, wav files, audio bytes, and wave samples.
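
Of the input types listed, wav.scp is the least self-explanatory: in the Kaldi convention it is a plain-text index with one utterance per line, an utterance ID followed by whitespace and the audio location. Below is a minimal sketch of reading such a file, assuming each entry is a plain file path (Kaldi also allows piped commands, which are not handled here). The function name and sample paths are hypothetical.

```python
from typing import Dict


def parse_wav_scp(text: str) -> Dict[str, str]:
    """Map utterance IDs to audio paths from Kaldi-style wav.scp content."""
    entries: Dict[str, str] = {}
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split only once so that paths containing spaces stay intact.
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"line {line_no}: expected '<utt_id> <path>'")
        utt_id, path = parts
        if utt_id in entries:
            raise ValueError(f"line {line_no}: duplicate utterance ID {utt_id!r}")
        entries[utt_id] = path
    return entries


sample = "utt1 /data/audio/a.wav\nutt2 /data/my audio/b.wav\n"
index = parse_wav_scp(sample)
```

The resulting mapping can be iterated to feed one utterance at a time to an inference pipeline, keeping the utterance ID alongside each recognition result.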

Latest updates
- January 2023 (released January 16): [funasr-0.1.6](https://github.com/alibaba-damo-academy/FunASR/tree/main), modelscope-1.2.0
- New models:
- [Paraformer-large long-audio model](https://www.modelscope.cn/models/damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-pytorch/summary), which integrates VAD, ASR, punctuation, and timestamp functions; it can directly recognize audio several hours long and output punctuated text with timestamps.
- [Chinese unsupervised pretrained Data2vec model](https://www.modelscope.cn/models/damo/speech_data2vec_pretrain-zh-cn-aishell2-16k-pytorch/summary), a Chinese unsupervised pretrained model with the Data2vec architecture, trained on AISHELL-2 data; it supports fine-tuning for ASR or other downstream tasks.
- [16k voice activity detection (VAD) model](https://modelscope.cn/models/damo/speech_fsmn_vad_zh-cn-16k-common-pytorch/summary), which detects the start and end times of valid speech in long audio segments.
- [General Chinese punctuation prediction model](https://www.modelscope.cn/models/damo/punc_ct-transformer_zh-cn-common-vocab272727-pytorch/summary), which predicts punctuation for the text output by speech recognition models.
- [8K UniASR streaming model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-online/summary) and [8K UniASR model](https://www.modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab3445-pytorch-offline/summary), a unified streaming and offline speech recognition model that, while performing streaming recognition, can output offline recognition results with low latency to correct the predicted text.
- Paraformer-large [fine-tuned on AISHELL-1](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell1-vocab8404-pytorch/summary) and [fine-tuned on AISHELL-2](https://www.modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-aishell2-vocab8404-pytorch/summary): the Paraformer-large model fine-tuned on AISHELL-1 and AISHELL-2 data, respectively.
- [Speaker verification model](https://www.modelscope.cn/models/damo/speech_xvector_sv-zh-cn-cnceleb-16k-spk3465-pytorch/summary), which can be used for speaker verification and for speaker feature extraction.
- [Small on-device Paraformer command-word model](https://www.modelscope.cn/models/damo/speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch/summary), the command-word version of Paraformer-tiny, which supports command-word recognition with a small number of parameters.
- Upgraded the original TensorFlow models to PyTorch models for inference, with support for fine-tuning and customization, including:
- 16K models: [Paraformer Chinese](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary), [Paraformer-large Chinese](https://modelscope.cn/models/damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8358-tensorflow1/summary), [UniASR Chinese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/summary), [UniASR-large Chinese](https://modelscope.cn/models/damo/speech_UniASR-large_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-offline/summary), [UniASR Chinese streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online/summary), [UniASR dialect](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-offline/summary), [UniASR dialect streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cn-dialect-16k-vocab8358-tensorflow1-online/summary), [UniASR Japanese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-offline/summary), [UniASR Japanese streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ja-16k-common-vocab93-tensorflow1-online/summary), [UniASR Indonesian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-offline/summary), [UniASR Indonesian streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-id-16k-common-vocab1067-tensorflow1-online/summary), [UniASR Portuguese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-offline/summary), [UniASR Portuguese streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-pt-16k-common-vocab1617-tensorflow1-online/summary), [UniASR English](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-offline/summary), [UniASR English streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-en-16k-common-vocab1080-tensorflow1-online/summary), [UniASR Russian](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline/summary), [UniASR Russian streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-online/summary), [UniASR Korean](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-offline/summary), [UniASR Korean streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ko-16k-common-vocab6400-tensorflow1-online/summary), [UniASR Spanish](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-offline/summary), [UniASR Spanish streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-es-16k-common-vocab3445-tensorflow1-online/summary), [UniASR Cantonese (Simplified Chinese)](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-offline/files), [UniASR Cantonese (Simplified Chinese) streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-cantonese-CHS-16k-common-vocab1468-tensorflow1-online/files)
- 8K models: [Paraformer Chinese](https://modelscope.cn/models/damo/speech_paraformer_asr_nat-zh-cn-8k-common-vocab8358-tensorflow1/summary), [UniASR Chinese](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/summary), [UniASR Chinese streaming model](https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-zh-cn-8k-common-vocab8358-tensorflow1-offline/summary)


New Contributors
* nichongjia-2007 made their first contribution in https://github.com/alibaba-damo-academy/FunASR/pull/27

**Full Changelog**: https://github.com/alibaba-damo-academy/FunASR/compare/v0.1.4...v0.1.6
