Release Notes
1. Added Dataset Support:
- Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations https://github.com/modelscope/evalscope/pull/146
- Added the `cmb` dataset https://github.com/modelscope/evalscope/pull/117
2. Support for evaluating the quality of long-text generation with `LongBench-write` https://github.com/modelscope/evalscope/pull/136
3. Automatic downloading of `punkt_tab.zip` from `nltk` https://github.com/modelscope/evalscope/pull/140
4. Support for RAG evaluation: https://github.com/modelscope/evalscope/pull/127
- Support for embeddings/reranker evaluation: Integration of `MTEB` (Massive Text Embedding Benchmark) and `CMTEB` (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking
- Support for end-to-end RAG evaluation: Integration of the `ragas` framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
5. Documentation Updates:
- Added "Blog" section https://github.com/modelscope/evalscope/pull/126, https://github.com/modelscope/evalscope/pull/135
- Added a page listing the supported datasets https://github.com/modelscope/evalscope/pull/121
- Updated feature usage instructions https://github.com/modelscope/evalscope/pull/125, https://github.com/modelscope/evalscope/pull/134, https://github.com/modelscope/evalscope/pull/138, https://github.com/modelscope/evalscope/pull/137, https://github.com/modelscope/evalscope/pull/127
6. Updated dependencies: `nltk>=3.9` and `rouge-score>=0.1.0` https://github.com/modelscope/evalscope/pull/145, https://github.com/modelscope/evalscope/pull/143
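
The minimum versions in item 6 can be checked against a local environment before upgrading. The snippet below is a minimal sketch using only the Python standard library. It is not part of evalscope, and the helper names (`release_tuple`, `meets_minimum`, `check_dependencies`) are hypothetical. It compares only the leading numeric part of each version string, so pre-release suffixes such as `rc1` are ignored.

```python
import re
from importlib import metadata

# Minimum versions as listed in the release notes.
MINIMUM_VERSIONS = {"nltk": "3.9", "rouge-score": "0.1.0"}


def release_tuple(version: str) -> tuple:
    """Return the leading numeric components, e.g. '3.9.1' -> (3, 9, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUM_VERSIONS) -> dict:
    """Map each package name to 'ok', 'not installed', or a 'too old' message."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for package, status in check_dependencies().items():
        print(f"{package}: {status}")
```

For strict PEP 440 comparison, including pre-releases, the third-party `packaging` library is a more robust choice than the simplified parsing shown here.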