Evalscope

0.5.5

Release Notes

1. Added Dataset Support:
- Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations https://github.com/modelscope/evalscope/pull/146
- Added the `cmb` dataset https://github.com/modelscope/evalscope/pull/117

2. Support for `LongBench-write` quality evaluation of long text generation https://github.com/modelscope/evalscope/pull/136

3. Automatic download of `punkt_tab.zip` from `nltk` (see the snippet below) https://github.com/modelscope/evalscope/pull/140
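
For reference, the manual equivalent of this automatic step is NLTK's own downloader; the snippet below only illustrates what gets fetched and is not evalscope's internal code:

```python
# Manual equivalent of the automatic step: fetch the punkt_tab tokenizer data
# used by nltk>=3.9 for sentence splitting (illustration only).
import nltk

nltk.download("punkt_tab")
```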

4. Support for RAG evaluation https://github.com/modelscope/evalscope/pull/127:
- Support for embeddings/reranker evaluation: integration of `MTEB` (Massive Text Embedding Benchmark) and `CMTEB` (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking (see the sketch after this list)
- Support for end-to-end RAG evaluation: Integration of the `ragas` framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
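
As a rough idea of what the embeddings/reranker side builds on, here is a minimal standalone `MTEB` run; the model and task names are illustrative choices, not evalscope defaults, and evalscope's own entry point may differ:

```python
# Minimal standalone MTEB sketch: evaluate a sentence-embedding model on a
# retrieval task. Model and task names are illustrative only.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["SciFact"])  # a small retrieval task
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```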

5. Documentation Updates:
- Added "Blog" section https://github.com/modelscope/evalscope/pull/126, https://github.com/modelscope/evalscope/pull/135
- Added a page listing the supported datasets https://github.com/modelscope/evalscope/pull/121
- Updated feature usage instructions https://github.com/modelscope/evalscope/pull/125, https://github.com/modelscope/evalscope/pull/134, https://github.com/modelscope/evalscope/pull/138, https://github.com/modelscope/evalscope/pull/137, https://github.com/modelscope/evalscope/pull/127

6. Updated dependencies: `nltk>=3.9` and `rouge-score>=0.1.0` https://github.com/modelscope/evalscope/pull/145, https://github.com/modelscope/evalscope/pull/143

0.5.2

Highlights
- Support multi-modal model evaluation (VLM eval)
- Convert the synchronous client for OpenAI-API-format services to asynchronous, speeding up evaluation by up to 10x (see the sketch after this list)
- Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
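
The speedup comes from issuing requests concurrently instead of sequentially, so wall-clock time is bounded by the slowest request in each batch rather than the sum of all latencies. The sketch below illustrates that idea with the `openai` Python client; the endpoint, model name, and concurrency limit are placeholders, and this is not evalscope's internal implementation:

```python
# Illustrative sketch of concurrent requests against an OpenAI-API-format
# endpoint (placeholders throughout; not evalscope's internal code).
import asyncio

from openai import AsyncOpenAI  # openai>=1.0

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
semaphore = asyncio.Semaphore(16)  # cap the number of in-flight requests


async def ask(prompt: str) -> str:
    async with semaphore:
        resp = await client.chat.completions.create(
            model="my-model",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content


async def main(prompts: list[str]) -> list[str]:
    # All prompts are evaluated concurrently rather than one by one.
    return await asyncio.gather(*(ask(p) for p in prompts))


if __name__ == "__main__":
    answers = asyncio.run(main(["Q1", "Q2", "Q3"]))
```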


Breaking Changes
None


What's Changed
1. Support multi-modal model evaluation (VLM eval)
2. Convert the synchronous client for OpenAI-API-format services to asynchronous, speeding up evaluation by up to 10x
3. Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
4. Update README
5. Add unit tests for VLM eval
6. Update examples for the `OpenCompass` and `VLMEval` evaluation backends
7. Update version restrictions for the `ms-opencompass` and `ms-vlmeval` dependencies

0.4.3

1. Support asynchronous client inference for OpenAI-API-format evaluation
2. Support multi-modal evaluation with VLMEvalKit as an evaluation backend
3. Refactor setup; support `pip install llmuses[opencompass]`, `pip install llmuses[vlmeval]`, and `pip install llmuses[all]`
4. Fix some bugs

0.2.8

1. Fix evaluation from a local directory
2. Add fuzzy matching for templates
3. Add support for local models with templates

0.2.6

1. Support loading `cmmlu` from local disk

0.2.5
