Evalscope

0.5.5

Release Notes

1. Added Dataset Support:
- Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations https://github.com/modelscope/evalscope/pull/146
- Added the `cmb` dataset https://github.com/modelscope/evalscope/pull/117

2. Support for `LongBench-write` quality evaluation of long text generation https://github.com/modelscope/evalscope/pull/136

3. Automatic download of `punkt_tab.zip` from `nltk` (see the snippet below) https://github.com/modelscope/evalscope/pull/140
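
For reference, the manual equivalent of this automatic step is NLTK's own downloader; the snippet below only illustrates what gets fetched and is not evalscope's internal code:

```python
# Manual equivalent of the automatic step: fetch the punkt_tab tokenizer data
# used by nltk>=3.9 for sentence splitting (illustration only).
import nltk

nltk.download("punkt_tab")
```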

4. Support for RAG evaluation https://github.com/modelscope/evalscope/pull/127:
- Support for embeddings/reranker evaluation: integration of `MTEB` (Massive Text Embedding Benchmark) and `CMTEB` (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking (see the sketch after this list)
- Support for end-to-end RAG evaluation: Integration of the `ragas` framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
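
As a rough idea of what the embeddings/reranker side builds on, here is a minimal standalone `MTEB` run; the model and task names are illustrative choices, not evalscope defaults, and evalscope's own entry point may differ:

```python
# Minimal standalone MTEB sketch: evaluate a sentence-embedding model on a
# retrieval task. Model and task names are illustrative only.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["SciFact"])  # a small retrieval task
evaluation.run(model, output_folder="results/all-MiniLM-L6-v2")
```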

5. Documentation Updates:
- Added "Blog" section https://github.com/modelscope/evalscope/pull/126, https://github.com/modelscope/evalscope/pull/135
- Added a page listing the supported datasets https://github.com/modelscope/evalscope/pull/121
- Updated feature usage instructions https://github.com/modelscope/evalscope/pull/125, https://github.com/modelscope/evalscope/pull/134, https://github.com/modelscope/evalscope/pull/138, https://github.com/modelscope/evalscope/pull/137, https://github.com/modelscope/evalscope/pull/127

6. Updated dependencies: `nltk>=3.9` and `rouge-score>=0.1.0` https://github.com/modelscope/evalscope/pull/145, https://github.com/modelscope/evalscope/pull/143

0.5.2

Highlights
- Support multi-modal model evaluation (VLM eval)
- Convert the synchronous client for OpenAI-API-format services to asynchronous, speeding up evaluation by up to 10x (see the sketch after this list)
- Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
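
The speedup comes from issuing requests concurrently instead of sequentially, so wall-clock time is bounded by the slowest request in each batch rather than the sum of all latencies. The sketch below illustrates that idea with the `openai` Python client; the endpoint, model name, and concurrency limit are placeholders, and this is not evalscope's internal implementation:

```python
# Illustrative sketch of concurrent requests against an OpenAI-API-format
# endpoint (placeholders throughout; not evalscope's internal code).
import asyncio

from openai import AsyncOpenAI  # openai>=1.0

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
semaphore = asyncio.Semaphore(16)  # cap the number of in-flight requests


async def ask(prompt: str) -> str:
    async with semaphore:
        resp = await client.chat.completions.create(
            model="my-model",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content


async def main(prompts: list[str]) -> list[str]:
    # All prompts are evaluated concurrently rather than one by one.
    return await asyncio.gather(*(ask(p) for p in prompts))


if __name__ == "__main__":
    answers = asyncio.run(main(["Q1", "Q2", "Q3"]))
```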


Breaking Changes
None


What's Changed
1. Support multi-modal model evaluation (VLM eval)
2. Convert the synchronous client for OpenAI-API-format services to asynchronous, speeding up evaluation by up to 10x
3. Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
4. Update README
5. Add unit tests for VLM eval
6. Update examples for the `OpenCompass` and `VLMEval` evaluation backends
7. Update version restrictions for the `ms-opencompass` and `ms-vlmeval` dependencies

0.4.3

1. Support asynchronous client inference for OpenAI-API-format evaluation
2. Support multi-modal evaluation with VLMEvalKit as an evaluation backend
3. Refactor setup; support `pip install llmuses[opencompass]`, `pip install llmuses[vlmeval]`, and `pip install llmuses[all]`
4. Fix some bugs

0.2.8

1. Fix evaluation from a local directory
2. Add fuzzy matching for templates
3. Add support for local models with templates

0.2.6

1. Support loading `cmmlu` from local disk

0.2.5
