Evalscope

Latest version: v0.6.0


0.6.0

Release Notes

1. Support multi-modal RAG evaluation (#149)
- Add CLIP_Benchmark (see the sketch after this list)
- Add end-to-end multi-modal RAG evaluation in Ragas
2. Compatibility with Ragas v0.2.3 (#165, #171)
3. Support truncating input for CLIP models (#163, #164)
4. Support saving knowledge graphs when generating datasets in Ragas (#175)
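
For reference, CLIP_Benchmark evaluations are driven by a task config passed to `run_task`, as elsewhere in EvalScope. The following is a minimal sketch assuming the `RAGEval` backend schema from the project docs; the model and dataset names are placeholders, so verify the exact fields against the documentation.

```python
# Minimal sketch of a CLIP_Benchmark run via the RAGEval backend.
# Field names follow the RAG-evaluation docs as of v0.6.0; treat the
# exact schema (and the model/dataset names) as assumptions to verify.
from evalscope.run import run_task

task_cfg = {
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "clip_benchmark",
        "eval": {
            # Placeholder CLIP checkpoint; any ModelScope CLIP model should fit here.
            "models": [{"model_name": "AI-ModelScope/chinese-clip-vit-large-patch14-336px"}],
            "dataset_name": ["muge"],  # image-text retrieval benchmark
            "split": "test",
            "batch_size": 128,
            "limit": 1000,             # cap samples for a quick smoke test
        },
    },
}

run_task(task_cfg=task_cfg)
```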



Bug Fixes

1. Fix abnormal metrics during CMTEB evaluation (#157)
2. Fix GenerationConfig being None (#173)
3. Update `datasets` version constraints (#184)
4. Add publish workflow (#186)


Documentation Updates

1. Update VLMEvalKit documentation (#166)
2. Update multi-modal RAG blog (#172)



0.5.5

Release Notes

1. Added Dataset Support:
- Enhanced multimodal evaluation capabilities, now supporting MMBench-Video, Video-MME, and MVBench video evaluations https://github.com/modelscope/evalscope/pull/146
- Added cmb dataset https://github.com/modelscope/evalscope/pull/117

2. Support for `LongBench-write` quality evaluation of long text generation https://github.com/modelscope/evalscope/pull/136

3. Automatic downloading of `punkt_tab.zip` via `nltk` (see the sketch below) https://github.com/modelscope/evalscope/pull/140
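
The download amounts to fetching NLTK's `punkt_tab` tokenizer data on first use. A minimal check-then-download sketch; the guard pattern here is illustrative, not EvalScope's exact code:

```python
# Illustrative check-then-download for NLTK's punkt_tab resource
# (shipped with nltk >= 3.9); the guard logic is an assumption,
# not EvalScope's exact implementation.
import nltk

try:
    nltk.data.find("tokenizers/punkt_tab")
except LookupError:
    nltk.download("punkt_tab")

from nltk.tokenize import sent_tokenize

# punkt_tab backs sentence splitting, which ROUGE-style metrics rely on.
print(sent_tokenize("EvalScope computes ROUGE. That needs a sentence tokenizer."))
```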

4. Support for RAG evaluation https://github.com/modelscope/evalscope/pull/127 (see the sketch after this list):
- Support for embeddings/reranker evaluation: Integration of `MTEB` (Massive Text Embedding Benchmark) and `CMTEB` (Chinese Massive Text Embedding Benchmark), supporting tasks such as retrieval and reranking
- Support for end-to-end RAG evaluation: Integration of the `ragas` framework, supporting automatic generation of evaluation datasets and evaluation based on judge models
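
For reference, the MTEB/CMTEB integration is likewise driven by a task config passed to `run_task`. A minimal sketch assuming the `RAGEval` backend field names from the docs; the embedding model and task list are placeholder assumptions:

```python
# Minimal sketch of an embeddings evaluation through the MTEB/CMTEB
# integration. Field names follow the RAG-evaluation docs; the model
# path and task list are placeholders to verify.
from evalscope.run import run_task

task_cfg = {
    "eval_backend": "RAGEval",
    "eval_config": {
        "tool": "MTEB",
        "model": [{
            "model_name_or_path": "AI-ModelScope/m3e-base",  # placeholder embedding model
            "max_seq_length": 512,
            "encode_kwargs": {"batch_size": 128},
        }],
        "eval": {
            "tasks": ["T2Retrieval"],  # a CMTEB retrieval task
            "output_folder": "outputs",
            "limits": 500,             # subsample for a quick run
        },
    },
}

run_task(task_cfg=task_cfg)
```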

5. Documentation Updates:
- Added "Blog" section https://github.com/modelscope/evalscope/pull/126, https://github.com/modelscope/evalscope/pull/135
- Added a supported-datasets page https://github.com/modelscope/evalscope/pull/121
- Updated feature usage instructions https://github.com/modelscope/evalscope/pull/125, https://github.com/modelscope/evalscope/pull/134, https://github.com/modelscope/evalscope/pull/138, https://github.com/modelscope/evalscope/pull/137, https://github.com/modelscope/evalscope/pull/127

6. Updated dependencies: `nltk>=3.9` and `rouge-score>=0.1.0` https://github.com/modelscope/evalscope/pull/145, https://github.com/modelscope/evalscope/pull/143


0.5.2

Highlights
- Support multi-modal model evaluation (VLM eval)
- Transformed the synchronous client for OpenAI's API format into an asynchronous one, speeding up evaluation by up to 10x (see the sketch below)
- Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
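
The speedup comes from overlapping request latency: an async client keeps many OpenAI-format requests in flight at once instead of waiting on each one. A generic sketch of the pattern with `aiohttp`; the endpoint, model name, and concurrency cap are illustrative assumptions, not evalscope's internal code:

```python
# Generic sketch of async OpenAI-format inference: overlap network
# waits by keeping several requests in flight. The endpoint, model
# name, and concurrency cap are illustrative assumptions.
import asyncio
import aiohttp

API_URL = "http://localhost:8000/v1/chat/completions"  # assumed local server

async def query(session: aiohttp.ClientSession, prompt: str) -> str:
    payload = {"model": "my-model",
               "messages": [{"role": "user", "content": prompt}]}
    async with session.post(API_URL, json=payload) as resp:
        data = await resp.json()
        return data["choices"][0]["message"]["content"]

async def run_all(prompts: list[str], concurrency: int = 10) -> list[str]:
    # A semaphore caps in-flight requests; throughput scales with
    # concurrency until the server saturates, hence the ~10x figure.
    sem = asyncio.Semaphore(concurrency)
    async with aiohttp.ClientSession() as session:
        async def bounded(p: str) -> str:
            async with sem:
                return await query(session, p)
        return await asyncio.gather(*(bounded(p) for p in prompts))

if __name__ == "__main__":
    print(asyncio.run(run_all(["2 + 2 = ?", "Name a prime number."])))
```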


Breaking Changes
None


What's Changed
1. Support multi-modal model evaluation (VLM eval)
2. Transformed the synchronous client for OpenAI's API format into an asynchronous one, speeding up evaluation by up to 10x
3. Support installation with extras: `pip install evalscope[opencompass]` or `pip install evalscope[vlmeval]`
4. Update README
5. Add UT cases for VLM eval
6. Update examples for the `OpenCompass` and `VLMEval` eval backends
7. Update version restrictions for the `ms-opencompass` and `ms-vlmeval` dependencies

0.4.3

1. Support async client inference for OpenAI-API-format evaluation
2. Support multi-modal evaluation with VLMEvalKit as an eval backend
3. Refactor setup to support `pip install llmuses[opencompass]`, `pip install llmuses[vlmeval]`, and `pip install llmuses[all]`
4. Fix some bugs

0.2.8

1. Fix evaluation from a local directory
2. Add fuzzy matching for templates
3. Add local models with templates

0.2.6

1. Support loading `cmmlu` from local disk
