Evalscope

Latest version: v0.13.0

Safety actively analyzes 714919 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 2 of 4

0.9.0

What's Changed
253
- Support for specifying model service API URL for evaluation: Evaluation can be performed on both local and remote model services.
- Support for custom schema for mixed data evaluation: Combine different datasets for a more comprehensive assessment of model -capabilities with less data.
- Add benchmark contribution guidelines: Users can add their own benchmarks to make the tool more powerful and beneficial for more people.

中文
253
- 支持指定模型服务API URL评测:不论是本地模型还是远端模型服务都可以评测
- 支持自定义schema进行数据混合评测:混合不同的数据集,用更少的数据,更全面的评估模型能力
- 添加benchmark贡献指南:可以自行添加benchmark,让工具变的更强大,让更多人受益


**Full Changelog**: https://github.com/modelscope/evalscope/compare/v0.8.2...v0.9.0

0.8.2

What's Changed
* add user group by Yunnglin in https://github.com/modelscope/evalscope/pull/251
* fix perf seed by Yunnglin in https://github.com/modelscope/evalscope/pull/254
* add spawn env by Yunnglin in https://github.com/modelscope/evalscope/pull/256
* Fix: sglang API response does not contain 'object' field. by tghfly in https://github.com/modelscope/evalscope/pull/260
* fix parse response by Yunnglin in https://github.com/modelscope/evalscope/pull/262
* fix predict by Yunnglin in https://github.com/modelscope/evalscope/pull/264
* compat ragas 0.2.9 and remove chinese prompt cache by Yunnglin in https://github.com/modelscope/evalscope/pull/265

New Contributors
* tghfly made their first contribution in https://github.com/modelscope/evalscope/pull/260

**Full Changelog**: https://github.com/modelscope/evalscope/compare/v0.8.1...v0.8.2

0.8.1

What's Changed
* Unify `opencompass` and `vlmeval` output dirs by Yunnglin in https://github.com/modelscope/evalscope/pull/242
* Perf add more metrics by Yunnglin in https://github.com/modelscope/evalscope/pull/245
* Perf add `trust remote` parameter by Yunnglin in https://github.com/modelscope/evalscope/pull/246
* Compat ms-swift<3.0 by Yunnglin in https://github.com/modelscope/evalscope/pull/249
* Fix humaneval for native eval by Yunnglin in https://github.com/modelscope/evalscope/pull/248

中文版本
* 统一 `opencompass` 和 `vlmeval` 输出目录,作者:Yunnglin,相关链接:https://github.com/modelscope/evalscope/pull/242
* 模型压测:增加更多指标,作者:Yunnglin,相关链接:https://github.com/modelscope/evalscope/pull/245
* 模型压测:添加`trust remote`参数,作者:Yunnglin,相关链接:https://github.com/modelscope/evalscope/pull/246
* 兼容 ms-swift<3.0,作者:Yunnglin,相关链接:https://github.com/modelscope/evalscope/pull/249
* 修复本地评估的 humaneval 问题,作者:Yunnglin,相关链接:https://github.com/modelscope/evalscope/pull/248

**Full Changelog**: https://github.com/modelscope/evalscope/compare/v0.8.0...v0.8.1

0.8.0

Release Notes

1. Optimize `Native` eval and remove template_type 231
2. The evalscope perf command supports the --outputs-dir configuration. 232
3. Support ragas 0.2.7 234



Bug Fixes

1. Fix longwriter docs 239
2. Fix lint for longwriter 240
3. Fix lint 237
4. Unify perf output 238


Documentation Updates

1. Fix longwriter docs 239
2. Optimize `Native` eval and remove template_type 231



中文说明

特性

1. 取消`Native`模式评测中template_type参数 231
2. perf模块支持--output-dir 232
3. 支持适配最新的ragas 0.2.7版本 234


缺陷修复

1. 修复longwriter代码示例,优化流程 239
2. 修复lint,以及longwriter的lint 240 237


文档更新

1. 更新longwriter文档 239
2. 更新`Native`评测模式的相关文档 231

0.7.2

Release Note
1. Remove `pyarrow` version requirement 225
2. Optimize warning info 223


中文说明
1. 移除 `pyarrow` 版本要求 225
2. 优化 warning 信息 223

0.7.1

Release Notes

1. Add PMMEval benchmark 222


中文说明

特性

1. 增加PMMEval评测集 222

Page 2 of 4

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.