This release continues the evolution of OpenCompass, bringing a mix of new features, optimizations, documentation improvements, and bug fixes.
Highlights
**Leaderboard**: The evaluation results of [Qwen-7B](https://github.com/QwenLM/Qwen-7B), [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B), [LLaMA-2](https://github.com/facebookresearch/llama), and GPT-4 have been posted to our [leaderboard](https://opencompass.org.cn/leaderboard-llm). [Model comparison](https://opencompass.org.cn/model-compare/GPT-4,ChatGPT,LLaMA-2-70B,LLaMA-65B) is now also available online. We hope this feature offers deeper insights!
**Datasets**: Added the Xiezhi, SQuAD2.0, ANLI, and LEval datasets, among others, for diverse applications. ([#101](https://github.com/InternLM/opencompass/pull/101), [#192](https://github.com/InternLM/opencompass/pull/192)) Safety-related datasets have also been added to collections. (#185)
**New modality**: Support for [MMBench](https://opencompass.org.cn/mmbench) has been introduced, and the evaluation of multi-modal models is on the way! (#56, #161) In addition, the Intern language model is now supported. ([#51](https://github.com/InternLM/opencompass/pull/51))
**Enhancement**: Several enhancements to OpenAI models, including key deprecation and temperature setting. (#121, #128) This release also supports running multiple tasks on one GPU, filtering messages by level, and more. (#148, #187)
**Documentation**: Comprehensive updates and fixes across READMEs, issue templates, prompt docs, metric documentation, and more.
**Bug Fixes**: Fixes include the seed in HFEvaluator, AGIEval multiple-choice questions, and more. (#122, #137)
New Contributors
Thank you to all our contributors for this release, with a special shoutout to our new contributors:
go-with-me000 ([First Contribution](https://github.com/InternLM/opencompass/pull/51))
anakin-skywalker-Joseph ([First Contribution](https://github.com/InternLM/opencompass/pull/125))
zhouzaida ([First Contribution](https://github.com/InternLM/opencompass/pull/152))
dependabot ([First Contribution](https://github.com/InternLM/opencompass/pull/178))
Changelog
* [Feat] add auto assignee bot by yingfhu in https://github.com/InternLM/opencompass/pull/105
* [Doc] Update Readme and Fix failed links by Ezra-Yu in https://github.com/InternLM/opencompass/pull/108
* Doc: add twitter link by vansin in https://github.com/InternLM/opencompass/pull/111
* Support intern language model by go-with-me000 in https://github.com/InternLM/opencompass/pull/51
* [Docs] Update issue templates for proper guidance to discussions by gaotongxiao in https://github.com/InternLM/opencompass/pull/116
* [Feature] Allow explicitly setting the temperature for API model by kennymckormick in https://github.com/InternLM/opencompass/pull/121
* [Fix] Fix seed in HFEvaluator by kennymckormick in https://github.com/InternLM/opencompass/pull/122
* [Feature] Update SC by Leymore in https://github.com/InternLM/opencompass/pull/126
* Revise documentation title by anakin-skywalker-Joseph in https://github.com/InternLM/opencompass/pull/125
* [Docs] Update prompt docs by Leymore in https://github.com/InternLM/opencompass/pull/46
* [Enhancement] Update README.md by tonysy in https://github.com/InternLM/opencompass/pull/119
* [DOC] Add metric doc by Ezra-Yu in https://github.com/InternLM/opencompass/pull/118
* [Feature] Evaluating acc based on minimum edit distance, update SIQA by gaotongxiao in https://github.com/InternLM/opencompass/pull/130
* [Feature] Several enhancements by gaotongxiao in https://github.com/InternLM/opencompass/pull/142
* [Doc] update acknowledgements by Leymore in https://github.com/InternLM/opencompass/pull/147
* Fix typo in readme by zhouzaida in https://github.com/InternLM/opencompass/pull/152
* [Feature]: Use multimodal by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/73
* [Refine] Refine PR 122 by kennymckormick in https://github.com/InternLM/opencompass/pull/123
* [Enhancement] Optimize OpenAI models by gaotongxiao in https://github.com/InternLM/opencompass/pull/128
* Update pre-commit ignore-word list by gaotongxiao in https://github.com/InternLM/opencompass/pull/162
* [Script] Add scripts to evaluate MMBench by kennymckormick in https://github.com/InternLM/opencompass/pull/161
* [Doc] Update Readme by tonysy in https://github.com/InternLM/opencompass/pull/165
* [Feature]: Add mm support for local runner by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/169
* Calculate max_out_len without hard code for OpenAI model by zhouzaida in https://github.com/InternLM/opencompass/pull/158
* [API] Refine OpenAI by kennymckormick in https://github.com/InternLM/opencompass/pull/175
* [Fix] Use a copy of the config object in Task by gaotongxiao in https://github.com/InternLM/opencompass/pull/174
* Bump requests from 2.28.1 to 2.31.0 by dependabot in https://github.com/InternLM/opencompass/pull/178
* [Fix] Fix AGIEval multiple choice by Leymore in https://github.com/InternLM/opencompass/pull/137
* [Feature]: Refactor input and output by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/176
* [Feature] Add Xiezhi SQuAD2.0 ANLI by Leymore in https://github.com/InternLM/opencompass/pull/101
* [Feature] Support turbomind by tonysy in https://github.com/InternLM/opencompass/pull/166
* [Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor by gaotongxiao in https://github.com/InternLM/opencompass/pull/129
* [Fix] Fix some sc errors by liushz in https://github.com/InternLM/opencompass/pull/177
* Fix meta template & unit tests by gaotongxiao in https://github.com/InternLM/opencompass/pull/170
* [Feature] Support CUDA_VISIBLE_DEVICES and multiple tasks on one GPU by mzr1996 in https://github.com/InternLM/opencompass/pull/148
* [Docs] Enhance issue template by gaotongxiao in https://github.com/InternLM/opencompass/pull/183
* Skip invalid keys to avoid requesting API by zhouzaida in https://github.com/InternLM/opencompass/pull/184
* [Feature] update news by tonysy in https://github.com/InternLM/opencompass/pull/186
* [Feature] Support filtering specified levels message by zhouzaida in https://github.com/InternLM/opencompass/pull/187
* [Feat] add safety to collections by yingfhu in https://github.com/InternLM/opencompass/pull/185
* [Docs] Update contribution guide & toc, improve user experience by gaotongxiao in https://github.com/InternLM/opencompass/pull/188
* [Feature] add llama-oriented dataset configs by Leymore in https://github.com/InternLM/opencompass/pull/82
* [Feat] update postprocessor to get first option more accurately by yingfhu in https://github.com/InternLM/opencompass/pull/193
* [Feature] Add LEval datasets by gaotongxiao in https://github.com/InternLM/opencompass/pull/192
* Bump version to 0.1.2 by gaotongxiao in https://github.com/InternLM/opencompass/pull/190
* [Fix] fix bug for postprocessor by yingfhu in https://github.com/InternLM/opencompass/pull/195
* [Doc] update readme by Leymore in https://github.com/InternLM/opencompass/pull/196
Full Changelog: https://github.com/InternLM/opencompass/compare/0.1.1...0.1.2