We're thrilled to announce OpenCompass v0.2.1, loaded with new datasets, features, and vital fixes. This release is a testament to our ongoing commitment to enhancing user experience and broadening research capabilities.
🌟 **Highlights**:
- **Add Agent and Code Datasets**: Diverse new datasets, including `GPQA`, `mastermath2024v1`, `mbpp_plus`, and `sanitized_mbpp`, significantly expand the scope of OpenCompass.
- **Support Different Judge LLMs in Subjective Evaluation**: More choices when selecting the judge LLM.
- **Support Needle in a Haystack**: Support the Needle in a Haystack test for long-context evaluation.
- **Add vLLM Evaluation**: We now support inference and evaluation with vLLM.
Here's what's new:
🚀 New Features:
- 📦 **Dataset and Model Expansion**:
  - Added the `rwkv-5-3b` model ([#666](https://github.com/open-compass/opencompass/pull/666))
  - Integrated diverse datasets, including `GPQA`, `Creationbench`, and more.
  - Added support for new datasets such as `mastermath2024v1`, `mbpp_plus`, and `sanitized_mbpp` ([#744](https://github.com/open-compass/opencompass/pull/744), [#770](https://github.com/open-compass/opencompass/pull/770), [#745](https://github.com/open-compass/opencompass/pull/745))
- 🛠 **Functional Enhancements**:
  - Subjective evaluation improvements ([#692](https://github.com/open-compass/opencompass/pull/692), [#724](https://github.com/open-compass/opencompass/pull/724))
  - Updated the Python action and Slurm support, plus the Docker docs ([#694](https://github.com/open-compass/opencompass/pull/694), [#718](https://github.com/open-compass/opencompass/pull/718))
  - TurboMind RESTful API support and Qwen API integration ([#693](https://github.com/open-compass/opencompass/pull/693), [#735](https://github.com/open-compass/opencompass/pull/735))
- 📖 **Documentation Updates**:
  - Updated the contamination, AlignBench, and other docs for better clarity ([#698](https://github.com/open-compass/opencompass/pull/698), [#707](https://github.com/open-compass/opencompass/pull/707))
  - Fixed dead links and typos in various documents ([#455](https://github.com/open-compass/opencompass/pull/455), [#773](https://github.com/open-compass/opencompass/pull/773), [#774](https://github.com/open-compass/opencompass/pull/774))
🐛 Bug Fixes:
- Addressed various issues, including those in AlignBench, configs, and postprocessing scripts.
- Fixed bugs in subjective evaluation and EOS string detection.
- Applied quick fixes to improve performance and reliability.
🎉 Welcome New Contributors:
- A warm welcome to our first-time contributors:
  - BBuf, DseidLi, Skyfall-xzz, RunningLeon, zehuichen123, AllentDan, Connor-Shen, Francis-llgg, hzhwcmhf, ChrisLiu6, yanyc428, tpoisonooo, jiangjin1999
🔗 Full Changelog:
* add rwkv-5-3b model by BBuf in https://github.com/open-compass/opencompass/pull/666
* [Feature] Add double order of subjective evaluation and removing duplicated response among two models by bittersweet1999 in https://github.com/open-compass/opencompass/pull/692
* [Feat] update python action and slurm by yingfhu in https://github.com/open-compass/opencompass/pull/694
* [Doc] Update contamination docs by Leymore in https://github.com/open-compass/opencompass/pull/698
* alignmentbench infer and judge by bittersweet1999 in https://github.com/open-compass/opencompass/pull/697
* [Fix] Update alignmentbench by tonysy in https://github.com/open-compass/opencompass/pull/704
* removed redundant code in GSM8KDataset.load method. by DseidLi in https://github.com/open-compass/opencompass/pull/700
* [Fix] fix a bug on configs/eval_mixtral_8x7b.py by jingmingzhuo in https://github.com/open-compass/opencompass/pull/706
* [Doc] Update Doc for Alignbench by tonysy in https://github.com/open-compass/opencompass/pull/707
* [Fix] minor fix openai by yingfhu in https://github.com/open-compass/opencompass/pull/711
* Add Judgellms by bittersweet1999 in https://github.com/open-compass/opencompass/pull/710
* [Feat] Update math/agent by yingfhu in https://github.com/open-compass/opencompass/pull/716
* [Docs] update docker docs by yingfhu in https://github.com/open-compass/opencompass/pull/718
* [Fix] Quick fix for max_out_len in subjective evaluation by bittersweet1999 in https://github.com/open-compass/opencompass/pull/719
* [Feature] Support the use of humaneval_plus. by jingmingzhuo in https://github.com/open-compass/opencompass/pull/720
* [Feature] Add reasonbench dataset by Skyfall-xzz in https://github.com/open-compass/opencompass/pull/577
* [Feature] Add abbr for judgemodel in subjective evaluation by bittersweet1999 in https://github.com/open-compass/opencompass/pull/724
* Update configs for evaluating chat models like qwen, baichuan, llama2 using turbomind backend by RunningLeon in https://github.com/open-compass/opencompass/pull/721
* [News] add news for T-Eval by zehuichen123 in https://github.com/open-compass/opencompass/pull/727
* Add NeedleInAHaystack Test Support by DseidLi in https://github.com/open-compass/opencompass/pull/714
* [Fix] Fixed abbr error of subjective alignbench and size partition by bittersweet1999 in https://github.com/open-compass/opencompass/pull/730
* add turbomind restful api support by AllentDan in https://github.com/open-compass/opencompass/pull/693
* [Fix] Update merge script for non-split setting by tonysy in https://github.com/open-compass/opencompass/pull/733
* [Sync] Sync with internal codes by Leymore in https://github.com/open-compass/opencompass/pull/734
* [Feature] Add InfiniteBench by philipwangOvO in https://github.com/open-compass/opencompass/pull/739
* Update LightllmApi and Fix mmlu bug by helloyongyang in https://github.com/open-compass/opencompass/pull/738
* [Feature] Add other judgelm prompts for Alignbench by bittersweet1999 in https://github.com/open-compass/opencompass/pull/731
* [Feat] support sanitized mbpp dataset by yingfhu in https://github.com/open-compass/opencompass/pull/745
* [Fix] SubSizePartition fix by bittersweet1999 in https://github.com/open-compass/opencompass/pull/746
* add chinese version of humaneval, mbpp by Connor-Shen in https://github.com/open-compass/opencompass/pull/743
* [Fix] fix error in configs by bittersweet1999 in https://github.com/open-compass/opencompass/pull/750
* [Feature] Add Creationbench Dataset by bittersweet1999 in https://github.com/open-compass/opencompass/pull/753
* [Feat] update code config by yingfhu in https://github.com/open-compass/opencompass/pull/749
* update plot function in tools_needleinahaystack.py by DseidLi in https://github.com/open-compass/opencompass/pull/747
* [Feature] Add new dataset mastermath2024v1 by Francis-llgg in https://github.com/open-compass/opencompass/pull/744
* [Feature] Add GPQA Dataset by Francis-llgg in https://github.com/open-compass/opencompass/pull/729
* change NeedleInAHaystackDataset to dynamic loading by DseidLi in https://github.com/open-compass/opencompass/pull/754
* [Feature] Add support of Qwen API by hzhwcmhf in https://github.com/open-compass/opencompass/pull/735
* [Feature] Support LLaMA2-Accessory by ChrisLiu6 in https://github.com/open-compass/opencompass/pull/732
* [Fix] Fix small bug in alignbench by bittersweet1999 in https://github.com/open-compass/opencompass/pull/764
* [Feature] Add multi_round dataset evaluation by bittersweet1999 in https://github.com/open-compass/opencompass/pull/766
* [Feature] add subject ir dataset by bittersweet1999 in https://github.com/open-compass/opencompass/pull/755
* [Update] Update introduction of CompassBench-2024-Q1 by tonysy in https://github.com/open-compass/opencompass/pull/769
* [Fix] quick fix for postprocess by bittersweet1999 in https://github.com/open-compass/opencompass/pull/771
* Support Mbpp_plus dataset by Connor-Shen in https://github.com/open-compass/opencompass/pull/770
* [Fix] fix typos in drop prompt by yanyc428 in https://github.com/open-compass/opencompass/pull/773
* typo(installation.md): fix unzip commands by tpoisonooo in https://github.com/open-compass/opencompass/pull/774
* Contamination analysis for MMLU, Hellaswag, and ARC_c by liyucheng09 in https://github.com/open-compass/opencompass/pull/699
* [Docs] Update contamination docs by Leymore in https://github.com/open-compass/opencompass/pull/775
* [Feature] `_batch_generate` function, add the MultiTokenEOSCriteria by jiangjin1999 in https://github.com/open-compass/opencompass/pull/772
* [Sync] Sync with internal codes 2023.01.08 by Leymore in https://github.com/open-compass/opencompass/pull/777
**For a full list of updates, visit our** [Full Changelog](https://github.com/open-compass/opencompass/compare/0.2.0...0.2.1).
Thank you to every contributor, old and new. Your dedication is shaping OpenCompass into a more robust and versatile tool. 🙌 🎉
---
Remember to star 🌟 our GitHub repository if OpenCompass aids your research and development! Your support and feedback are crucial for our continuous improvement.