Internlm

Latest version: v0.2.0



0.1.6

Welcome to the newest version of OpenCompass! v0.1.6 brings forth exciting dataset additions, crucial fixes, and enhanced documentation. We're confident that this release will provide a better and smoother experience for all users.

🆕 **Highlights**:
- **Dataset Enrichment**: Multiple additions, especially from the GLUE suite, to provide more versatility and better testing capabilities.
- **Documentation Revamp**: Fixed dead links and updated the 'get_started' section to assist our users in navigating OpenCompass effortlessly.
- **Introducing New Faces**: A warm welcome to our newest contributors. Your dedication and contributions are pivotal to our progress!

Dive into the details:

🌟 New Features:
- 📦 **Datasets Galore**:
  - Introduced the WikiText-2 and WikiText-103 datasets ([#397](https://github.com/open-compass/opencompass/pull/397))
  - GLUE dataset additions:
    - CoLA ([#406](https://github.com/open-compass/opencompass/pull/406))
    - QQP ([#438](https://github.com/open-compass/opencompass/pull/438))
    - MRPC ([#440](https://github.com/open-compass/opencompass/pull/440))
  - LawBench dataset addition ([#460](https://github.com/open-compass/opencompass/pull/460))

- 🛠 **Utilities and Enhancements**:
  - Re-implemented the `ceval` dataset loader ([#446](https://github.com/open-compass/opencompass/pull/446))
  - Integrated turbomind inference through its RPC API instead of its Python API ([#414](https://github.com/open-compass/opencompass/pull/414))
  - Moved `fix_id_list` to the Retriever for better code organization ([#442](https://github.com/open-compass/opencompass/pull/442))

- 📖 **Documentation and Syncs**:
  - Updated the dataset list and the `get_started` section ([#437](https://github.com/open-compass/opencompass/pull/437), [#435](https://github.com/open-compass/opencompass/pull/435))
  - Fixed dead links in the README ([#455](https://github.com/open-compass/opencompass/pull/455))
  - Synced LongEval updates and added subjective evaluation ([#443](https://github.com/open-compass/opencompass/pull/443), [#475](https://github.com/open-compass/opencompass/pull/475))

🐛 Bug Fixes:
- Fixed a potential `clp` error and added support for batch sizes greater than 1 ([#439](https://github.com/open-compass/opencompass/pull/439))
- Fixed `jieba`-based ROUGE in LCSTS, including its handling of empty strings ([#459](https://github.com/open-compass/opencompass/pull/459), [#467](https://github.com/open-compass/opencompass/pull/467))
- Generated text is now split at the EOS string only when the complete EOS string appears ([#477](https://github.com/open-compass/opencompass/pull/477))
- Various other minor fixes.
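To illustrate the #477 behaviour, here is a minimal sketch, assuming model output is post-processed as a plain string: the text is cut only at the first *complete* occurrence of the EOS string, so a dangling partial match such as `</s` is left untouched. The helper name `truncate_at_eos` and the `</s>` EOS string are hypothetical and not OpenCompass's actual code.

```python
def truncate_at_eos(text: str, eos: str) -> str:
    """Return `text` up to the first complete occurrence of `eos`.

    Hypothetical helper, not OpenCompass's implementation. If `eos` never
    appears in full (e.g. only a prefix like "</s" at the end), the text is
    returned unchanged instead of being split on a partial match.
    """
    index = text.find(eos)  # -1 when the full EOS string is absent
    return text if index == -1 else text[:index]


print(truncate_at_eos("The answer is 42.</s>extra", "</s>"))  # The answer is 42.
print(truncate_at_eos("The answer is 42.</s", "</s>"))       # The answer is 42.</s
```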

🎉 Welcome New Contributors:
- A big shout-out to our new contributors:
  - KevinNuNu ([First PR](https://github.com/open-compass/opencompass/pull/397))
  - lvhan028 ([First PR](https://github.com/open-compass/opencompass/pull/414))

Huge thanks to all contributors! Your constant efforts make OpenCompass better with each release. 🙌 🎉

Changelog

* [SIG] add WikiText-2&103 by KevinNuNu in https://github.com/open-compass/opencompass/pull/397
* [SIG] add GLUE_CoLA dataset by KevinNuNu in https://github.com/open-compass/opencompass/pull/406
* [SIG] add GLUE QQP dataset by KevinNuNu in https://github.com/open-compass/opencompass/pull/438
* [SIG] add GLUE_MRPC dataset by KevinNuNu in https://github.com/open-compass/opencompass/pull/440
* [Doc] Update dataset list by Leymore in https://github.com/open-compass/opencompass/pull/437
* [Fix] use eval field check by Leymore in https://github.com/open-compass/opencompass/pull/441
* [Sync] Update LongEval by philipwangOvO in https://github.com/open-compass/opencompass/pull/443
* [Fix] fix clp potential error and support bs>1 by yingfhu in https://github.com/open-compass/opencompass/pull/439
* [Feature] re-implement ceval load dataset by Leymore in https://github.com/open-compass/opencompass/pull/446
* Integrate turbomind inference via its RPC API instead of its python API by lvhan028 in https://github.com/open-compass/opencompass/pull/414
* [Docs] update get_started by gaotongxiao in https://github.com/open-compass/opencompass/pull/435
* [Refactor] Move fix_id_list to Retriever by gaotongxiao in https://github.com/open-compass/opencompass/pull/442
* [Docs] Fix dead links in readme by gaotongxiao in https://github.com/open-compass/opencompass/pull/455
* [Fix] Use jieba rouge in lcsts by Leymore in https://github.com/open-compass/opencompass/pull/459
* [Fix] Fix jieba rouge with empty string by Leymore in https://github.com/open-compass/opencompass/pull/467
* [Sync] Add subjective evaluation by Leymore in https://github.com/open-compass/opencompass/pull/475
* [Feature] Add lawbench by Leymore in https://github.com/open-compass/opencompass/pull/460
* [Fix] split if and only if complete eos string shows up by Leymore in https://github.com/open-compass/opencompass/pull/477
* Bump version to 0.1.6 by Leymore in https://github.com/open-compass/opencompass/pull/478

**For a detailed overview, check out our** [Full Changelog](https://github.com/open-compass/opencompass/compare/0.1.5...0.1.6).

---

If you find OpenCompass beneficial, kindly star 🌟 our GitHub repository! We value your feedback, reviews, and continued support.

0.1.5

OpenCompass v0.1.5 brings new features, bug fixes, and, most notably, expanded dataset support to refine your experience.

🆕 **Highlights**:
- **Boosted Dataset Integrations**: This release adds support for numerous datasets, including `ds1000`, `promptbench`, Anthropic's evals, `kaoshi`, and many more, making OpenCompass more versatile than ever.
- **More Evaluation Types**: We have started integrating subjective and agent-aided LLM evaluation into OpenCompass. Stay tuned!

Explore the detailed changes:

🌟 New Features:
- 📦 **New Datasets and Features**:
  - `ds1000` dataset support ([#395](https://github.com/open-compass/opencompass/pull/395))
  - `promptbench` dataset implementation ([#239](https://github.com/open-compass/opencompass/pull/239))
  - Anthropic's evals dataset support ([#422](https://github.com/open-compass/opencompass/pull/422))
  - `kaoshi` dataset introduction ([#392](https://github.com/open-compass/opencompass/pull/392))
  - Initial support for subjective evaluation ([#421](https://github.com/open-compass/opencompass/pull/421))
  - Support for GSM8K evaluation with tools via Lagent and LangChain ([#277](https://github.com/open-compass/opencompass/pull/277))
  - SciBench evaluation added ([#393](https://github.com/open-compass/opencompass/pull/393))

📖 **Documentation**:
- News updates and introduction figure in the README ([#375](https://github.com/open-compass/opencompass/pull/375), [#413](https://github.com/open-compass/opencompass/pull/413))
- Updated `get_started.md` and fixed naming issues ([#377](https://github.com/open-compass/opencompass/pull/377), [#380](https://github.com/open-compass/opencompass/pull/380))
- New FAQ section added ([#384](https://github.com/open-compass/opencompass/pull/384))
- Added a README for `longeval` ([#389](https://github.com/open-compass/opencompass/pull/389))
- Multimodal documentation introduced ([#334](https://github.com/open-compass/opencompass/pull/334))

🛠️ Bug Fixes:
- Addressed a potential OOM issue ([#387](https://github.com/open-compass/opencompass/pull/387))
- Added `has_image` to ScienceQA ([#391](https://github.com/open-compass/opencompass/pull/391))
- Resolved performance issues of `visualglm` ([#424](https://github.com/open-compass/opencompass/pull/424))
- Fixed the summarizer's debug logger ([#417](https://github.com/open-compass/opencompass/pull/417))
- Fixed errors in keep keys ([#431](https://github.com/open-compass/opencompass/pull/431))

⚙ Enhancements and Refactors:
- Refined docs and code for better user guidance ([#409](https://github.com/open-compass/opencompass/pull/409))
- Added a custom summarizer argument in CLI mode ([#411](https://github.com/open-compass/opencompass/pull/411))
- `mlugowl` llamaadapter introduced ([#405](https://github.com/open-compass/opencompass/pull/405))
- Improved support for multimodal (mm) models on public datasets ([#412](https://github.com/open-compass/opencompass/pull/412))
- Added support for customized config paths ([#423](https://github.com/open-compass/opencompass/pull/423))

🎉 New Contributors:

A heartfelt welcome to our first-time contributors:

- wangxidong06 ([First PR](https://github.com/open-compass/opencompass/pull/376))
- so2liu ([First PR](https://github.com/open-compass/opencompass/pull/411))
- HoBeedzc ([First PR](https://github.com/open-compass/opencompass/pull/417))
- CuteyThyme ([First PR](https://github.com/open-compass/opencompass/pull/393))
- chenbohua3 ([First PR](https://github.com/open-compass/opencompass/pull/423))

To all contributors, old and new, thank you for continually enhancing OpenCompass! Your efforts are deeply valued. 🙌 🎉

If you love OpenCompass, don't forget to star 🌟 our GitHub repository! Your feedback, reviews, and contributions immensely help in shaping the product.

Changelog
* [Doc] Update News by tonysy in https://github.com/open-compass/opencompass/pull/375
* Update get_started.md by liushz in https://github.com/open-compass/opencompass/pull/377
* [CI] Publish to Pypi by gaotongxiao in https://github.com/open-compass/opencompass/pull/366
* [Docs] Fix incorrect name in get_started by gaotongxiao in https://github.com/open-compass/opencompass/pull/380
* fix potential OOM issue by cdpath in https://github.com/open-compass/opencompass/pull/387
* [Docs] Add FAQ by gaotongxiao in https://github.com/open-compass/opencompass/pull/384
* Add CMB by wangxidong06 in https://github.com/open-compass/opencompass/pull/376
* [Fix]: Add has_image to scienceqa by YuanLiuuuuuu in https://github.com/open-compass/opencompass/pull/391
* [Feat] support ds1000 dataset by yingfhu in https://github.com/open-compass/opencompass/pull/395
* [Feat] implementation for support promptbench by yingfhu in https://github.com/open-compass/opencompass/pull/239
* [Feat] refine docs and codes for more user guides by yingfhu in https://github.com/open-compass/opencompass/pull/409
* [Docs] Readme in longeval by philipwangOvO in https://github.com/open-compass/opencompass/pull/389
* feat: add custom summarizer argument in CLI run mode by so2liu in https://github.com/open-compass/opencompass/pull/411
* Yhzhang/add mlugowl llamaadapter by ZhangYuanhan-AI in https://github.com/open-compass/opencompass/pull/405
* [Feat] Support mm models on public dataset and fix several issues. by yyk-wew in https://github.com/open-compass/opencompass/pull/412
* [Docs] Add intro figure to README by gaotongxiao in https://github.com/open-compass/opencompass/pull/413
* [fix] summarizer debug logger by HoBeedzc in https://github.com/open-compass/opencompass/pull/417
* [Doc] Update news by Leymore in https://github.com/open-compass/opencompass/pull/420
* [Feature] Use local accuracy from hf implements by Leymore in https://github.com/open-compass/opencompass/pull/416
* [Feat] support antropics evals dataset by yingfhu in https://github.com/open-compass/opencompass/pull/422
* [Fix] Fix performance issue of visualglm. by yyk-wew in https://github.com/open-compass/opencompass/pull/424
* [Feature] Log gold answer in prediction output by gaotongxiao in https://github.com/open-compass/opencompass/pull/419
* Support GSM8k evaluation with tools by Lagent and LangChain by mzr1996 in https://github.com/open-compass/opencompass/pull/277
* [Sync] Initial support of subjective evaluation by gaotongxiao in https://github.com/open-compass/opencompass/pull/421
* [Fix] P0: errors in keep keys by gaotongxiao in https://github.com/open-compass/opencompass/pull/431
* add evaluation of scibench by CuteyThyme in https://github.com/open-compass/opencompass/pull/393
* [Feature] Add kaoshi dataset by liushz in https://github.com/open-compass/opencompass/pull/392
* [Docs] Add multimodal docs by fangyixiao18 in https://github.com/open-compass/opencompass/pull/334
* support customize config path by chenbohua3 in https://github.com/open-compass/opencompass/pull/423

**Full Changelog**: https://github.com/open-compass/opencompass/compare/0.1.4...0.1.5

0.1.4

OpenCompass v0.1.4 is here with an array of features, documentation improvements, and key fixes! Dive in to see what's in store:

🆕 **Highlights**:

- **More Tools and Features**: OpenCompass continues to expand with new tools and model support, including the update-suffix tool, prediction collection tools, and support for CodeLlama, Qwen, and Qwen-Chat, plus Otter for MMBench evaluation.
- **Documentation Facelift**: We've made several updates to our documentation to keep it relevant, user-friendly, and visually appealing.
- **Essential Bug Fixes**: We've fixed numerous bugs, especially those concerning missing pad/EOS tokens, TriviaQA and NQ postprocessing, and the Qwen config.
- **Enhancements**: From simplifying execution logic to suppressing warnings, we're always looking for ways to improve OpenCompass.

Dive deeper to learn more:

🌟 New Features:

📦 **Tools and Integrations**:
- Added and applied the update suffix tool ([#280](https://github.com/open-compass/opencompass/pull/280)).
- Support for CodeLlama and prediction collection tools ([#335](https://github.com/open-compass/opencompass/pull/335)).
- Added Qwen and Qwen-Chat support ([#286](https://github.com/open-compass/opencompass/pull/286)).
- Added Otter to OpenCompass MMBench evaluation ([#232](https://github.com/open-compass/opencompass/pull/232)).
- Support for LLaVA and mPLUG-Owl ([#331](https://github.com/open-compass/opencompass/pull/331)).

🛠 **Utilities and Functionality**:
- Added sample count support to `prompt_viewer` ([#273](https://github.com/open-compass/opencompass/pull/273)).
- `ZeroRetriever` no longer raises an error when `id_list` is provided ([#340](https://github.com/open-compass/opencompass/pull/340)).
- Increased the default task size ([#360](https://github.com/open-compass/opencompass/pull/360)).

📝 Documentation:

- Updated communication channels: WeChat and Discord ([#328](https://github.com/open-compass/opencompass/pull/328)).
- Documentation theme revamped for a fresh look ([#332](https://github.com/open-compass/opencompass/pull/332)).
- Detailed documentation for the new entry script ([#246](https://github.com/open-compass/opencompass/pull/246)).
- MMBench documentation updated ([#336](https://github.com/open-compass/opencompass/pull/336)).

🛠️ Bug Fixes:

- Fixed failures when both the pad and EOS tokens are missing ([#287](https://github.com/open-compass/opencompass/pull/287)).
- Fixed TriviaQA and NQ postprocessing ([#350](https://github.com/open-compass/opencompass/pull/350)).
- Fixed the Qwen configuration ([#358](https://github.com/open-compass/opencompass/pull/358)).
- Added a default value for the zero retriever ([#361](https://github.com/open-compass/opencompass/pull/361)).

⚙ Enhancements and Refactors:

- Simplified the execution logic in `run.py` and ensured temporary files are cleaned up ([#337](https://github.com/open-compass/opencompass/pull/337)).
- Suppressed unnecessary warnings raised by `get_logger` ([#353](https://github.com/open-compass/opencompass/pull/353)).
- Added import checks for multimodal dependencies ([#352](https://github.com/open-compass/opencompass/pull/352)).

🎉 New Contributors:

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

- Luodian ([First PR](https://github.com/open-compass/opencompass/pull/232))
- ZhangYuanhan-AI ([First PR](https://github.com/open-compass/opencompass/pull/331))
- HAOCHENYE ([First PR](https://github.com/open-compass/opencompass/pull/353))

Thank you to the entire community for pushing OpenCompass forward. Make sure to star 🌟 our GitHub repository if OpenCompass aids your endeavors! We treasure your feedback and contributions.

---

Changelog

* [Feature] Add and apply update suffix tool by Leymore in https://github.com/open-compass/opencompass/pull/280
* support sample count in prompt_viewer by cdpath in https://github.com/open-compass/opencompass/pull/273
* docs: update wechat and discord by vansin in https://github.com/open-compass/opencompass/pull/328
* [Docs] Update doc theme by gaotongxiao in https://github.com/open-compass/opencompass/pull/332
* [Feat] support codellama and preds collection tools by yingfhu in https://github.com/open-compass/opencompass/pull/335
* [Feature] Add qwen & qwen-chat support by Leymore in https://github.com/open-compass/opencompass/pull/286
* [Feat] Add Otter to OpenCompass MMBench Evaluation by Luodian in https://github.com/open-compass/opencompass/pull/232
* [Docs] Update docs for new entry script by gaotongxiao in https://github.com/open-compass/opencompass/pull/246
* [Fix] Fix when missing both pad and eos token by Leymore in https://github.com/open-compass/opencompass/pull/287
* [Doc] Update MMBench.md by kennymckormick in https://github.com/open-compass/opencompass/pull/336
* [Feat] Support LLaVA and mPLUG-Owl by ZhangYuanhan-AI in https://github.com/open-compass/opencompass/pull/331
* [Feature] Ignore ZeroRetriever error when id_list provided by Leymore in https://github.com/open-compass/opencompass/pull/340
* [Enhance] Add import check of multimodal by fangyixiao18 in https://github.com/open-compass/opencompass/pull/352
* [Sync] [Enhancement] Simplify execution logic in run.py; use finally to clean up temp files by gaotongxiao in https://github.com/open-compass/opencompass/pull/337
* [Fix] Fix triviaqa & nq postprocess by Leymore in https://github.com/open-compass/opencompass/pull/350
* [Enhance] Suppress warning raised by get_logger by HAOCHENYE in https://github.com/open-compass/opencompass/pull/353
* [Fix] Update qwen config by Leymore in https://github.com/open-compass/opencompass/pull/358
* [Fix] zero retriever add default value by Leymore in https://github.com/open-compass/opencompass/pull/361
* [Enhancement] Increase default task size by gaotongxiao in https://github.com/open-compass/opencompass/pull/360
* [Fix] Quick lint fix by Leymore in https://github.com/open-compass/opencompass/pull/362
* [Docs] update code evaluator docs by yingfhu in https://github.com/open-compass/opencompass/pull/354
* [Feat] support wizardcoder series by yingfhu in https://github.com/open-compass/opencompass/pull/344
* [Feat] Support Qwen-VL-Chat on MMBench. by yyk-wew in https://github.com/open-compass/opencompass/pull/312
* [Feature] Update claude2 postprocessor by gaotongxiao in https://github.com/open-compass/opencompass/pull/365
* [Doc] Update Overview by tonysy in https://github.com/open-compass/opencompass/pull/242
* [Feat] Update URL by tonysy in https://github.com/open-compass/opencompass/pull/368
* [Feature] Update llama2 implement by Leymore in https://github.com/open-compass/opencompass/pull/372
* [Feature] Add open source dataset eval config of instruct-blip by fangyixiao18 in https://github.com/open-compass/opencompass/pull/370
* [Fix] Update bbh implement & Fix bbh suffix by Leymore in https://github.com/open-compass/opencompass/pull/371
* [Feature] Add new models: baichuan2, tigerbot, vicuna v1.5 by Leymore in https://github.com/open-compass/opencompass/pull/373
* Bump version to 0.1.4 by gaotongxiao in https://github.com/open-compass/opencompass/pull/367

For an exhaustive list of changes, kindly check our [Full Changelog](https://github.com/open-compass/opencompass/compare/0.1.3...0.1.4).

0.1.3

OpenCompass keeps getting better! v0.1.3 brings a variety of enhancements, new features, and crucial fixes. Here’s a summary of what we've packed into this release:

🆕 **Highlights**:

- **Extended Dataset Support**: OpenCompass now integrates a broader range of public datasets, including but not limited to `adv_glue`, `codegeex2`, `Humanevalx`, `SEED-Bench`, `LongBench`, and `LEval`. We aim to provide extensive coverage for a variety of research needs.
- **Utility Additions**: From multimodal evaluation on the MME benchmark to the Tree-of-Thought method, this release adds a range of new functionality.
- **Bug Extermination**: Your feedback helps us grow. We've squashed a series of bugs to improve your experience.
- **More Evaluation Benchmarks for Multimodal Models**: We now support 10 additional evaluation benchmarks for multimodal models, including COCO Caption and ScienceQA, along with the corresponding evaluation code.

Let's delve deeper into what's new:

🌟 New Features:

📦 **Extended Dataset Support**:
- Introduction of other public datasets ([#206](https://github.com/InternLM/opencompass/pull/206), [#214](https://github.com/InternLM/opencompass/pull/214)).
- Support for the `adv_glue` dataset, focused on adversarial robustness ([#205](https://github.com/InternLM/opencompass/pull/205)).
- Added `codegeex2` and `Humanevalx` ([#210](https://github.com/InternLM/opencompass/pull/210)).
- Integration of SEED-Bench ([#203](https://github.com/InternLM/opencompass/pull/203)).
- LongBench support ([#236](https://github.com/InternLM/opencompass/pull/236)).
- Reconstructed the `LEval` dataset ([#266](https://github.com/InternLM/opencompass/pull/266)).
- Support for 10 additional public evaluation benchmarks for multimodal models ([#214](https://github.com/InternLM/opencompass/pull/214)).

🛠 **Utilities and Functionality**:
- Added a launch script for ease of operation ([#222](https://github.com/InternLM/opencompass/pull/222)).
- Multimodal evaluation on the MME benchmark ([#197](https://github.com/InternLM/opencompass/pull/197)).
- Support for visualglm and llava in MMBench evaluation ([#211](https://github.com/InternLM/opencompass/pull/211)).
- Tree-of-Thought method introduced ([#173](https://github.com/InternLM/opencompass/pull/173)).
- Added native `llama2` implementations ([#235](https://github.com/InternLM/opencompass/pull/235)).
- Flamingo and Claude support added ([#258](https://github.com/InternLM/opencompass/pull/258), [#253](https://github.com/InternLM/opencompass/pull/253)).

📝 Documentation:

- Updated the navigation bar language type for better clarity ([#212](https://github.com/InternLM/opencompass/pull/212)).
- News updates to keep users informed ([#241](https://github.com/InternLM/opencompass/pull/241), [#243](https://github.com/InternLM/opencompass/pull/243)).
- Added summarizer documentation ([#231](https://github.com/InternLM/opencompass/pull/231)).

🛠️ Bug Fixes:

- Fixed an issue with multiple rounds of inference when using mm_eval ([#201](https://github.com/InternLM/opencompass/pull/201)).
- Miscellaneous fixes, including name adjustments, requirements, and `bin_trim` corrections ([#223](https://github.com/InternLM/opencompass/pull/223), [#229](https://github.com/InternLM/opencompass/pull/229), [#237](https://github.com/InternLM/opencompass/pull/237)).
- Fixed a local runner debug issue ([#238](https://github.com/InternLM/opencompass/pull/238)).
- Resolved bugs in `PeftModel` generation ([#252](https://github.com/InternLM/opencompass/pull/252)).

⚙ Enhancements and Refactors:

- Refactored instructblip ([#227](https://github.com/InternLM/opencompass/pull/227)).
- Improved crowspairs postprocessing ([#251](https://github.com/InternLM/opencompass/pull/251)).
- sympy is now used only when necessary ([#255](https://github.com/InternLM/opencompass/pull/255)).

🎉 New Contributors:

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

- yyk-wew ([First PR](https://github.com/InternLM/opencompass/pull/201))
- fangyixiao18 ([First PR](https://github.com/InternLM/opencompass/pull/203))
- philipwangOvO ([First PR](https://github.com/InternLM/opencompass/pull/236))
- cdpath ([First PR](https://github.com/InternLM/opencompass/pull/270))

Thank you to our dedicated contributors for making OpenCompass even more comprehensive and user-friendly! 🙌 🎉

Remember to star 🌟 our GitHub repository if you find OpenCompass helpful! Your feedback and contributions are invaluable.

---

Changelog

* [Fix] Fix bugs of multiple rounds of inference when using mm_eval by yyk-wew in https://github.com/InternLM/opencompass/pull/201
* [Feature]: Add other public datasets by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/206
* [Doc] Update Navigation bar language type by Ezra-Yu in https://github.com/InternLM/opencompass/pull/212
* [Feat] support adv_glue dataset for adversarial robustness by yingfhu in https://github.com/InternLM/opencompass/pull/205
* [Feat] Add codegeex2 and Humanevalx by Ezra-Yu in https://github.com/InternLM/opencompass/pull/210
* [Feature]: Add other public datasets config by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/214
* [Feature] Support SEED-Bench by fangyixiao18 in https://github.com/InternLM/opencompass/pull/203
* [Feature]: Add launch script by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/222
* [Fix]: Fix name by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/223
* [Fix] requirements by gaotongxiao in https://github.com/InternLM/opencompass/pull/229
* [Dataset] LongBench by philipwangOvO in https://github.com/InternLM/opencompass/pull/236
* [Fix] bin_trim by philipwangOvO in https://github.com/InternLM/opencompass/pull/237
* [Feat] Support multi-modal evaluation on MME benchmark. by yyk-wew in https://github.com/InternLM/opencompass/pull/197
* [Feat] Support visualglm and llava for MMBench evaluation. by yyk-wew in https://github.com/InternLM/opencompass/pull/211
* [Fix] fix local runner debug by Leymore in https://github.com/InternLM/opencompass/pull/238
* Update News by tonysy in https://github.com/InternLM/opencompass/pull/241
* [Doc]update news by tonysy in https://github.com/InternLM/opencompass/pull/243
* Update run.py by liushz in https://github.com/InternLM/opencompass/pull/247
* [Doc] Add summarizer doc by Leymore in https://github.com/InternLM/opencompass/pull/231
* [Feature] Add llama2 native implements by Leymore in https://github.com/InternLM/opencompass/pull/235
* [Feature] Add Tree-of-Thought method by liushz in https://github.com/InternLM/opencompass/pull/173
* [Refactor] Refactor instructblip by fangyixiao18 in https://github.com/InternLM/opencompass/pull/227
* [Enhancement] Update crowspairs postprocess by gaotongxiao in https://github.com/InternLM/opencompass/pull/251
* [Fix] use sympy only when necessary by gaotongxiao in https://github.com/InternLM/opencompass/pull/255
* Update .owners.yml by tonysy in https://github.com/InternLM/opencompass/pull/261
* [Fix] Fix bugs for PeftModel generate by LZHgrla in https://github.com/InternLM/opencompass/pull/252
* [Feature]: Add Flamingo by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/258
* [Feature] Add Claude support by gaotongxiao in https://github.com/InternLM/opencompass/pull/253
* [Dataset] Reconstruct LEval by philipwangOvO in https://github.com/InternLM/opencompass/pull/266
* [Feature]: Verify the acc of these public datasets by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/269
* [Feat] Support public dataset of visualglm and llava. by yyk-wew in https://github.com/InternLM/opencompass/pull/265
* [Fix] wrong path in dataset collections by gaotongxiao in https://github.com/InternLM/opencompass/pull/272
* [Fix] update descriptions of tools by cdpath in https://github.com/InternLM/opencompass/pull/270
* [Feature] Support model-bound prediction postprocessor, use it in Claude by gaotongxiao in https://github.com/InternLM/opencompass/pull/268
* [Feature] Simplify entry script by gaotongxiao in https://github.com/InternLM/opencompass/pull/204
* Update README.md by tonysy in https://github.com/InternLM/opencompass/pull/262

**For a complete list of changes, please refer to our** [Full Changelog](https://github.com/InternLM/opencompass/compare/0.1.2...0.1.3).

0.1.2

This release continues the evolution of OpenCompass, bringing a mix of new features, optimizations, documentation improvements, and bug fixes.

🆕 **Highlights**:

**🏆 Leaderboard**: The evaluation results of [Qwen-7B](https://github.com/QwenLM/Qwen-7B), [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B), [LLaMA-2](https://github.com/facebookresearch/llama), and GPT-4 have been posted to our [leaderboard](https://opencompass.org.cn/leaderboard-llm). It is now also possible to [compare models](https://opencompass.org.cn/model-compare/GPT-4,ChatGPT,LLaMA-2-70B,LLaMA-65B) online. We hope this feature offers deeper insights!

**📊 Datasets**: Introduced Xiezhi, SQuAD2.0, ANLI, LEval, and more datasets for diverse applications ([#101](https://github.com/InternLM/opencompass/pull/101), [#192](https://github.com/InternLM/opencompass/pull/192)). Added safety-related datasets to collections ([#185](https://github.com/InternLM/opencompass/pull/185)).

**🎭 New modality**: Support for [MMBench](https://opencompass.org.cn/mmbench) has been introduced, and evaluation of multimodal models is on the way! ([#56](https://github.com/InternLM/opencompass/pull/56), [#161](https://github.com/InternLM/opencompass/pull/161)) In addition, support for the Intern language model has been added ([#51](https://github.com/InternLM/opencompass/pull/51)).

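**⚙️ Enhancement**: Several enhancements to OpenAI models, including skipping invalid keys, explicit temperature settings, and more ([#121](https://github.com/InternLM/opencompass/pull/121), [#128](https://github.com/InternLM/opencompass/pull/128)). Support for running multiple tasks on one GPU, filtering log messages by level, and more ([#148](https://github.com/InternLM/opencompass/pull/148), [#187](https://github.com/InternLM/opencompass/pull/187)).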

**📝 Documentation**: Comprehensive updates and fixes across READMEs, issue templates, prompt docs, metric documentation, and more.

**🛠️ Bug Fixes**: Including a seed fix in HFEvaluator, a fix for AGIEval multiple-choice questions, and more ([#122](https://github.com/InternLM/opencompass/pull/122), [#137](https://github.com/InternLM/opencompass/pull/137)).

🎉 New Contributors

Thank you to all our contributors for this release, with a special shoutout to our new contributors:

- go-with-me000 ([First Contribution](https://github.com/InternLM/opencompass/pull/51))
- anakin-skywalker-Joseph ([First Contribution](https://github.com/InternLM/opencompass/pull/125))
- zhouzaida ([First Contribution](https://github.com/InternLM/opencompass/pull/152))
- dependabot ([First Contribution](https://github.com/InternLM/opencompass/pull/178))

Changelog

* [Feat] add auto assignee bot by yingfhu in https://github.com/InternLM/opencompass/pull/105
* [Doc] Update Readme and Fix failed links by Ezra-Yu in https://github.com/InternLM/opencompass/pull/108
* Doc: add twitter link by vansin in https://github.com/InternLM/opencompass/pull/111
* Support intern language model by go-with-me000 in https://github.com/InternLM/opencompass/pull/51
* [Docs] Update issue templates for proper guidance to discussions by gaotongxiao in https://github.com/InternLM/opencompass/pull/116
* [Feature] Allow explicitly setting the temperature for API model by kennymckormick in https://github.com/InternLM/opencompass/pull/121
* [Fix] Fix seed in HFEvaluator by kennymckormick in https://github.com/InternLM/opencompass/pull/122
* [Feature] Update SC by Leymore in https://github.com/InternLM/opencompass/pull/126
* Update documentation titles by anakin-skywalker-Joseph in https://github.com/InternLM/opencompass/pull/125
* [Docs] Update prompt docs by Leymore in https://github.com/InternLM/opencompass/pull/46
* [Enhancement] Update README.md by tonysy in https://github.com/InternLM/opencompass/pull/119
* [DOC] Add metric doc by Ezra-Yu in https://github.com/InternLM/opencompass/pull/118
* [Feature] Evaluating acc based on minimum edit distance, update SIQA by gaotongxiao in https://github.com/InternLM/opencompass/pull/130
* [Feature] Several enhancements by gaotongxiao in https://github.com/InternLM/opencompass/pull/142
* [Doc] update acknowledgements by Leymore in https://github.com/InternLM/opencompass/pull/147
* Fix typo in readme by zhouzaida in https://github.com/InternLM/opencompass/pull/152
* [Feature]: Use multimodal by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/73
* [Refine] Refine PR 122 by kennymckormick in https://github.com/InternLM/opencompass/pull/123
* [Enhancement] Optimize OpenAI models by gaotongxiao in https://github.com/InternLM/opencompass/pull/128
* Update pre-commit ignore-word list by gaotongxiao in https://github.com/InternLM/opencompass/pull/162
* [Script] Add scripts to evaluate MMBench by kennymckormick in https://github.com/InternLM/opencompass/pull/161
* [Doc] Update Readme by tonysy in https://github.com/InternLM/opencompass/pull/165
* [Feature]: Add mm support for local runner by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/169
* Calculate max_out_len without hard code for OpenAI model by zhouzaida in https://github.com/InternLM/opencompass/pull/158
* [API] Refine OpenAI by kennymckormick in https://github.com/InternLM/opencompass/pull/175
* [Fix] Use a copy of the config object in Task by gaotongxiao in https://github.com/InternLM/opencompass/pull/174
* Bump requests from 2.28.1 to 2.31.0 by dependabot in https://github.com/InternLM/opencompass/pull/178
* [Fix] Fix AGIEval multiple choice by Leymore in https://github.com/InternLM/opencompass/pull/137
* [Feature]: Refactor input and output by YuanLiuuuuuu in https://github.com/InternLM/opencompass/pull/176
* [Feature] Add Xiezhi SQuAD2.0 ANLI by Leymore in https://github.com/InternLM/opencompass/pull/101
* [Feature] Support turbomind by tonysy in https://github.com/InternLM/opencompass/pull/166
* [Enhancement] Add humaneval postprocessor for GPT models & eval config for GPT4, enhance the original humaneval postprocessor by gaotongxiao in https://github.com/InternLM/opencompass/pull/129
* [Fix] Fix some sc errors by liushz in https://github.com/InternLM/opencompass/pull/177
* Fix meta template & unit tests by gaotongxiao in https://github.com/InternLM/opencompass/pull/170
* [Feature] Support CUDA_VISIBLE_DEVICES and multiple tasks on one GPU by mzr1996 in https://github.com/InternLM/opencompass/pull/148
* [Docs] Enhance issue template by gaotongxiao in https://github.com/InternLM/opencompass/pull/183
* Skip invalid keys to avoid requesting API by zhouzaida in https://github.com/InternLM/opencompass/pull/184
* [Feature] update news by tonysy in https://github.com/InternLM/opencompass/pull/186
* [Feature] Support filtering specified levels message by zhouzaida in https://github.com/InternLM/opencompass/pull/187
* [Feat] add safety to collections by yingfhu in https://github.com/InternLM/opencompass/pull/185
* [Docs] Update contribution guide & toc, improve user experience by gaotongxiao in https://github.com/InternLM/opencompass/pull/188
* [Feature] add llama-oriented dataset configs by Leymore in https://github.com/InternLM/opencompass/pull/82
* [Feat] update postprocessor to get first option more accurately by yingfhu in https://github.com/InternLM/opencompass/pull/193
* [Feature] Add LEval datasets by gaotongxiao in https://github.com/InternLM/opencompass/pull/192
* Bump version to 0.1.2 by gaotongxiao in https://github.com/InternLM/opencompass/pull/190
* [Fix] fix bug for postprocessor by yingfhu in https://github.com/InternLM/opencompass/pull/195
* [Doc] update readme by Leymore in https://github.com/InternLM/opencompass/pull/196
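The #130 entry in the changelog above mentions evaluating accuracy based on minimum edit distance. A minimal sketch of that idea follows, assuming each free-form prediction is mapped to the candidate option with the smallest Levenshtein distance before being compared with the gold option index. The names `levenshtein`, `closest_option`, and `edit_distance_accuracy` are hypothetical and are not OpenCompass APIs.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert/delete/substitute, cost 1 each)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def closest_option(prediction: str, options: list[str]) -> int:
    """Index of the option nearest to the prediction; ties go to the first option."""
    return min(range(len(options)), key=lambda k: levenshtein(prediction, options[k]))


def edit_distance_accuracy(
    predictions: list[str], options: list[list[str]], golds: list[int]
) -> float:
    """Fraction of samples whose nearest option matches the gold index."""
    hits = sum(closest_option(p, o) == g for p, o, g in zip(predictions, options, golds))
    return hits / len(golds)
```

Mapping to the nearest option lets a prediction such as "bananas" still count as choosing "banana", at the cost of occasionally matching an option the model never intended.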

Full Changelog: https://github.com/InternLM/opencompass/compare/0.1.1...0.1.2

0.1.1

Added more datasets:
* AGIEval
* anli
* cmmlu
* jigsawmultilingual
* realtoxicprompts
* SQuAD2.0
* TheoremQA
* triviaqa
* xiezhi
* Xsum
