lmms-eval

Latest version: v0.2.4


0.2.4

What's Changed
* [Fix] Fix bugs in returning result dict and bring back anls metric by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/221
* fix: fix wrong args in wandb logger by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/226
* [feat] Add check for existence of accelerator before waiting by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/227
* add more language tasks and fix fewshot evaluation bugs by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/228
* Remove unnecessary LM object removal in evaluator by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/229
* [fix] Shallow copy issue by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/231
* [Minor] Fix max_new_tokens in video llava by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/237
* Update LMMS evaluation tasks for various subjects by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/240
* [Fix] Fix async append result in different order issue by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/244
* Update the version requirement for `transformers` by zhijian-liu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/235
* Add new LMMS evaluation task for wild vision benchmark by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/247
* Add raw score to wildvision bench by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/250
* [Fix] Strict video to be single processing by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/246
* Refactor wild_vision_aggregation_raw_scores to calculate average score by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/252
* [Fix] Bring back process result pbar by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/251
* [Minor] Update utils.py by YangYangGirl in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/249
* Refactor distributed gathering of logged samples and metrics by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/253
* Refactor caching module and fix serialization issue by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/255
* [Minor] Bring back fix for metadata by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/258
* [Model] support minimonkey model by white2018 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/257
* [Feat] add regression test and change saving logic related to `output_path` by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/259
* [Feat] Add support for llava_hf video, better loading logic for llava_hf ckpt by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/260
* [Model] support cogvlm2 model by white2018 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/261
* [Docs] Update and sort current_tasks.md by pbcong in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/262
* fix error name with infovqa task by ZhaoyangLi-nju in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/265
* [Task] Add MMT and MMT_MI (Multiple Image) Task by ngquangtrung57 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/270
* mme-realworld by yfzhang114 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/266
* [Model] support Qwen2 VL by abzb1 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/268
* Support new task mmworld by jkooy in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/269
* Update current tasks.md by pbcong in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/272
* [feat] support video evaluation for qwen2-vl and add mix-evals-video2text by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/275
* [Feat][Task] Add multi-round evaluation in llava-onevision; Add MMSearch Benchmark by CaraJ7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/277
* [Fix] Model name None in Task manager, mix eval model specific kwargs, claude retrying fix by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/278
* [Feat] Add support for evaluation of Oryx models by dongyh20 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/276
* [Fix] Fix the error when running models caused by `generate_until_multi_round` by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/281
* [fix] Refactor GeminiAPI class to add video pooling and freeing by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/287
* add jmmmu by AtsuMiyai in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/286
* [Feat] Add support for evaluation of InternVideo2-Chat && Fix evaluation for mvbench by yinanhe in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/280

New Contributors
* YangYangGirl made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/249
* white2018 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/257
* pbcong made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/262
* ZhaoyangLi-nju made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/265
* ngquangtrung57 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/270
* yfzhang114 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/266
* jkooy made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/269
* dongyh20 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/276
* yinanhe made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/280

**Full Changelog**: https://github.com/EvolvingLMMs-Lab/lmms-eval/compare/v0.2.3...v0.2.4
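To reproduce results against this release rather than whatever is latest, the package can be pinned. A minimal `requirements.txt` sketch — this assumes the PyPI distribution name matches the repository name (pip treats `lmms-eval` and `lmms_eval` as equivalent):

```
# requirements.txt — pin to the v0.2.4 release described above (assumed PyPI name)
lmms-eval==0.2.4
```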

0.2.3.post1

What's Changed
* [Fix] Fix bugs in returning result dict and bring back anls metric by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/221
* fix: fix wrong args in wandb logger by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/226


**Full Changelog**: https://github.com/EvolvingLMMs-Lab/lmms-eval/compare/v0.2.3...v0.2.3.post1
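`0.2.3.post1` is a PEP 440 post-release: it sorts immediately after its base release `0.2.3` but before `0.2.4`. A minimal stdlib-only sketch of that ordering for the version strings in this changelog (it handles only plain `X.Y.Z` releases and `.postN` suffixes, which is all this changelog uses; the third-party `packaging` library implements the full specification):

```python
def version_key(version: str):
    """Sort key for 'X.Y.Z' and 'X.Y.Z.postN' version strings (PEP 440 subset)."""
    parts = version.split(".")
    post = 0
    if parts[-1].startswith("post"):
        # 'post1' -> 1; a bare 'post' is treated as post-release 0
        post = int(parts[-1][4:] or 0)
        parts = parts[:-1]
    release = tuple(int(p) for p in parts)
    # A post-release sorts immediately after its base release.
    return (release, post)

releases = ["0.2.4", "0.2.3.post1", "0.2.0", "0.2.2", "0.2.3", "0.2.0.post1"]
print(sorted(releases, key=version_key))
# -> ['0.2.0', '0.2.0.post1', '0.2.2', '0.2.3', '0.2.3.post1', '0.2.4']
```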

0.2.3

What's Changed
* Update the blog link by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/196
* Bring back PR52 by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/198
* fix: update from previous model_specific_prompt to current lmms_eval_kwargs to avoid warnings by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/206
* [Feat] SGLang SRT commands in one go, async input for openai server by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/212
* [Minor] Add kill sglang process by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/213
* Support textonly inference for LLaVA-OneVision. by CaraJ7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/215
* Fix `videomme` evaluation by zhijian-liu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/209
* [feat] remove registration logic and add language evaluation tasks by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/218

New Contributors
* zhijian-liu made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/209

**Full Changelog**: https://github.com/EvolvingLMMs-Lab/lmms-eval/compare/v0.2.2...v0.2.3

0.2.2

What's Changed
* Include VCR by tianyu-z in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/105
* [Small Update] Update the version of LMMs-Eval by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/109
* add II-Bench by XinrunDu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/111
* Q-Bench, Q-Bench2, A-Bench by teowu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/113
* LongVideoBench for LMMs-Eval by teowu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/117
* Fix the potential risk by PR 117 by teowu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/118
* add tinyllava by zjysteven in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/114
* Add docs for datasets upload to HF by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/120
* [Model] aligned llava-interleave model results on video tasks by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/125
* External package integration using plugins by lorenzomammana in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/126
* Add task VITATECS by lscpku in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/130
* add task gqa-ru by Dannoopsy in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/128
* add task MMBench-ru by Dannoopsy in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/129
* Add wild vision bench by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/133
* Add detailcaps by Dousia in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/136
* add MLVU task by shuyansy in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/137
* add process sync in evaluation metric computation via a temp file in lmms_eval/evaluator.py by Dousia in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/143
* [Sync Features] add vila, add wildvision, add vibe-eval, add interleave bench by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/138
* Add muirbench by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/147
* Add a new benchmark: MIRB by ys-zong in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/150
* Add LMMs-Lite by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/148
* [Docs] Fix broken hyperlink in README.md by abzb1 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/149
* Changes in llava_hf.py. Corrected the response split by role and added the ability to specify an EOS token by Dannoopsy in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/153
* Add default values for mm_resampler_location and mm_newline_position to make sure Llavavid model can run successfully. by choiszt in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/156
* Update README.md by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/159
* revise llava_vid.py by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/164
* Add MMStar by skyil7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/158
* Add model Mantis to the LMMs-Eval supported model list by baichuanzhou in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/162
* Fix utils.py by abzb1 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/165
* Add default prompt for seedbench_2.yaml by skyil7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/167
* Fix a small typo for live_bench by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/169
* [New Model] Adding Cambrian Model by Nyandwi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/171
* Revert "[New Model] Adding Cambrian Model" by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/178
* Fixed some issues in InternVL family and ScienceQA task. by skyil7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/174
* [Add Dataset] SEEDBench 2 Plus by abzb1 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/180
* [New Updates] LLaVA OneVision Release; MVBench, InternVL2, IXC2.5 Interleave-Bench integration. by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/182
* New pypi by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/184

New Contributors
* tianyu-z made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/105
* XinrunDu made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/111
* teowu made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/113
* zjysteven made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/114
* lorenzomammana made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/126
* lscpku made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/130
* Dannoopsy made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/128
* Dousia made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/136
* shuyansy made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/137
* ys-zong made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/150
* abzb1 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/149
* choiszt made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/156
* skyil7 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/158
* baichuanzhou made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/162
* Nyandwi made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/171

**Full Changelog**: https://github.com/EvolvingLMMs-Lab/lmms-eval/compare/v0.2.0...v0.2.2

0.2.0.post1

What's Changed
* Include VCR by tianyu-z in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/105
* [Small Update] Update the version of LMMs-Eval by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/109
* add II-Bench by XinrunDu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/111
* Q-Bench, Q-Bench2, A-Bench by teowu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/113
* LongVideoBench for LMMs-Eval by teowu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/117
* Fix the potential risk by PR 117 by teowu in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/118
* add tinyllava by zjysteven in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/114
* Add docs for datasets upload to HF by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/120
* [Model] aligned llava-interleave model results on video tasks by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/125

New Contributors
* tianyu-z made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/105
* XinrunDu made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/111
* teowu made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/113
* zjysteven made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/114

**Full Changelog**: https://github.com/EvolvingLMMs-Lab/lmms-eval/compare/v0.2.0...v0.2.0.post1

0.2.0

What's Changed
* pip package by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/1
* Fix mmbench dataset submission format by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/7
* [Feat] add correct tensor parallelism for larger size model. by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/4
* update version to 0.1.1 by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/9
* [Tasks] Fix MMBench by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/13
* [Fix] Fix llava reproduce error by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/24
* add_ocrbench by echo840 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/28
* Joshua/olympiadbench by JvThunder in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/37
* [WIP] adding mmbench dev evaluation (75) by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/46
* Add `llava` model for 🤗 Transformers by lewtun in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/47
* Fix types to allow nullables in `llava_hf.py` by lewtun in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/55
* Add REC tasks for testing model ability to locally ground objects, given a description. This adds REC for all RefCOCO datasets. by hunterheiden in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/52
* [Benchmarks] RealWorldQA by pufanyi in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/57
* add Llava-SGlang by jzhang38 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/54
* Add MathVerse by CaraJ7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/60
* Fix typo in Qwen-VL that was causing "reference before assignment" by tupini07 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/61
* New Task: ScreenSpot - Grounding (REC) and instruction generation (REG) on screens by hunterheiden in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/63
* [New Task] WebSRC (multimodal Q&A on web screenshots) by hunterheiden in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/69
* Bugfix: WebSRC should be token-level F1 NOT character-level by hunterheiden in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/70
* Multilingual LLava bench by gagan3012 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/56
* [Fix] repr llava doc by cocoshe in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/36
* add idefics2 by jzhang38 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/59
* [Feat] Add qwen vl api by kcz358 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/73
* Adding microsoft/Phi-3-vision-128k-instruct model. by vfragoso in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/87
* Add MathVerse in README.md by CaraJ7 in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/97
* add MM-UPD by AtsuMiyai in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/95
* add Conbench by Gumpest in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/100
* Update conbench in README by Gumpest in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/101
* update gpt-3.5-turbo version by AtsuMiyai in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/107
* [Upgrade to v0.2] Embracing Video Evaluations with LMMs-Eval by Luodian in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/108

New Contributors
* pufanyi made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/1
* Luodian made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/4
* kcz358 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/24
* echo840 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/28
* JvThunder made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/37
* lewtun made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/47
* hunterheiden made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/52
* jzhang38 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/54
* CaraJ7 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/60
* tupini07 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/61
* gagan3012 made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/56
* cocoshe made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/36
* vfragoso made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/87
* AtsuMiyai made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/95
* Gumpest made their first contribution in https://github.com/EvolvingLMMs-Lab/lmms-eval/pull/100

**Full Changelog**: https://github.com/EvolvingLMMs-Lab/lmms-eval/compare/v0.1.0...v0.2.0


© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.