## What's new

### Tasks
* [LiveCodeBench](https://livecodebench.github.io/) by plaguss in #548, #587, #518
* [GPQA diamond](https://arxiv.org/abs/2311.12022) by lewtun in #534
* [Humanity's Last Exam](https://agi.safe.ai/) by clefourrier in #520
* [Olympiad Bench](https://github.com/OpenBMB/OlympiadBench) by NathanHB in #521
* [AIME 24/25](https://aime25.aimedicine.info/) and [MATH-500](https://huggingface.co/datasets/HuggingFaceH4/MATH-500) by NathanHB in #586
* French model evals by mdiazmel in #505
### Metrics
* Pass@k by clefourrier in #519
* Extractive Match metric by hynky1999 in #495, #503, #522, #535
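For reference, pass@k is usually computed with the standard unbiased estimator (Chen et al., 2021): given n generations per problem with c of them correct, it estimates the probability that at least one of k sampled generations passes. A minimal sketch, which may differ from lighteval's exact implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn (without replacement) from n generations, c of which
    are correct, passes.

    Computed as 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        # Fewer incorrect samples than k: a correct one is always drawn.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

Averaging this value over all problems gives the benchmark-level pass@k score.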
### Features

#### Better logging
* Log model config by NathanHB in #627
* Support custom results/details push to hub by albertvillanova in #457
* Push details without converting fields to str by NathanHB in #572
#### Inference providers
* Add inference providers support by NathanHB in #616
#### Load details to be evaluated
* Load predictions from existing details files and continue evaluating from there by JoelNiklaus in #488
#### sglang support
* Add sglang support by Jayon02 in #552
### Bug fixes and refactoring
* Tiny improvements to `endpoint_model.py`, `base_model.py`,... by sadra-barikbin in #219
* Update README.md by NathanHB in #486
* Fix issue with encodings for together models by JoelNiklaus in #483
* Made litellm judge backend more robust by JoelNiklaus in #485
* Fix `T_co` import bug by gucci-j in #484
* Fix README link by vxw3t8fhjsdkghvbdifuk in #500
* Fixed issue with o1 in litellm by JoelNiklaus in #493
* Hotfix for litellm judge by JoelNiklaus in #490
* Made judge response processing more robust by JoelNiklaus in #491
* VLLM: allow max tokens to be set in the model config file by NathanHB in #547
* Bump up the latex2sympy2_extended version + more tests by hynky1999 in #510
* Fixed import of `url_to_fs` from fsspec by LoserCheems in #507
* Fix Ukrainian indices and confirmation word by ayukh in #516
* Fix VLLM data-parallel by hynky1999 in #541
* Relax spacy import to loosen dependencies by clefourrier in #622
* Fix vLLM sampling params by NathanHB in #625
* Relax deps for TGI by NathanHB in #626
* Fix extractive match bug by hynky1999 in #540
* Fix loading of vllm model from files by NathanHB in #533
* Fix broken URLs by deep-diver in #550
* Fix `gpu_memory_utilisation` typo (vllm) by tpoisonooo in #553
* Allow better flexibility for litellm endpoints by NathanHB in #549
* Translate task template to Catalan and Galician and fix typos by mariagrandury in #506
* Relax upper bound on torch by lewtun in #508
* Fix vLLM generation with sampling params by lewtun in #578
* Make BLEURT lazy by hynky1999 in #536
* Fix backend error in main_sglang by TankNee in #597
* VLLM + Math-Verify fixes by hynky1999 in #603
* Raise an exception when generation size is larger than model length by NathanHB in #571
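The last fix above guards against requesting more new tokens than the model's context window can hold. A minimal sketch of such a check (function and parameter names are illustrative, not lighteval's actual API):

```python
def check_generation_fits(prompt_len: int, max_new_tokens: int,
                          max_model_len: int) -> None:
    """Raise instead of silently truncating when the prompt plus the
    requested generation would exceed the model's context window."""
    if prompt_len + max_new_tokens > max_model_len:
        raise ValueError(
            f"Requested {max_new_tokens} new tokens on a "
            f"{prompt_len}-token prompt, but the model context is only "
            f"{max_model_len} tokens."
        )
```

Failing early like this surfaces misconfigured generation sizes instead of producing silently truncated outputs.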
## Thanks
Huge thanks to Hyneck, Lewis, Ben, Agustín, Elie, and everyone helping and giving feedback 💙
## Significant community contributions
The following contributors have made significant changes to the library over the last release:
* hynky1999
    * Extractive Match metric (#495)
    * Fix math extraction (#503)
    * Bump up the latex2sympy2_extended version + more tests (#510)
    * Math extraction: allow only trying the first match, more customizable latex extraction + bump deps (#522)
    * Add missing inits (#524)
    * Sync Math-Verify (#535)
    * Make BLEURT lazy (#536)
    * Fix extractive match bug (#540)
    * Fix VLLM data-parallel (#541)
    * VLLM + Math-Verify fixes (#603)
* plaguss
    * Add extended task for LiveCodeBench code generation (#548)
    * Add subsets for LiveCodeBench (#587)
* Jayon02
    * Let lighteval support sglang (#552)
* NathanHB
    * Add Olympiad Bench (#521)
    * Fix loading of vllm model from files (#533)
    * VLLM: allow max tokens to be set in the model config file (#547)
    * Allow better flexibility for litellm endpoints (#549)
    * Raise an exception when generation size is larger than model length (#571)
    * Push details without converting fields to str (#572)
    * Add AIME 24/25 and MATH-500 (#586)
    * Add inference providers support (#616)
    * Fix vLLM sampling params (#625)
    * Relax deps for TGI (#626)
    * Log model config (#627)