Lighteval

Latest version: v0.6.2

0.5.0

What's new

Features
* Tokenization-wise encoding by hynky1999 in 287
* Task config by hynky1999 in 289

Bug fixes
* Fixes bug: "You can't create a model without either a list of model_args or a model_config_path" was raised even when model_config_path was submitted, by NathanHB in 298
* Skip tests if secrets are not provided by hynky1999 in 304
* [FIX] vllm backend by NathanHB in 317

0.4.0

What's new

Features
* Adds vllm as a backend for an insane speed-up by NathanHB in 274
* Add llm_as_judge in metrics (using either OpenAI or Transformers) by NathanHB in 146
* Able to use config files for models by clefourrier in 131
* List available tasks in the CLI: `lighteval tasks --list` by DimbyTa in 142
* Use torch compile for speed-up by clefourrier in 248
* Add maj@k metric by clefourrier in 158
* Adds a dummy/random model for baseline init by guipenedo in 220
* lighteval is now a CLI tool: `lighteval --args` by NathanHB in 152
* We can now log info from the metrics (for example input and response from llm_as_judge) by NathanHB in 157
* Configurable task versioning by PhilipMay in 181
* Programmatic interface by clefourrier in 269
* Probability Metric + New Normalization by hynky1999 in 276
* Add widgets to the README by clefourrier in 145
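To illustrate the new maj@k metric listed above, here is a minimal sketch of majority voting over k sampled generations. The function name and signature are hypothetical, not lighteval's actual API:

```python
from collections import Counter

def maj_at_k(samples, gold):
    """Majority voting over k sampled answers: the prediction is the most
    frequent answer among the samples; score 1.0 if it matches the gold."""
    majority, _ = Counter(samples).most_common(1)[0]
    return 1.0 if majority == gold else 0.0

# With 5 samples, 3 of which agree on the correct answer:
print(maj_at_k(["4", "4", "5", "4", "3"], gold="4"))  # 1.0
```

The idea is that sampling several generations and voting is more robust than trusting a single greedy decode, especially on math and reasoning tasks.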

New tasks
* Add `Ger-RAG-eval` tasks by PhilipMay in 149
* adding `aimo` custom eval by NathanHB in 154

Fixes
* Bump nltk to 3.9.1 to fix security issue by NathanHB in 137
* Fix max_length type when being passed in model args by csarron in 138
* Fix nanotron models input size bug by clefourrier in 156
* Fix MATH normalization by lewtun in 162
* fix Prompt function names by clefourrier in 168
* Fix prompt format german rag community task by jphme in 171
* add 'cite as' section in readme by NathanHB in 178
* Fix broken link to extended tasks in README by alexrs in 182
* Mention HF_TOKEN in readme by Wauplin in 194
* Download BERT scorer lazily by sadra-barikbin in 190
* Updated tgi_model and added parameters for endpoint_model by shaltielshmid in 208
* fix llm as judge warnings by NathanHB in 173
* ADD GPT-4 as Judge by philschmid in 206
* Fix a few typos and do a tiny refactor by sadra-barikbin in 187
* Avoid truncating the outputs based on string lengths by anton-l in 201
* Now only uses functions for prompt definition by clefourrier in 213
* Data split depending on eval params by clefourrier in 169
* should fix most inference endpoints issues of version config by clefourrier in 226
* Fix _init_max_length in base_model.py by gucci-j in 185
* Make evaluator invariant of input request type order by sadra-barikbin in 215
* Fixing issues with multichoice_continuations_start_space - was not parsed properly by clefourrier in 232
* Fix IFEval metric by lewtun in 259
* change priority when choosing model dtype by NathanHB in 263
* Add grammar option to generation by sadra-barikbin in 242
* make info loggers dataclass, so that their properties have expected lifetime by hynky1999 in 280
* Remove expensive prediction run during test collection by hynky1999 in 279
* Example Configs and Docs by RohitMidha23 in 255
* Refactoring the few shot management by clefourrier in 272
* Standalone nanotron config by hynky1999 in 285
* Logging Revamp by hynky1999 in 284
* bump nltk version by NathanHB in 290

Significant community contributions

The following contributors have made significant changes to the library over the last release:

* NathanHB
* Bump nltk to 3.9.1 to fix security issue (137)
* Add llm as judge in metrics (146)
* Nathan add logging to metrics (157)
* add 'cite as' section in readme (178)
* Fix citation section in readme (180)
* adding aimo custom eval (154)
* fix llm as judge warnings (173)
* launch lighteval using `lighteval --args` (152)
* adds llm as judge using transformers (223)
* Fix missing json file (264)
* change priority when choosing model dtype (263)
* fix the location of tasks list in the readme (267)
* updates ifeval repo (268)
* fix nanotron (283)
* add vllm backend (274)
* bump nltk version (290)
* clefourrier
* Add config files for models (131)
* Add fun widgets to the README (145)
* Fix nanotron models input size bug (156)
* no function we actually use should be named prompt_fn (168)
* Add majk metric (158)
* Homogeneize logging system (150)
* Use only dataclasses for task init (212)
* Now only uses functions for prompt definition (213)
* Data split depending on eval params (169)
* should fix most inference endpoints issues of version config (226)
* Add metrics as functions (214)
* Quantization related issues (224)
* Update issue templates (235)
* remove latex writer since we don't use it (231)
* Removes default bert scorer init (234)
* fix (233)
* updated piqa (222)
* uses torch compile if provided (248)
* Fix inference endpoint config (244)
* Expose samples via the CLI (228)
* Fixing issues with multichoice_continuations_start_space - was not parsed properly (232)
* Programmatic interface + cleaner management of requests (269)
* Small file reorg (only renames/moves) (271)
* Refactoring the few shot management (272)
* PhilipMay
* Add `Ger-RAG-eval` tasks. (149)
* Add version config option. (181)
* shaltielshmid
* Added Namespace parameter for InferenceEndpoints, added option for passing model config directly (147)
* Updated tgi_model and added parameters for endpoint_model (208)
* hynky1999
* make info loggers dataclass, so that their properties have expected lifetime (280)
* Remove expensive prediction run during test collection (279)
* Probability Metric + New Normalization (276)
* Standalone nanotron config (285)
* Logging Revamp (284)

0.3.0

Release Note

This release introduces the new extended tasks feature, documentation, and many other patches for improved stability.
New tasks are also introduced:
- Big Bench Hard: https://huggingface.co/papers/2210.09261
- AGIEval: https://huggingface.co/papers/2304.06364
- TinyBench:
- MT Bench: https://huggingface.co/papers/2306.05685
- AlGhafa Benchmarking Suite: https://aclanthology.org/2023.arabicnlp-1.21/

MT-Bench marks the introduction of multi-turn prompting as well as the llm-as-a-judge metric.
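The multi-turn llm-as-a-judge flow introduced with MT-Bench can be sketched as follows. This is an illustrative outline only: `judge` stands in for a real judge-model call, and none of these names come from lighteval's actual API:

```python
# Minimal sketch of multi-turn llm-as-a-judge scoring (MT-Bench style).

def judge(question, answer):
    # Placeholder: a real judge would prompt GPT-4 (or a Transformers model)
    # to rate the answer on a 1-10 scale given the question.
    return 10 if answer else 1

def score_conversation(turns):
    """Score each (question, answer) turn and average over the conversation."""
    scores = [judge(q, a) for q, a in turns]
    return sum(scores) / len(scores)

conversation = [
    ("Write a short poem about the sea.", "The waves..."),
    ("Now rewrite it as a haiku.", "Sea foam..."),
]
print(score_conversation(conversation))  # 10.0 with the stub judge
```

Scoring each turn separately is what makes the metric multi-turn: the judge sees the follow-up question and answer, not just the first exchange.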

New tasks
* Add BBH by clefourrier in https://github.com/huggingface/lighteval/pull/7, bilgehanertan in https://github.com/huggingface/lighteval/pull/126
* Add AGIEval by clefourrier in https://github.com/huggingface/lighteval/pull/121
* Adding TinyBench by clefourrier in https://github.com/huggingface/lighteval/pull/104
* Adding support for Arabic benchmarks: AlGhafa benchmarking suite by alielfilali01 in https://github.com/huggingface/lighteval/pull/95
* Add mt-bench by NathanHB in https://github.com/huggingface/lighteval/pull/75

Features
* Extended Tasks! by clefourrier in https://github.com/huggingface/lighteval/pull/101, lewtun in https://github.com/huggingface/lighteval/pull/108, NathanHB in https://github.com/huggingface/lighteval/pull/122, https://github.com/huggingface/lighteval/pull/123
* Added support for launching inference endpoint with different model dtypes by shaltielshmid in https://github.com/huggingface/lighteval/pull/124

Documentation
* Adding LICENSE by clefourrier in https://github.com/huggingface/lighteval/pull/86, NathanHB in https://github.com/huggingface/lighteval/pull/89
* Make it clearer in the README that the leaderboard uses the harness by clefourrier in https://github.com/huggingface/lighteval/pull/94

Small patches
* Update huggingface-hub for compatibility with datasets 2.18 by clefourrier in https://github.com/huggingface/lighteval/pull/84
* Tidy up dependency groups by lewtun in https://github.com/huggingface/lighteval/pull/81
* bump git python by NathanHB in https://github.com/huggingface/lighteval/pull/90
* Sets a max length for the MATH task by clefourrier in https://github.com/huggingface/lighteval/pull/83
* Fix parallel data processing bug by clefourrier in https://github.com/huggingface/lighteval/pull/92
* Change the eos condition for GSM8K by clefourrier in https://github.com/huggingface/lighteval/pull/85
* Fixing rolling loglikelihood management by clefourrier in https://github.com/huggingface/lighteval/pull/78
* Fixes input length management for generative evals by clefourrier in https://github.com/huggingface/lighteval/pull/103
* Reorder addition of instruction in chat template by clefourrier in https://github.com/huggingface/lighteval/pull/111
* Ensure chat models terminate generation with EOS token by lewtun in https://github.com/huggingface/lighteval/pull/115
* Fix push details to hub by NathanHB in https://github.com/huggingface/lighteval/pull/98
* Small fixes to InferenceEndpointModel by shaltielshmid in https://github.com/huggingface/lighteval/pull/112
* Fix import typo autogptq by clefourrier in https://github.com/huggingface/lighteval/pull/116
* Fixed the loglikelihood method in inference endpoints models by clefourrier in https://github.com/huggingface/lighteval/pull/119
* Fix TextGenerationResponse import from hfh by Wauplin in https://github.com/huggingface/lighteval/pull/129
* Do not use deprecated list_files_info by Wauplin in https://github.com/huggingface/lighteval/pull/133
* Update test workflow name to 'Tests' by Wauplin in https://github.com/huggingface/lighteval/pull/134

New Contributors
* shaltielshmid made their first contribution in https://github.com/huggingface/lighteval/pull/112
* bilgehanertan made their first contribution in https://github.com/huggingface/lighteval/pull/126
* Wauplin made their first contribution in https://github.com/huggingface/lighteval/pull/129

**Full Changelog**: https://github.com/huggingface/lighteval/compare/v0.2.0...v0.3.0

0.2.0

Release Note

This release focuses on customization and personalisation: it's now possible to define custom metrics, not just custom tasks; see the README for the full mechanism.
It also includes small fixes to improve stability, as well as new tasks. We chose to split community tasks from the main library source to make maintenance easier.
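As a rough sketch of what a custom metric looks like conceptually, here is a sample-level metric plus a corpus-level aggregation. The function names and signatures are hypothetical, for illustration only; see the README for the actual contribution mechanism:

```python
# Hypothetical custom metric: a sample-level score aggregated over a dataset.

def exact_match(prediction, gold):
    """Sample-level metric: 1.0 if the normalized prediction equals the gold."""
    return 1.0 if prediction.strip().lower() == gold.strip().lower() else 0.0

def evaluate(metric, predictions, golds):
    """Aggregate a sample-level metric over a dataset by averaging."""
    scores = [metric(p, g) for p, g in zip(predictions, golds)]
    return sum(scores) / len(scores)

print(evaluate(exact_match, ["Paris", "london "], ["paris", "Berlin"]))  # 0.5
```

Separating the per-sample score from the aggregation is what lets a custom metric plug into an existing evaluation loop.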

Better community task handling
* New mechanism for evaluation contributions by clefourrier in https://github.com/huggingface/lighteval/pull/47
* Adding the custom metrics system by clefourrier in https://github.com/huggingface/lighteval/pull/65

New tasks
* Add GPQA by clefourrier in https://github.com/huggingface/lighteval/pull/42
* Adding support for Arabic benchmarks: AceGPT benchmarking suite by alielfilali01 in https://github.com/huggingface/lighteval/pull/44
* IFEval by clefourrier in https://github.com/huggingface/lighteval/pull/48

Features
* Add an automatic system to compute average for tasks with subtasks by clefourrier in https://github.com/huggingface/lighteval/pull/41

Small patches
* Typos: https://github.com/huggingface/lighteval/pull/27, https://github.com/huggingface/lighteval/pull/28, https://github.com/huggingface/lighteval/pull/30, https://github.com/huggingface/lighteval/pull/29, https://github.com/huggingface/lighteval/pull/34
* Better README: https://github.com/huggingface/lighteval/pull/26, https://github.com/huggingface/lighteval/pull/37, https://github.com/huggingface/lighteval/pull/55
* Patch fix to match with config update/simplification in nanotron by thomwolf in https://github.com/huggingface/lighteval/pull/35
* bump transformers to 4.38 by NathanHB in https://github.com/huggingface/lighteval/pull/46
* Small fix to be able to use extensions of nanotron configs by thomwolf in https://github.com/huggingface/lighteval/pull/58
* Remove the eos token override in the Default Config Task by clefourrier in https://github.com/huggingface/lighteval/pull/54
* Update leaderboard task set by lewtun in https://github.com/huggingface/lighteval/pull/60
* Fixes wikitext prompts + some patches on tg models by clefourrier in https://github.com/huggingface/lighteval/pull/64
* Fix unset generation size by clefourrier in https://github.com/huggingface/lighteval/pull/76
* Update ruff by clefourrier in https://github.com/huggingface/lighteval/pull/71
* Relax sentencepiece version by lewtun in https://github.com/huggingface/lighteval/pull/74
* Better chat template system by clefourrier in https://github.com/huggingface/lighteval/pull/38

✨ Community Contributions
* ledrui made their first contribution in https://github.com/huggingface/lighteval/pull/26
* alielfilali01 made their first contribution in https://github.com/huggingface/lighteval/pull/44
* lewtun made their first contribution in https://github.com/huggingface/lighteval/pull/55

**Full Changelog**: https://github.com/huggingface/lighteval/compare/v0.1.1...v0.2.0

0.1.1

Small patch for PyPi release

Include tasks_table.jsonl in package

0.1.0

Init

LightEval 🌤️
A lightweight LLM evaluation suite

Context
LightEval is a lightweight LLM evaluation suite that Hugging Face has been using internally with the recently released LLM data processing library [datatrove](https://github.com/huggingface/datatrove) and LLM training library [nanotron](https://github.com/huggingface/nanotron).

We're releasing it with the community in the spirit of building in the open.

Note that it is still very early days, so don't expect 100% stability ^^'
In case of problems or questions, feel free to open an issue!

**Full Changelog**: https://github.com/huggingface/lighteval/commits/v0.1
