Safety vulnerability ID: 73055
The information on this page was manually curated by our Cybersecurity Intelligence Team.
Bump the lighteval package to version 0.4.0.dev0, which updates its nltk dependency to fix a security vulnerability (an upgrade sketch follows below).
Latest version: 0.6.2
A lightweight and configurable evaluation package
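A minimal upgrade sketch, assuming a pip-managed environment: the commands are standard pip usage, and the version specifier simply mirrors the advisory above.

```
# Upgrade lighteval to a release that includes the patched nltk dependency.
# Naming a .dev pre-release in the specifier lets pip resolve it directly.
pip install --upgrade "lighteval>=0.4.0.dev0"
```

The equivalent pin in a requirements.txt is `lighteval>=0.4.0.dev0`.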
What's new
Features
* Adds vllm as backend for a significant speed-up by NathanHB in 274
* Add llm_as_judge in metrics (using either OpenAI or Transformers) by NathanHB in 146
* Ability to use config files for models by clefourrier in 131
* List available tasks in the cli `lighteval tasks --list` by DimbyTa in 142
* Use torch compile for speed up by clefourrier in 248
* Add maj@k metric by clefourrier in 158
* Adds a dummy/random model for baseline init by guipenedo in 220
* lighteval is now a cli tool: `lighteval --args` by NathanHB in 152 (see the usage sketch after this list)
* We can now log info from the metrics (for example input and response from llm_as_judge) by NathanHB in 157
* Configurable task versioning by PhilipMay in 181
* Programmatic interface by clefourrier in 269
* Probability Metric + New Normalization by hynky1999 in 276
* Add widgets to the README by clefourrier in 145
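The CLI features above change how runs are launched; here is a short usage sketch. The `lighteval tasks --list` call comes straight from the notes, while the `lighteval accelerate` invocation, its `--model_args`, `--tasks`, and `--output_dir` flags, and the `gpt2` / task-spec values are assumptions about this version's interface rather than something the notes confirm.

```
# List available tasks (from the feature notes above).
lighteval tasks --list

# Assumed shape of an evaluation run: a backend subcommand, model args as
# key=value pairs, and a "suite|task|num_fewshot|truncate_fewshots" spec.
lighteval accelerate \
    --model_args "pretrained=gpt2" \
    --tasks "leaderboard|truthfulqa:mc|0|0" \
    --output_dir ./evals
```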
New tasks
* Add `Ger-RAG-eval` tasks by PhilipMay in 149
* adding `aimo` custom eval by NathanHB in 154
Fixes
* Bump nltk to 3.9.1 to fix security issue by NathanHB in 137 (a quick version check follows this list)
* Fix max_length type when being passed in model args by csarron in 138
* Fix nanotron models input size bug by clefourrier in 156
* Fix MATH normalization by lewtun in 162
* Fix prompt function names by clefourrier in 168
* Fix prompt format german rag community task by jphme in 171
* add 'cite as' section in readme by NathanHB in 178
* Fix broken link to extended tasks in README by alexrs in 182
* Mention HF_TOKEN in readme by Wauplin in 194
* Download BERT scorer lazily by sadra-barikbin in 190
* Updated tgi_model and added parameters for endpoint_model by shaltielshmid in 208
* fix llm as judge warnings by NathanHB in 173
* Add GPT-4 as Judge by philschmid in 206
* Fix a few typos and do a tiny refactor by sadra-barikbin in 187
* Avoid truncating the outputs based on string lengths by anton-l in 201
* Now only uses functions for prompt definition by clefourrier in 213
* Data split depending on eval params by clefourrier in 169
* Fix most inference endpoint issues related to version config by clefourrier in 226
* Fix _init_max_length in base_model.py by gucci-j in 185
* Make evaluator invariant of input request type order by sadra-barikbin in 215
* Fix issues with multichoice_continuations_start_space not being parsed properly by clefourrier in 232
* Fix IFEval metric by lewtun in 259
* change priority when choosing model dtype by NathanHB in 263
* Add grammar option to generation by sadra-barikbin in 242
* Make info loggers dataclasses, so that their properties have the expected lifetime by hynky1999 in 280
* Remove expensive prediction run during test collection by hynky1999 in 279
* Example Configs and Docs by RohitMidha23 in 255
* Refactoring the few shot management by clefourrier in 272
* Standalone nanotron config by hynky1999 in 285
* Logging Revamp by hynky1999 in 284
* bump nltk version by NathanHB in 290
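Since the headline fix here is the nltk bump, a quick way to confirm the patched version landed in your environment (plain pip and Python, nothing lighteval-specific):

```
# The resolved nltk version should be 3.9.1 or newer.
pip show nltk
python -c "import nltk; print(nltk.__version__)"
```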
Significant community contributions
The following contributors have made significant changes to the library over the last release:
* NathanHB
* Bump nltk to 3.9.1 to fix security issue (137)
* Add llm as judge in metrics (146)
* Nathan add logging to metrics (157)
* add 'cite as' section in readme (178)
* Fix citation section in readme (180)
* adding aimo custom eval (154)
* fix llm as judge warnings (173)
* launch lighteval using `lighteval --args` (152)
* adds llm as judge using transformers (223)
* Fix missing json file (264)
* change priority when choosing model dtype (263)
* fix the location of tasks list in the readme (267)
* updates ifeval repo (268)
* fix nanotron (283)
* add vlmm backend (274)
* bump nltk version (290)
* clefourrier
* Add config files for models (131)
* Add fun widgets to the README (145)
* Fix nanotron models input size bug (156)
* no function we actually use should be named prompt_fn (168)
* Add maj@k metric (158)
* Homogeneize logging system (150)
* Use only dataclasses for task init (212)
* Now only uses functions for prompt definition (213)
* Data split depending on eval params (169)
* Fix most inference endpoint issues related to version config (226)
* Add metrics as functions (214)
* Quantization related issues (224)
* Update issue templates (235)
* remove latex writer since we don't use it (231)
* Removes default bert scorer init (234)
* fix (233)
* Updated piqa (222)
* uses torch compile if provided (248)
* Fix inference endpoint config (244)
* Expose samples via the CLI (228)
* Fix issues with multichoice_continuations_start_space not being parsed properly (232)
* Programmatic interface + cleaner management of requests (269)
* Small file reorg (only renames/moves) (271)
* Refactoring the few shot management (272)
* PhilipMay
* Add `Ger-RAG-eval` tasks (149)
* Add version config option. (181)
* shaltielshmid
* Added Namespace parameter for InferenceEndpoints, added option for passing model config directly (147)
* Updated tgi_model and added parameters for endpoint_model (208)
* hynky1999
* Make info loggers dataclasses, so that their properties have the expected lifetime (280)
* Remove expensive prediction run during test collection (279)
* Probability Metric + New Normalization (276)
* Standalone nanotron config (285)
* Logging Revamp (284)