Evaluate

0.2.0

What's New

`evaluator`
The `evaluator` has been extended to support three new tasks (see the sketch below the list):
- `"image-classification"`
- `"token-classification"`
- `"question-answering"`

`combine`
With `combine`, several metrics can be bundled into a single object that is computed in one call and can also be used together with the `evaluator`, as in the sketch below.
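
A minimal example (the scores shown are approximate):

```python
import evaluate

# Bundle several metrics into a single object; one compute() call returns all scores.
clf_metrics = evaluate.combine(["accuracy", "f1", "precision", "recall"])
print(clf_metrics.compute(predictions=[0, 1, 0], references=[0, 1, 1]))
# e.g. {'accuracy': 0.667, 'f1': 0.667, 'precision': 1.0, 'recall': 0.5}
```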

What's Changed
* Fix typo in WER docs by pn11 in https://github.com/huggingface/evaluate/pull/147
* Fix rouge outputs by lvwerra in https://github.com/huggingface/evaluate/pull/158
* add tutorial for custom pipeline by lvwerra in https://github.com/huggingface/evaluate/pull/154
* refactor `evaluator` tests by lvwerra in https://github.com/huggingface/evaluate/pull/155
* rename `input_texts` to `predictions` in perplexity by lvwerra in https://github.com/huggingface/evaluate/pull/157
* Add link to GitHub author by lewtun in https://github.com/huggingface/evaluate/pull/166
* Add `combine` to compose multiple evaluations by lvwerra in https://github.com/huggingface/evaluate/pull/150
* test string casting only on first element by lvwerra in https://github.com/huggingface/evaluate/pull/159
* remove unused fixtures from unittests by lvwerra in https://github.com/huggingface/evaluate/pull/170
* Add a test to check that Evaluator evaluations match transformers examples by fxmarty in https://github.com/huggingface/evaluate/pull/163
* Add smaller model for `TextClassificationEvaluator` test by fxmarty in https://github.com/huggingface/evaluate/pull/172
* Add tags to spaces by lvwerra in https://github.com/huggingface/evaluate/pull/162
* Rename evaluation modules by lvwerra in https://github.com/huggingface/evaluate/pull/160
* Update push_evaluations_to_hub.py by lvwerra in https://github.com/huggingface/evaluate/pull/174
* update evaluate dependency for spaces by lvwerra in https://github.com/huggingface/evaluate/pull/175
* Add `ImageClassificationEvaluator` by fxmarty in https://github.com/huggingface/evaluate/pull/173
* attempting to let meteor handle multiple references per prediction by sashavor in https://github.com/huggingface/evaluate/pull/164
* fixed duplicate calculation of spearmanr function in metrics wrapper. by benlipkin in https://github.com/huggingface/evaluate/pull/176
* forbid hyphens in template for module names by lvwerra in https://github.com/huggingface/evaluate/pull/177
* switch from Github to Hub module factory for canonical modules by lvwerra in https://github.com/huggingface/evaluate/pull/180
* Fix bertscore idf by lvwerra in https://github.com/huggingface/evaluate/pull/183
* refactor evaluator base and task classes by lvwerra in https://github.com/huggingface/evaluate/pull/185
* Avoid importing tensorflow when importing evaluate by NouamaneTazi in https://github.com/huggingface/evaluate/pull/135
* Add QuestionAnsweringEvaluator by fxmarty in https://github.com/huggingface/evaluate/pull/179
* Evaluator perf by ola13 in https://github.com/huggingface/evaluate/pull/178
* Fix QuestionAnsweringEvaluator for squad v2, fix examples by fxmarty in https://github.com/huggingface/evaluate/pull/190
* Rename perf metric evaluator by lvwerra in https://github.com/huggingface/evaluate/pull/191
* Fix typos in QA Evaluator by lewtun in https://github.com/huggingface/evaluate/pull/192
* Evaluator device placement by lvwerra in https://github.com/huggingface/evaluate/pull/193
* Change test command in installation.mdx to use exact_match by mathemakitten in https://github.com/huggingface/evaluate/pull/194
* Add `TokenClassificationEvaluator` by fxmarty in https://github.com/huggingface/evaluate/pull/167
* Pin rouge_score by albertvillanova in https://github.com/huggingface/evaluate/pull/197
* add poseval by lvwerra in https://github.com/huggingface/evaluate/pull/195
* Combine docs by lvwerra in https://github.com/huggingface/evaluate/pull/201
* Evaluator column loading by lvwerra in https://github.com/huggingface/evaluate/pull/200
* Evaluator documentation by lvwerra in https://github.com/huggingface/evaluate/pull/199

New Contributors
* pn11 made their first contribution in https://github.com/huggingface/evaluate/pull/147
* fxmarty made their first contribution in https://github.com/huggingface/evaluate/pull/163
* benlipkin made their first contribution in https://github.com/huggingface/evaluate/pull/176
* NouamaneTazi made their first contribution in https://github.com/huggingface/evaluate/pull/135
* mathemakitten made their first contribution in https://github.com/huggingface/evaluate/pull/194

**Full Changelog**: https://github.com/huggingface/evaluate/compare/v0.1.2...v0.2.0

0.1.2

What's Changed
* Fix trec sacrebleu by lvwerra in https://github.com/huggingface/evaluate/pull/130
* Add distilled version Cometihno by BramVanroy in https://github.com/huggingface/evaluate/pull/131
* fix: add yaml extension to github action for release by lvwerra in https://github.com/huggingface/evaluate/pull/133
* fix docs badge by lvwerra in https://github.com/huggingface/evaluate/pull/134
* fix cookiecutter path to repository by lvwerra in https://github.com/huggingface/evaluate/pull/139
* docs: make metric cards more prominent by lvwerra in https://github.com/huggingface/evaluate/pull/132
* Update README.md by sashavor in https://github.com/huggingface/evaluate/pull/145
* Fix datasets download imports by albertvillanova in https://github.com/huggingface/evaluate/pull/143

New Contributors
* BramVanroy made their first contribution in https://github.com/huggingface/evaluate/pull/131
* albertvillanova made their first contribution in https://github.com/huggingface/evaluate/pull/143

**Full Changelog**: https://github.com/huggingface/evaluate/compare/v0.1.1...v0.1.2

0.1.1

What's Changed
* Fix broken links by mishig25 in https://github.com/huggingface/evaluate/pull/92
* Fix readme by lvwerra in https://github.com/huggingface/evaluate/pull/98
* Fixing broken evaluate-measurement hub link by panwarnaveen9 in https://github.com/huggingface/evaluate/pull/102
* fix typo in autodoc by manueldeprada in https://github.com/huggingface/evaluate/pull/101
* fix typo by manueldeprada in https://github.com/huggingface/evaluate/pull/100
* FIX `pip install evaluate[evaluator]` by philschmid in https://github.com/huggingface/evaluate/pull/103
* fix description field in metric template readme by lvwerra in https://github.com/huggingface/evaluate/pull/122
* Add automatic pypi release for evaluate by osanseviero in https://github.com/huggingface/evaluate/pull/121
* Fix typos in Evaluator docstrings by lewtun in https://github.com/huggingface/evaluate/pull/124
* Fix spaces description in metadata by lvwerra in https://github.com/huggingface/evaluate/pull/123
* fix revision string if it is a python version by lvwerra in https://github.com/huggingface/evaluate/pull/129
* Use accuracy as default metric for text classification Evaluator by lewtun in https://github.com/huggingface/evaluate/pull/128
* bump `evaluate` dependency in spaces by lvwerra in https://github.com/huggingface/evaluate/pull/88

New Contributors
* panwarnaveen9 made their first contribution in https://github.com/huggingface/evaluate/pull/102
* manueldeprada made their first contribution in https://github.com/huggingface/evaluate/pull/101
* philschmid made their first contribution in https://github.com/huggingface/evaluate/pull/103
* osanseviero made their first contribution in https://github.com/huggingface/evaluate/pull/121
* lewtun made their first contribution in https://github.com/huggingface/evaluate/pull/124

**Full Changelog**: https://github.com/huggingface/evaluate/compare/v0.1.0...v0.1.1

0.1.0

Release notes

These are the release notes of the initial release of the Evaluate library.

Goals

Goals of the Evaluate library:

- reproducibility: reporting and reproducing results is easy
- ease-of-use: access to a wide range of evaluation tools with a unified interface
- diversity: provide a wide range of evaluation tools covering metrics, comparisons, and measurements
- multimodal: models and datasets of many modalities can be evaluated
- community-driven: anybody can add custom evaluations by hosting them on the Hugging Face Hub

Release overview:

- `evaluate.load()`: The `load()` function is the main entry point into Evaluate and lets you load evaluation modules from a local folder, the evaluate repository, or the Hugging Face Hub. It downloads, caches, and loads the evaluation module and returns an `evaluate.EvaluationModule` (see the usage sketch after this list).
- `evaluate.save()`: With `save()` a user can save evaluation results to a JSON file. In addition to the results from an `evaluate.EvaluationModule`, it can save additional parameters and automatically records the timestamp, git commit hash, library versions, and Python path. One can either provide a directory, in which case the file name is created automatically, or an explicit file name for the results.
- `evaluate.push_to_hub()`: The `push_to_hub()` function lets you push the results of a model evaluation to the model card on the Hugging Face Hub. The model, dataset, and metric are specified so that they can be linked on the Hub.
- `evaluate.EvaluationModule`: The `EvaluationModule` class is the base class for all evaluation modules. There are three module types: metrics (to evaluate models), comparisons (to compare models), and measurements (to analyze datasets). Inputs can either be added incrementally with `add` (a single input) or `add_batch` (a batch of inputs), followed by a final `compute` call that computes the scores, or passed to `compute` directly. Under the hood, the input data is stored and loaded with Apache Arrow.
- `evaluate.EvaluationModuleInfo`: The `EvaluationModuleInfo` class stores the attributes of an evaluation module:
  - `description`: A short description of the evaluation module.
  - `citation`: A BibTeX string for citation, when available.
  - `features`: A `Features` object defining the input format. The inputs provided to `add`, `add_batch`, and `compute` are checked against these types, and an error is raised on a mismatch.
  - `inputs_description`: Equivalent to the module's docstring.
  - `homepage`: The homepage of the module.
  - `license`: The license of the module.
  - `codebase_urls`: Links to the code behind the module.
  - `reference_urls`: Additional reference URLs.
- `evaluate.evaluator`: The `evaluator` provides automated evaluation and only requires a model, a dataset, and a metric, in contrast to the metrics in `EvaluationModule`, which require model predictions. It has three main components: a model wrapped in a pipeline, a dataset, and a metric, and it returns the computed evaluation scores. In addition, it may require two mappings to align the dataset columns with the pipeline's inputs and the pipeline's labels with the dataset's labels. This is an experimental feature; currently, only text classification is supported.
- `evaluate-cli`: The community can add custom metrics by adding the necessary module script to a Space on the Hugging Face Hub. The `evaluate-cli` is a tool that simplifies this process by creating the Space, populating a template, and pushing it to the Hub. It also provides instructions to customize the template and integrate custom logic.
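
A short usage sketch of the `load`, `compute`, and `save` entry points described above; the metric choice and the values are illustrative:

```python
import evaluate

# Load a metric module from the Hub (downloaded and cached on first use).
accuracy = evaluate.load("accuracy")

# Either pass all inputs to compute() at once ...
print(accuracy.compute(predictions=[1, 0, 1], references=[1, 1, 1]))

# ... or stream them with add()/add_batch() and call compute() at the end.
accuracy.add_batch(predictions=[1, 0], references=[1, 1])
accuracy.add(prediction=1, reference=1)
print(accuracy.compute())

# Save results to a directory; the timestamp, git commit hash, and library
# versions are recorded automatically alongside the provided values.
evaluate.save("./results/", accuracy=0.667, experiment="baseline")
```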

Main contributors:

lvwerra, sashavor, NimaBoscarino, ola13, osanseviero, lhoestq, lewtun, douwekiela
