Jury

Latest version: v2.3.1


2.1.0

What's New 🚀

Tasks 📝

We added a task-based metric system that can evaluate different types of input, replacing the old system, which could only evaluate strings (generated text) for language generation tasks. Jury is now able to support a broader set of metrics that work with different input types.

With this, the `jury.Jury` API keeps the given set of metrics consistent: Jury raises an error if any pair of metrics is inconsistent in terms of task (i.e. evaluation input type).
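
As a rough illustration of the consistency check (the metric names are taken from Jury's built-in metrics; the exact exception raised is not specified in these notes):

```python
from jury import Jury

# Both metrics evaluate generated text (language-generation task),
# so the set is consistent and construction succeeds.
scorer = Jury(metrics=["bleu", "meteor"])

# Combining a text-based metric with one configured for class labels
# (e.g. precision for sequence classification) would be rejected by
# Jury with an error at construction time.
```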

AutoMetric ✨

- `AutoMetric` is introduced as the main factory class for automatically loading metrics. As a side note, `load_metric` is still available for backward compatibility and is the preferred entry point (it uses `AutoMetric` under the hood).
- Tasks are now distinguished within metrics. For example, precision can be used for the `language-generation` or the `sequence-classification` task, where one evaluates strings (generated text) while the other evaluates integers (class labels).
- In the configuration file, metrics can now be specified with HuggingFace datasets' metric initialization parameters. The keyword arguments used at computation time are separated out under the `"compute_kwargs"` key (see the sketch after this list).
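
A minimal sketch of loading metrics for different tasks; the `task` and `compute_kwargs` arguments follow these notes, but exact signatures and accepted values should be checked against the README:

```python
from jury import load_metric

# Precision evaluated on generated text (language-generation task).
precision_gen = load_metric("precision", task="language-generation")

# Precision evaluated on integer class labels.
precision_cls = load_metric("precision", task="sequence-classification")

# Computation-time parameters are passed under compute_kwargs.
bleu = load_metric("bleu", compute_kwargs={"max_order": 2})
```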


**Full Changelog**: https://github.com/obss/jury/compare/2.0.0...2.1.0

2.0.0

New Metric System

- The datasets package's `Metric` implementation is adopted (and extended) to provide high performance 💯 and a more unified interface 🤗.
- Custom metric implementation changed accordingly (it now requires implementing 3 abstract methods).
- The `Jury` class is now callable (it implements the `__call__()` method), though the `evaluate()` method is still available for backward compatibility.
- When evaluating with `Jury`, the `predictions` and `references` parameters must be passed as keyword arguments to prevent confusion and wrong computations (as with datasets' metrics).
- `MetricCollator` is removed; its metric-related methods are attached directly to the `Jury` class, so metrics can now be added to and removed from a `Jury` instance directly.
- `Jury` now supports reading metrics given as strings, lists, or dictionaries, making the accepted input type of metrics (along with their parameters) more generic (see the sketch below).
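
A short sketch of the usage described above; the add/remove method names are assumptions based on these notes, not a confirmed API:

```python
from jury import Jury

scorer = Jury(metrics=["bleu", "rouge"])

predictions = [["the cat is on the mat"], ["there is a dog in the park"]]
references = [["the cat is playing on the mat"], ["a dog is in the park"]]

# Jury instances are callable; predictions and references must be
# passed as keyword arguments.
scores = scorer(predictions=predictions, references=references)

# Metrics can be added to or removed from the instance directly
# (method names assumed from these notes; MetricCollator is gone).
scorer.add_metric("meteor")
scorer.remove_metric("rouge")
```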

New metrics

- Accuracy, F1, Precision, and Recall are added to Jury metrics.
- All metrics in the datasets package remain available in Jury through `jury.load_metric()` (see the sketch below).
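
For example (metric names are assumed to match the datasets package identifiers):

```python
from jury import load_metric

# Newly bundled metrics.
accuracy = load_metric("accuracy")
f1 = load_metric("f1")

# Any datasets-package metric can still be loaded by its name.
meteor = load_metric("meteor")
```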

Development

- Test cases are improved with fixtures, and the test structure is enhanced.
- Expected outputs for tests are now required as a properly named JSON file.

1.1.2

- SQuAD bug fixed for evaluating with multiple references.
- Test design & cases revised with fixtures (improvement).

1.1.1

- The broken multiple-prediction calculation caused by multiple-reference input for BLEU and SacreBLEU is fixed.
- CLI Implementation is completed. 🎉

1.0.1

- Fix for nltk version (Colab is fixed as well).

1.0.0

Release Notes

- New metric structure is completed.
- Custom metric support is improved; custom metrics no longer need to extend `datasets.Metric` and instead use `jury.metrics.Metric`.
- Metric usage is unified around the `compute`, `preprocess` and `postprocess` functions; the only implementation required for a custom metric is `compute` (a sketch follows this list).
- Both strings and `Metric` objects can now be passed to `Jury(metrics=metrics)`, even mixed together.
- The `load_metric` function was rearranged to capture end score results, and several metrics were added accordingly (e.g. `load_metric("squad_f1")` loads the SQuAD metric returning the F1 score).
- An example notebook was added to the examples.
- MT and QA tasks are illustrated.
- Custom metric creation is added as an example.
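
A minimal sketch of a custom metric under this 1.0.0 interface (the class name and logic here are hypothetical; only the "implement `compute` on `jury.metrics.Metric`" requirement comes from these notes):

```python
from jury.metrics import Metric


class ExactMatch(Metric):
    """Hypothetical custom metric: fraction of predictions equal to their reference."""

    def compute(self, predictions, references, **kwargs):
        # preprocess/postprocess are left to their defaults; only compute is overridden.
        matches = sum(pred == ref for pred, ref in zip(predictions, references))
        return {"exact_match": matches / max(len(predictions), 1)}
```

Such an object could then be mixed with string metric names in `Jury(metrics=...)`, as noted above.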

Acknowledgments
fcakyon, cemilcengiz, devrimcavusoglu

