New Metric System
- The `datasets` package's Metric implementation is adopted (and extended) to provide high performance 💯 and a more unified interface 🤗.
- Custom metric implementation has changed accordingly (it now requires 3 abstract methods to be implemented; see the custom-metric sketch after this list).
- The Jury class is now callable (it implements the `__call__()` method for direct use, as shown in the sketch after this list), though the `evaluate()` method is still available for backward compatibility.
- When evaluating with Jury, the `predictions` and `references` parameters must be passed as keyword arguments to prevent confusion and wrong computations (as with datasets' metrics).
- MetricCollator is removed; its metric-handling methods are attached directly to the Jury class, so metrics can now be added to or removed from a Jury instance directly (see below).
- Jury now accepts metrics given as a string, a list, or dictionaries, making the metric input type more generic and allowing parameters to be passed alongside metric names.
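
The snippet below sketches the new interface under typical usage; the metric names (`bleu`, `meteor`) and the dictionary keys (`metric_name`, `compute_kwargs`) are illustrative assumptions rather than confirmed API:

```python
from jury import Jury

predictions = ["the cat sat on the mat"]
references = ["the cat is sitting on the mat"]

# Metrics can be given as a single string, a list of strings, or
# dictionaries carrying extra parameters (key names here are assumed).
jury = Jury(metrics="bleu")
jury = Jury(metrics=["bleu", "meteor"])
jury = Jury(metrics=[{"metric_name": "bleu", "compute_kwargs": {"max_order": 2}}])

# The instance is callable; predictions and references are keyword-only.
scores = jury(predictions=predictions, references=references)

# evaluate() remains available for backward compatibility.
scores = jury.evaluate(predictions=predictions, references=references)
```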
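
Metric addition and removal now happen on the instance itself; the method names `add_metric` and `remove_metric` below are assumptions inferred from the note above, so check the Jury API for the exact names:

```python
from jury import Jury

jury = Jury(metrics=["bleu"])

# Assumed method names for instance-level metric management.
jury.add_metric("meteor")
jury.remove_metric("bleu")

scores = jury(predictions=["hello there"], references=["hello there"])
```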
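
For custom metrics, here is a minimal sketch of the three-abstract-method contract; the base class import path and the method names/signatures are hypothetical placeholders, not the confirmed interface:

```python
from jury.metrics import Metric  # assumed import path


class ExactMatch(Metric):
    # All three method names below are illustrative stand-ins for the
    # three abstract methods a custom metric must implement.
    def _compute_single_pred_single_ref(self, predictions, references, **kwargs):
        matches = sum(p == r for p, r in zip(predictions, references))
        return {"exact_match": matches / len(predictions)}

    def _compute_single_pred_multi_ref(self, predictions, references, **kwargs):
        # Each prediction is matched against a list of references.
        matches = sum(p in refs for p, refs in zip(predictions, references))
        return {"exact_match": matches / len(predictions)}

    def _compute_multi_pred_multi_ref(self, predictions, references, **kwargs):
        # Multiple predictions per sample, each with multiple references.
        matches = sum(
            any(p in refs for p in preds)
            for preds, refs in zip(predictions, references)
        )
        return {"exact_match": matches / len(predictions)}
```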
New Metrics
- Accuracy, F1, Precision, and Recall are added to the Jury metrics.
- All metrics from the datasets package are still available in Jury through `jury.load_metric()` (see the sketch below).
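
A minimal sketch of loading a datasets metric through Jury; `meteor` is only an example name, and passing the loaded metric object into a Jury instance is assumed usage:

```python
import jury
from jury import Jury

# Load any metric available in the datasets package by name.
metric = jury.load_metric("meteor")

# Assumed: a loaded metric object can be passed to Jury directly.
scorer = Jury(metrics=[metric])
scores = scorer(
    predictions=["the cat sat on the mat"],
    references=["the cat is sitting on the mat"],
)
```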
Development
- Test cases are improved with fixtures, and the test structure is enhanced.
- Expected outputs for tests are now required as JSON files with proper names.