Fix
* fix: Normalize benchmarks to only include task objects and add a getter for benchmarks (#1208)
* Normalize benchmarks to only include tasks
- Force benchmarks to only include tasks. This fixes a few bugs where a benchmark could reference a task that is not implemented
- Implement `mteb.get_benchmark`, which makes it easier to fetch benchmarks
- Add tests and update docs
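
A minimal sketch of the registry-with-getter pattern behind `mteb.get_benchmark`. The `Task`/`Benchmark` classes and the registry contents here are simplified stand-ins, not the library's actual types:

```python
# Hypothetical, simplified stand-ins for mteb's Benchmark/Task objects,
# illustrating a name-based getter over a benchmark registry.
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str


@dataclass
class Benchmark:
    name: str
    tasks: list[Task] = field(default_factory=list)


# Registry keyed by benchmark name; real benchmarks hold many tasks.
BENCHMARK_REGISTRY: dict[str, Benchmark] = {
    "MTEB(eng)": Benchmark("MTEB(eng)", [Task("Banking77Classification")]),
}


def get_benchmark(name: str) -> Benchmark:
    """Fetch a benchmark by name, with a helpful error for unknown names."""
    try:
        return BENCHMARK_REGISTRY[name]
    except KeyError:
        raise KeyError(
            f"Unknown benchmark {name!r}; available: {sorted(BENCHMARK_REGISTRY)}"
        ) from None


benchmark = get_benchmark("MTEB(eng)")
print([task.name for task in benchmark.tasks])  # ['Banking77Classification']
```

Because benchmarks are resolved through a single registry, an unknown name fails loudly at lookup time instead of surfacing later as a missing task.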
A few outstanding issues:
I would like `mteb.MTEB(benchmark)` to always reproduce the benchmark. Currently this is not possible, as MTEB(eng) requires the split to be specified. A solution is to allow `eval_splits` to be specified when initializing a task and then pass it on to `load_data()`. This way we can write the following:
`mteb.get_tasks(tasks=[...], eval_splits=["test"], ...)`
I would also love the aggregation to be a part of the benchmark (so that it is clear how scores should be aggregated). This is especially relevant for MTEB(eng), as it averages the CQAD datasets before creating the global average. This way we can also create a result object for the benchmark itself. A complementary solution would be to allow nested benchmarks.
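
The two-level aggregation described above can be sketched as follows. The task names and scores are illustrative, and the grouping rule (collapse all CQADupstack subsets into one score before the global mean) is an assumption about how MTEB(eng) aggregates:

```python
# Sketch of two-level aggregation: average the CQADupstack* subsets into a
# single score first, then take the benchmark-wide mean over all entries.
from statistics import mean

# Illustrative per-task scores, not real results.
scores = {
    "CQADupstackAndroidRetrieval": 0.40,
    "CQADupstackEnglishRetrieval": 0.50,
    "Banking77Classification": 0.80,
}

cqad = [v for k, v in scores.items() if k.startswith("CQADupstack")]
other = [v for k, v in scores.items() if not k.startswith("CQADupstack")]

# mean(cqad) = 0.45 counts as one entry alongside the remaining tasks.
benchmark_score = mean([mean(cqad)] + other)
print(round(benchmark_score, 3))  # 0.625
```

Encoding this rule on the benchmark object itself would make the aggregation reproducible, rather than leaving it to each consumer of the raw per-task scores.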
* fix error in tests
* format
* Added corrections based on review
* added example and formatted ([`f93154f`](https://github.com/embeddings-benchmark/mteb/commit/f93154f465b99bd9737b2ecfd54b3beb491a996d))