Ci
* ci: fix model loading test (1775)
* pass base branch into the make command as an arg
* test a file that has custom wrapper
* what about overview
* just dont check overview
* revert instance check
* explicitly omit overview and init
* remove test change
* try on a lot of models
* revert test model file
---------
Co-authored-by: Isaac Chung <isaac.chung@team.wrike.com> ([`9b117a8`](https://github.com/embeddings-benchmark/mteb/commit/9b117a8245a8c56470d99b8ca3d6b2f6b6819dd8))
Feature
* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (1787)
* feat: Update task filtering, fixing bug on MTEB
- Updated task filtering, adding exclusive_language_filter and hf_subsets
- Fixed a bug in MTEB where cross-lingual splits were included
- Added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# this keeps every subset that contains English:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However, it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter
# (though not if there were multiple English splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
* format
* remove "en-ext" from AmazonCounterfactualClassification
* fixed mteb(deu)
* fix: simplify in a few areas ([`4a70e5d`](https://github.com/embeddings-benchmark/mteb/commit/4a70e5d8996a341097c81782b463b1822f9708fe))
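
The difference between the inclusive and the exclusive language filter described in the feature entry above can be sketched in plain Python. This is only an illustration of the idea, not mteb's implementation: `filter_subsets`, the subset-to-language mappings, and the language codes are all assumptions made up for the example.

```python
# Hypothetical illustration; this is NOT how mteb implements filtering.
# Each subset name maps to the language codes it contains (values assumed).
STS22_SUBSETS = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["cmn", "eng"],
    "de": ["deu"],
}


def filter_subsets(subsets, languages, exclusive=False):
    """Return the subset names that match the requested languages."""
    wanted = set(languages)
    selected = []
    for name, langs in subsets.items():
        present = set(langs)
        if exclusive:
            # Keep only subsets whose languages are all among the requested ones.
            keep = present <= wanted
        else:
            # Keep any subset that shares at least one requested language.
            keep = bool(present & wanted)
        if keep:
            selected.append(name)
    return selected


inclusive = filter_subsets(STS22_SUBSETS, ["eng"])
exclusive = filter_subsets(STS22_SUBSETS, ["eng"], exclusive=True)
print(inclusive)  # ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
print(exclusive)  # ['en']

# With several English-only subsets (names assumed), the exclusive filter
# still returns all of them, so an explicit subset list is needed instead.
AMAZON_SUBSETS = {"en": ["eng"], "en-ext": ["eng"], "de": ["deu"]}
both = filter_subsets(AMAZON_SUBSETS, ["eng"], exclusive=True)
print(both)  # ['en', 'en-ext']
```

The last case suggests why an explicit subset selection is the more precise tool when a task has more than one monolingual English split.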