Feature
* feat: reduce logging for load_results()
- truncates the list of missing subsets in the log message, to avoid printing 100+ subsets
- lowers the log level of these messages to `logging.info`
- removes splits that are commonly never evaluated on, and with them the errors reporting those splits as missing
The second part removed quite a few warnings (4930 to XX).
It also seems that these splits were accidentally included in some of the MMTEB benchmarks.
This change removes those splits from those benchmarks (all of which are in beta). We will have to recompute the tables for the paper, though (we should do that anyway).
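The logging changes and the split removal described above can be sketched roughly as follows. This is a minimal illustration, not the actual `load_results()` code: the helper names (`summarize`, `drop_rarely_evaluated`, `log_missing_splits`, `log_missing_subsets`) are hypothetical, and treating `"validation"` as the rarely evaluated split is an assumption based on the error counts below. It shows the three ideas together: truncating long sets of missing subsets, logging at INFO level, and dropping rarely evaluated splits so they are never reported as missing.

```python
import logging

logger = logging.getLogger(__name__)

# Assumption: "validation" stands in for the splits that are rarely evaluated on.
RARELY_EVALUATED_SPLITS = frozenset({"validation"})


def summarize(names: set[str], max_shown: int = 2) -> str:
    """Render a set as text, showing at most `max_shown` sorted entries before '...'."""
    shown = sorted(names)[:max_shown]
    body = ", ".join(repr(name) for name in shown)
    if len(names) > max_shown:
        body += ", ..."
    return "{" + body + "}"


def drop_rarely_evaluated(eval_splits: list[str]) -> list[str]:
    """Keep only the splits that are routinely evaluated on."""
    return [split for split in eval_splits if split not in RARELY_EVALUATED_SPLITS]


def log_missing_splits(task_name: str, eval_splits: list[str], found_splits: set[str]) -> set[str]:
    """Log expected splits that have no results, at INFO level, and return them."""
    missing = set(eval_splits) - found_splits
    if missing:
        logger.info("%s: Missing splits %s", task_name, summarize(missing))
    return missing


def log_missing_subsets(task_name: str, split: str, expected: set[str], found: set[str]) -> set[str]:
    """Log expected subsets of one split that have no results, truncating the list."""
    missing = expected - found
    if missing:
        logger.info(
            "%s: Missing subsets %s for split %s", task_name, summarize(missing), split
        )
    return missing
```

For example, `log_missing_splits("MassiveIntentClassification", ["validation", "test"], {"test"})` reports `{'validation'}`, while passing the split list through `drop_rarely_evaluated` first leaves nothing to report.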
Other potential things to consider:
- Scifact is included in MTEB(Medical). I have removed the "train" split from it, as I think including it was a mistake (I checked the other datasets in the benchmark).
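A consistency check like the one described for MTEB(Medical), flagging splits that only a single task in a benchmark uses, could be sketched as below. The benchmark is modeled as a plain, hypothetical mapping from task name to evaluation splits; `TaskA` and `TaskB` are placeholders rather than real tasks, and `rare_splits` is an illustrative helper, not part of any library.

```python
from collections import Counter


def rare_splits(benchmark: dict[str, list[str]], max_tasks: int = 1) -> dict[str, set[str]]:
    """Return, per task, the splits used by at most `max_tasks` tasks in the benchmark."""
    # Count how many tasks use each split (each task counted once per split).
    usage = Counter(split for splits in benchmark.values() for split in set(splits))
    flagged: dict[str, set[str]] = {}
    for task, splits in benchmark.items():
        rare = {split for split in splits if usage[split] <= max_tasks}
        if rare:
            flagged[task] = rare
    return flagged
```

With `{"Scifact": ["train", "test"], "TaskA": ["test"], "TaskB": ["test"]}`, only Scifact's `"train"` split is flagged, which is the kind of outlier that suggests an accidental inclusion.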
Here is a count of the current top errors:
```py
{
    "MassiveScenarioClassification: Missing splits {'validation'}": 238,  # included in e.g. mteb(fra)
    "MassiveIntentClassification: Missing splits {'validation'}": 237,  # included in e.g. mteb(fra)
    "MassiveScenarioClassification: Missing subsets {'af', 'da', ...} for split test": 230,
    "AmazonReviewsClassification: Missing splits {'validation'}": 229,  # included in e.g. mteb(deu)
    "MassiveIntentClassification: Missing subsets {'af', 'da', ...} for split test": 228,
    "STS22: Missing subsets {'fr-pl', 'de-en', ...} for split test": 223,
    "AmazonReviewsClassification: Missing subsets {'es', 'ja', ...} for split test": 196,
    "MTOPDomainClassification: Missing splits {'validation'}": 195,  # included in mteb(fra)
    "MTOPIntentClassification: Missing splits {'validation'}": 194,  # included in mteb(fra)
    "AmazonCounterfactualClassification: Missing splits {'validation'}": 189,  # included in mteb(deu)
    "MTOPDomainClassification: Missing subsets {'es', 'th', ...} for split test": 165,
    "STS17: Missing subsets {'en-ar', 'es-es', ...} for split test": 164,
    "MTOPIntentClassification: Missing subsets {'es', 'th', ...} for split test": 164,
    "AmazonCounterfactualClassification: Missing subsets {'de', 'ja', ...} for split test": 148,
}
```
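A tally like the one above could be reproduced by counting log messages with a custom `logging.Handler` and `collections.Counter.most_common`. This is a sketch under assumptions: the entry does not say how the counts were produced, `MessageCounter` is a hypothetical name, and the `demo` logger only stands in for whatever code emits the messages. Since the messages are logged at INFO level, the logger's effective level must be INFO or lower for them to reach the handler.

```python
import logging
from collections import Counter


class MessageCounter(logging.Handler):
    """Logging handler that tallies each distinct formatted message."""

    def __init__(self, level: int = logging.NOTSET) -> None:
        super().__init__(level)
        self.counts: Counter = Counter()

    def emit(self, record: logging.LogRecord) -> None:
        # getMessage() applies the %-style arguments, so identical texts group together.
        self.counts[record.getMessage()] += 1


if __name__ == "__main__":
    counter = MessageCounter()
    root = logging.getLogger()
    root.setLevel(logging.INFO)  # let INFO records pass the level check
    root.addHandler(counter)

    demo = logging.getLogger("demo")  # hypothetical stand-in for the result-loading code
    for _ in range(3):
        demo.info("STS22: Missing subsets %s for split %s", "{'fr-pl', 'de-en', ...}", "test")
    demo.info("STS17: Missing subsets %s for split %s", "{'en-ar', 'es-es', ...}", "test")

    for message, count in counter.counts.most_common(14):
        print(f"{message!r}: {count}")
```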
([`7e16fa2`](https://github.com/embeddings-benchmark/mteb/commit/7e16fa2565b2058e12303a1feedbd0d4dea96a41))