Fix
* fix: Remove default params, `public_training_data` and `memory usage` in `ModelMeta` (1794)
* fix: Leaderboard: `K` instead of `M`
Fixes 1752
* format
* fixed existing annotations to refer to task name instead of hf dataset
* added annotation to nvidia
* added voyage
* added uae annotations
* Added stella annotations
* sentence trf models
* added salesforce and e5
* jina
* bge + model2vec
* added llm2vec annotations
* add jasper
* format
* format
* Updated annotations and moved jina models
* make models parameters needed to be filled
* fix tests
* remove comments
* remove model meta from test
* fix model meta from split
* fix: add even more training dataset annotations (1793)
* fix: update max tokens for OpenAI (1772)
update max tokens
* ci: skip AfriSentiLID for now (1785)
* skip AfriSentiLID for now
* skip relevant test case instead
---------
Co-authored-by: Isaac Chung <isaac.chungteam.wrike.com>
* 1.28.7
Automatically generated by python-semantic-release
* ci: fix model loading test (1775)
* pass base branch into the make command as an arg
* test a file that has custom wrapper
* what about overview
* just dont check overview
* revert instance check
* explicitly omit overview and init
* remove test change
* try on a lot of models
* revert test model file
---------
Co-authored-by: Isaac Chung <isaac.chungteam.wrike.com>
* feat: Update task filtering, fixing bug which included cross-lingual tasks in overly many benchmarks (1787)
* feat: Update task filtering, fixing bug on MTEB
- Updated task filtering adding exclusive_language_filter and hf_subset
- fix bug in MTEB where cross-lingual splits were included
- added missing language filtering to MTEB(europe, beta) and MTEB(indic, beta)
The following code outlines the problems:
```py
import mteb
from mteb.benchmarks import MTEB_ENG_CLASSIC

task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
# was eq. to:
task = mteb.get_task("STS22", languages=["eng"])
task.hf_subsets
# correct filtering to English datasets:
# ['en', 'de-en', 'es-en', 'pl-en', 'zh-en']
# However it should be:
# ['en']

# with the changes it is:
task = [t for t in MTEB_ENG_CLASSIC.tasks if t.metadata.name == "STS22"][0]
task.hf_subsets
# ['en']
# eq. to
task = mteb.get_task("STS22", hf_subsets=["en"])
# which you can also obtain using the exclusive_language_filter (though not if there was multiple english splits):
task = mteb.get_task("STS22", languages=["eng"], exclusive_language_filter=True)
```
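
The difference between the old (inclusive) and new (exclusive) language filtering described above can be sketched in plain Python. This is only an illustration of the idea, not mteb's implementation: the helper `filter_subsets` and the toy `sts22_like` mapping are hypothetical and assume each subset is annotated with a list of language codes.

```python
def filter_subsets(subset_langs, languages, exclusive=False):
    """Return the subset names that match the requested languages.

    subset_langs: dict mapping subset name -> list of language codes
                  (hypothetical structure, for illustration only).
    exclusive=False keeps a subset if ANY of its languages is requested.
    exclusive=True keeps a subset only if ALL of its languages are requested.
    """
    wanted = set(languages)
    kept = []
    for name, langs in subset_langs.items():
        langs = set(langs)
        if exclusive:
            matches = langs <= wanted       # every language must be requested
        else:
            matches = bool(langs & wanted)  # one shared language is enough
        if matches:
            kept.append(name)
    return kept


# Toy mapping loosely modelled on the STS22 subsets named above (assumed).
sts22_like = {
    "en": ["eng"],
    "de-en": ["deu", "eng"],
    "es-en": ["spa", "eng"],
    "pl-en": ["pol", "eng"],
    "zh-en": ["zho", "eng"],
    "de": ["deu"],
}

inclusive = filter_subsets(sts22_like, ["eng"])
exclusive = filter_subsets(sts22_like, ["eng"], exclusive=True)
print(inclusive)  # cross-lingual subsets slip through
print(exclusive)  # only the purely English subset remains
```

With inclusive matching, every cross-lingual subset that contains English is kept, which is the behaviour the fix removes from the English benchmark.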
* format
* remove "en-ext" from AmazonCounterfactualClassification
* fixed mteb(deu)
* fix: simplify in a few areas
* fix: Add gritlm
* 1.29.0
Automatically generated by python-semantic-release
* fix: Added more annotations!
* fix: Added C-MTEB (1786)
Added C-MTEB
* 1.29.1
Automatically generated by python-semantic-release
* docs: Add contact to MMTEB benchmarks (1796)
* Add myself to MMTEB benchmarks
* lint
* fix: loading pre 11 (1798)
* fix loading pre 11
* add similarity
* lint
* run all task types
* 1.29.2
Automatically generated by python-semantic-release
* fix: allow to load no revision available (1801)
* fix allow to load no revision available
* lint
* add require_model_meta to leaderboard
* lint
* 1.29.3
Automatically generated by python-semantic-release
---------
Co-authored-by: Roman Solomatin <samoed.romangmail.com>
Co-authored-by: Isaac Chung <chungisaac1217gmail.com>
Co-authored-by: Isaac Chung <isaac.chungteam.wrike.com>
Co-authored-by: github-actions <github-actionsgithub.com>
Co-authored-by: Márton Kardos <power.up1163gmail.com>
* fix merges
* update models info
* change public_training_code to str
* change `public_training_code=False` to None
* remove annotations
* remove annotations
* remove changed annotations
* remove changed annotations
* remove `public_training_data` and `memory usage`
* make framework not optional
* make framework non-optional
* empty frameworks
* add framework
* fix tests
* Update mteb/models/overview.py
Co-authored-by: Isaac Chung <chungisaac1217gmail.com>
---------
Co-authored-by: Kenneth Enevoldsen <kennethcenevoldsengmail.com>
Co-authored-by: Isaac Chung <chungisaac1217gmail.com>
Co-authored-by: Isaac Chung <isaac.chungteam.wrike.com>
Co-authored-by: github-actions <github-actionsgithub.com>
Co-authored-by: Márton Kardos <power.up1163gmail.com> ([`0a83e38`](https://github.com/embeddings-benchmark/mteb/commit/0a83e383efe41e86e51c0d4cdca18d9ed5d42821))
* fix: subsets to run (1830)
* fix split evals
* add test
* lint
* fix moka
* add assert ([`8be6b2e`](https://github.com/embeddings-benchmark/mteb/commit/8be6b2e36abb005822e07c034484c245345f6eb2))