Feature
* feat: Integrating ChemTEB (1708)
* Add SMILES, AI Paraphrase and Inter-Source Paragraphs PairClassification Tasks
* Add chemical subsets of NQ and HotpotQA datasets as Retrieval tasks
* Add PubChem Synonyms PairClassification task
* Update task __init__ for previously added tasks
* Add nomic-bert loader
* Add a script to run the evaluation pipeline for chemical-related tasks
* Add 15 Wikipedia article classification tasks
* Add PairClassification and BitextMining tasks for Coconut SMILES
* Fix naming of some Classification and PairClassification tasks
* Fix naming issues in some Classification tasks
* Integrate WANDB with benchmarking script
* Update .gitignore
* Fix `nomic_models.py` issue with retrieval tasks, similar to issue 1115 in original repo
* Add one chemical model and some SentenceTransformer models
* Fix a naming issue for SentenceTransformer models
* Add OpenAI, bge-m3 and matscibert models
* Add PubChem SMILES Bitext Mining tasks
* Change metric names to be more descriptive
* Add English e5 and bge v1 models in all sizes
* Add two Wikipedia Clustering tasks
* Add a try-except in the evaluation script to skip faulty models during the benchmark
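The skip-on-failure idea can be sketched as follows. This is a minimal illustration only: the model names and the `evaluate` callable are hypothetical stand-ins, not the actual evaluation script.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Hypothetical model list; the real script iterates over its own registry.
MODEL_NAMES = ["good-model", "faulty-model"]

def evaluate(model_name: str) -> float:
    """Stand-in for the real evaluation call; raises for the faulty model."""
    if model_name == "faulty-model":
        raise RuntimeError("model failed to load")
    return 0.75  # dummy score

results = {}
for name in MODEL_NAMES:
    try:
        results[name] = evaluate(name)
    except Exception as exc:  # skip faulty models instead of aborting the run
        logger.warning("Skipping %s: %s", name, exc)

print(results)  # only models that evaluated successfully remain
```

The broad `except Exception` is deliberate here: any per-model failure should be logged and skipped so one bad checkpoint cannot abort a long benchmark run.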
* Add bge v1.5 models and clustering score extraction to json parser
* Add Amazon Titan embedding models
* Add Cohere Bedrock models
* Add two SDS Classification tasks
* Add SDS Classification tasks to classification init and chem_eval
* Add a retrieval dataset, update dataset names and revisions
* Update revision for the CoconutRetrieval dataset: handle duplicate SMILES (documents)
* Update `CoconutSMILES2FormulaPC` task
* Change CoconutRetrieval dataset to a smaller one
* Update some models
- Integrate models added in ChemTEB (such as Amazon, Cohere Bedrock and Nomic BERT) with the latest modeling format in mteb
- Update the metadata for these models
* Fix a typo: the `open_weights` argument was duplicated
* Update ChemTEB tasks
- Rename some tasks for better readability.
- Merge some BitextMining and PairClassification tasks into a single task with subsets (`PubChemSMILESBitextMining` and `PubChemSMILESPC`)
- Add a new multilingual task (`PubChemWikiPairClassification`) covering 12 languages
- Update dataset paths, revisions and metadata for most tasks.
- Add a `Chemistry` domain to `TaskMetadata`
* Remove unnecessary files and tasks for MTEB
* Update some ChemTEB tasks
- Move `PubChemSMILESBitextMining` to `eng` folder
- Add citations for tasks involving SDS, NQ, Hotpot, PubChem data
- Update the `category` field of Clustering tasks
- Change `main_score` for `PubChemAISentenceParaphrasePC`
* Create ChemTEB benchmark
* Remove `CoconutRetrieval`
* Update tasks and benchmarks tables with ChemTEB
* Mention ChemTEB in readme
* Fix some issues, update task metadata, lint
- Fix `eval_langs`
- Fix the dataset path for two datasets
- Complete metadata for all tasks, mainly the following fields: `date`, `task_subtypes`, `dialect`, `sample_creation`
- Run ruff lint
- Rename `nomic_bert_models.py` to `nomic_bert_model.py` and update it
* Remove `nomic_bert_model.py`, as the model is now compatible with SentenceTransformer
* Remove the `WikipediaAIParagraphsParaphrasePC` task because it was trivial
* Merge `amazon_models.py` and `cohere_bedrock_models.py` into `bedrock_models.py`
* Remove unnecessary `load_data` overrides for some tasks
* Update `bedrock_models.py`, `openai_models.py` and two dataset revisions
- Truncate text for Amazon text embedding models
- `text-embedding-ada-002` returns null embeddings for some inputs with 8192 tokens
- Update two datasets, dropping very long samples (length above the 99th percentile)
* Add a layer of dynamic truncation for amazon models in `bedrock_models.py`
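The dynamic-truncation idea can be sketched as below. This is a simplification under stated assumptions: the length limit, the exception type, and the `embed` function are placeholders, not the actual Bedrock API or the code in `bedrock_models.py`.

```python
def embed(text: str) -> list[float]:
    """Stand-in for a Bedrock embedding call; rejects overly long inputs."""
    if len(text) > 100:  # placeholder limit; real providers limit tokens/bytes
        raise ValueError("input too long")
    return [0.0] * 8  # dummy embedding vector

def embed_with_truncation(text: str, shrink: float = 0.8, max_tries: int = 10) -> list[float]:
    """Retry the call, shrinking the input until the provider accepts it."""
    for _ in range(max_tries):
        try:
            return embed(text)
        except ValueError:
            # Drop the tail of the input and retry with a shorter text
            text = text[: int(len(text) * shrink)]
    raise RuntimeError("could not truncate input to an accepted length")

vec = embed_with_truncation("x" * 500)
```

Shrinking geometrically rather than by a fixed amount keeps the number of retries small even when the input is far over the limit.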
* Replace `metadata_dict` with `self.metadata` in `PubChemSMILESPC.py`
* Fix model metadata for Bedrock models
* Add reference comment to original Cohere API implementation ([`4d66434`](https://github.com/embeddings-benchmark/mteb/commit/4d66434c80050ace3b927f3fc1829b8dd377f78a))
Unknown
* Update points table ([`223bf32`](https://github.com/embeddings-benchmark/mteb/commit/223bf324c213f222785bbf2db88e30c8069c610b))