Sentence-transformers

Latest version: v3.3.1


3.3.1

```bash
pip install sentence-transformers[onnx-gpu]==3.3.1
pip install sentence-transformers[onnx]==3.3.1
pip install sentence-transformers[openvino]==3.3.1
```


Details
If you're loading a model under this scenario:
* Your model is hosted on Hugging Face.
* Your model is private.
* You haven't set the `HF_TOKEN` environment variable via `huggingface-cli login` or some other approach.
* You're passing the `token` argument to `SentenceTransformer` to load the model.

Then you may have encountered a crash in v3.3.0. This should be resolved now.
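
For instance, a minimal sketch of that scenario (the repository name and token value are placeholders for your own private model):
```python
from sentence_transformers import SentenceTransformer

# "my-org/my-private-model" and the token are placeholders; use your own private repository and token
model = SentenceTransformer("my-org/my-private-model", token="hf_xxx")
embeddings = model.encode(["This is an example sentence"])
```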

All Changes
* [`docs`] Fix the prompt link to the training script by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3060
* [Fix] Resolve loading private Transformer model in version 3.3.0 by pesuchin in https://github.com/UKPLab/sentence-transformers/pull/3058

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.3.0...v3.3.1

3.3.0

```bash
pip install sentence-transformers[onnx-gpu]==3.3.0
pip install sentence-transformers[onnx]==3.3.0
pip install sentence-transformers[openvino]==3.3.0
```


OpenVINO int8 static quantization (https://github.com/UKPLab/sentence-transformers/pull/3025)
We introduce int8 static quantization using [OpenVINO](https://github.com/openvinotoolkit/openvino), a highly performant solution that outperforms all other current backends by a mile, at a minimal loss in accuracy. Here are the updated benchmarks:

<p align="center">
<img src="https://github.com/user-attachments/assets/96f96da5-b65e-4293-8b67-d47430aa5fae" width="50%" />
</p>

Quantizing directly to the Hugging Face Hub

```python
from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model

# 1. Load a model with the OpenVINO backend
model = SentenceTransformer("all-MiniLM-L6-v2", backend="openvino")

# 2. Quantize the model to int8, push the model to https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
#    as a pull request:
export_static_quantized_openvino_model(
    model,
    quantization_config=None,
    model_name_or_path="sentence-transformers/all-MiniLM-L6-v2",
    push_to_hub=True,
    create_pr=True,
)
```

You can immediately use the model, even before it's merged, by using the `revision` argument:
```python
from sentence_transformers import SentenceTransformer

pull_request_nr = 2  # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
    revision=f"refs/pr/{pull_request_nr}",
)
```

And once it's merged:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="openvino",
    model_kwargs={"file_name": "openvino/openvino_model_qint8_quantized.xml"},
)
```


Quantizing locally
You can also quantize a model and save it locally:
```python
from sentence_transformers import SentenceTransformer, export_static_quantized_openvino_model
from optimum.intel import OVQuantizationConfig

model = SentenceTransformer("all-mpnet-base-v2", backend="openvino")
model.save_pretrained("path/to/all-mpnet-base-v2-local")
quantization_config = OVQuantizationConfig()  # You can update settings here
export_static_quantized_openvino_model(model, quantization_config, "path/to/all-mpnet-base-v2-local")
```

And after quantizing, you can load it like so:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "path/to/all-mpnet-base-v2-local",
    backend="openvino",
    model_kwargs={"file_name": "openvino_model_qint8_quantized.xml"},
)
```


All [original Sentence Transformer models](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) already have these new `openvino_model_qint8_quantized.xml` files, so you can load them directly without exporting! I would recommend making pull requests for [other models on Hugging Face](https://huggingface.co/models?library=sentence-transformers) that you'd like to see quantized.

Learn more about how to Speed up Inference in the documentation: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

Training with Prompts (https://github.com/UKPLab/sentence-transformers/pull/2964)
Many modern embedding models are trained with “instructions” or “prompts” following the [INSTRUCTOR paper](https://arxiv.org/abs/2212.09741). These prompts are strings, prefixed to each text to be embedded, allowing the model to distinguish between different types of text.

For example, the [mixedbread-ai/mxbai-embed-large-v1](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1) model was trained with `Represent this sentence for searching relevant passages: ` as the prompt for all queries. This prompt is stored in the [model configuration](https://huggingface.co/mixedbread-ai/mxbai-embed-large-v1/blob/main/config_sentence_transformers.json) under the prompt name `"query"`, so users can specify that `prompt_name` in `model.encode`:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")
query_embedding = model.encode("What are Pandas?", prompt_name="query")
# or
# query_embedding = model.encode("What are Pandas?", prompt="Represent this sentence for searching relevant passages: ")
document_embeddings = model.encode([
    "Pandas is a software library written for the Python programming language for data manipulation and analysis.",
    "Pandas are a species of bear native to South Central China. They are also known as the giant panda or simply panda.",
    "Koala bears are not actually bears, they are marsupials native to Australia.",
])
similarity = model.similarity(query_embedding, document_embeddings)
print(similarity)
```

3.2.1

```bash
pip install sentence-transformers[onnx-gpu]==3.2.1
pip install sentence-transformers[onnx]==3.2.1
pip install sentence-transformers[openvino]==3.2.1
```


Fixing Loading non-Transformer models
In v3.2.0, a non-[Transformer](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.Transformer) based model (e.g. CLIP) would not load correctly if the model was saved in the root of the model repository/directory. This has been resolved in #3007.
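
As a quick sanity check, here is a minimal sketch using a public CLIP-based Sentence Transformer; the checkpoint name is only an example of such a non-Transformer-module model:
```python
from sentence_transformers import SentenceTransformer

# CLIP models use the CLIPModel module rather than the usual Transformer module
model = SentenceTransformer("sentence-transformers/clip-ViT-B-32")
embeddings = model.encode(["A cat sitting on a windowsill", "A photo of a dog"])
print(embeddings.shape)
```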

Throw error if `StaticEmbedding`-based model is finetuned with incompatible losses
The following losses are not compatible with `StaticEmbedding`-based models:
* CachedGISTEmbedLoss
* CachedMultipleNegativesRankingLoss
* CachedMultipleNegativesSymmetricRankingLoss
* DenoisingAutoEncoderLoss
* GISTEmbedLoss

An error is now thrown when one of these is used with a `StaticEmbedding`-based model. I recommend using MultipleNegativesRankingLoss to finetune these models, e.g. as in https://huggingface.co/tomaarsen/static-bert-uncased-gooaq; a minimal training sketch follows below.
Note: to get good performance, you must use much higher learning rates than usual. In my experiments, 2e-1 worked well.
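
A minimal finetuning sketch along those lines; the tokenizer, dataset slice, and hyperparameters here are illustrative choices, not a prescribed recipe:
```python
from datasets import load_dataset
from tokenizers import Tokenizer
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.models import StaticEmbedding

# Randomly initialized static token embeddings over an existing tokenizer
static_embedding = StaticEmbedding(Tokenizer.from_pretrained("google-bert/bert-base-uncased"), embedding_dim=1024)
model = SentenceTransformer(modules=[static_embedding])

# (anchor, positive) pairs; in-batch negatives are used by the loss
train_dataset = load_dataset("sentence-transformers/gooaq", split="train[:100000]")
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="static-embedding-gooaq",
    num_train_epochs=1,
    per_device_train_batch_size=1024,
    learning_rate=2e-1,  # much higher than usual, as noted above
)
trainer = SentenceTransformerTrainer(model=model, args=args, train_dataset=train_dataset, loss=loss)
trainer.train()
```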

Patch ONNX model when the model uses `output_hidden_states`

For example, this script used to fail, but passes now:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "distiluse-base-multilingual-cased",
    backend="onnx",
    model_kwargs={"provider": "CPUExecutionProvider"},
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
print(embeddings.shape)
```


All changes
* Bump optimum version by echarlaix in https://github.com/UKPLab/sentence-transformers/pull/2984
* [`docs`] Update the training snippets for some losses that should use the v3 Trainer by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2987
* [`enh`] Throw error if StaticEmbedding-based model is trained with incompatible loss by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2990
* [`fix`] Fix semantic_search_usearch with 'binary' by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2989
* [enh] Add support for large_string in model card create by yaohwang in https://github.com/UKPLab/sentence-transformers/pull/2999
* [`model cards`] Prevent crash on generating widgets if dataset column is empty by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2997
* [fix] Added model2vec import compatible with current and newer version by Pringled in https://github.com/UKPLab/sentence-transformers/pull/2992
* Fix cache_dir issue with loading CLIPModel by BoPeng in https://github.com/UKPLab/sentence-transformers/pull/3007
* [`warn`] Throw a warning if compute_metrics is set, as it's not used by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3002
* [`fix`] Prevent IndexError if output_hidden_states & ONNX by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3008

New Contributors
* echarlaix made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2984
* yaohwang made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2999
* Pringled made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2992
* BoPeng made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3007

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.2.0...v3.2.1

3.2.0

```bash
pip install sentence-transformers[onnx-gpu]==3.2.0
pip install sentence-transformers[onnx]==3.2.0
pip install sentence-transformers[openvino]==3.2.0
```


Faster ONNX and OpenVINO Backends for SentenceTransformer (2712)
Introducing a new `backend` keyword argument to the `SentenceTransformer` initialization, allowing values of `"torch"` (default), `"onnx"`, and `"openvino"`.
These come with new installations:
```bash
pip install sentence-transformers[onnx-gpu]
# or ONNX for CPU only:
pip install sentence-transformers[onnx]
# or
pip install sentence-transformers[openvino]
```

It's as simple as:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
```

If you specify a `backend` and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to `model.push_to_hub` or `model.save_pretrained` into the same model repository or directory to avoid having to re-export the model every time.
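
For example, a minimal sketch of exporting once and then reusing the exported file from a local directory (the local path is illustrative):
```python
from sentence_transformers import SentenceTransformer

# If the repository doesn't already contain an ONNX model, this first load exports one automatically
model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

# Persist the exported ONNX file together with the rest of the model
model.save_pretrained("local/all-MiniLM-L6-v2-onnx")

# Later loads reuse the saved ONNX file instead of re-exporting
model = SentenceTransformer("local/all-MiniLM-L6-v2-onnx", backend="onnx")
```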

All keyword arguments passed via `model_kwargs` will be passed on to [`ORTModel.from_pretrained`](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) or [`OVBaseModel.from_pretrained`](https://huggingface.co/docs/optimum/intel/openvino/reference#optimum.intel.openvino.modeling_base.OVBaseModel.from_pretrained). The most useful arguments are:

* `provider`: (Only if `backend="onnx"`) ONNX Runtime provider to use for loading the model, e.g. `"CPUExecutionProvider"`. See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest provider (e.g. `"CUDAExecutionProvider"`) will be used.
* `file_name`: The name of the ONNX file to load. If not specified, will default to `"model.onnx"` or otherwise `"onnx/model.onnx"` for ONNX, and to `"openvino_model.xml"` or otherwise `"openvino/openvino_model.xml"` for OpenVINO. This argument is useful for specifying optimized or quantized models.
* `export`: A boolean flag specifying whether the model will be exported. If not provided, `export` will be set to `True` if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_O3.onnx",
        "provider": "CPUExecutionProvider",
    },
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
```


Benchmarks
We ran [benchmarks](https://sbert.net/docs/sentence_transformer/usage/efficiency.html#benchmark) for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. Here are the findings:

<p float="left">
<img src="https://github.com/user-attachments/assets/d3f423ff-ad4e-4c91-9beb-8217a062a61d" width="45%" />
<img src="https://github.com/user-attachments/assets/3b9ae402-1127-4152-a925-70c3d626b27d" width="45%" />
</p>

These findings resulted in these recommendations:
![image](https://github.com/user-attachments/assets/0ace85c5-622b-471a-8e20-9331a1ae12c7)

For GPU, you can expect **2x speedup with fp16 at no cost**, and for CPU you can expect **~2.5x speedup at a cost of 0.4% accuracy**.
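
As one way to apply the GPU recommendation, here is a minimal sketch loading the model in fp16 (the exact setup used for the benchmarks may differ):
```python
import torch
from sentence_transformers import SentenceTransformer

# Load in float16 on GPU for roughly 2x faster inference at (near) no accuracy cost
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="cuda",
    model_kwargs={"torch_dtype": torch.float16},
)
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
```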

<details><summary>ONNX Optimization and Quantization</summary>

In addition to exporting default ONNX and OpenVINO models, we also introduce 2 helper methods for optimizing and quantizing ONNX models:

Optimization

[`export_optimized_onnx_model`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_optimized_onnx_model): This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/optimization#optimizing-a-model-during-the-onnx-export). This function accepts:
* `model`: A SentenceTransformer model loaded with `backend="onnx"`.
* `optimization_config`: ["O1", "O2", "O3", or "O4" from 🤗 Optimum](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/optimization) or a custom [`OptimizationConfig`](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.OptimizationConfig) instance.
* `model_name_or_path`: The directory or model repository where the optimized model will be saved.
* `push_to_hub`: Whether to push the exported model to the hub with `model_name_or_path` as the repository name. If False, the model will be saved in the directory specified with `model_name_or_path`.
* `create_pr`: If `push_to_hub`, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
* `file_suffix`: The suffix to add to the optimized model file name. Will use the `optimization_config` string or `"optimized"` if not set.

The usage is like this:
```python
from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_optimized_onnx_model(
    model=onnx_model,
    optimization_config="O4",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)
```

After which you can load the model with:
```python
from sentence_transformers import SentenceTransformer

pull_request_nr = 2  # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_O4.onnx"},
    revision=f"refs/pr/{pull_request_nr}",
)
```

or when it gets merged:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_O4.onnx"},
)
```


Quantization
[`export_dynamic_quantized_onnx_model`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_dynamic_quantized_onnx_model): This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts:
* `model`: A SentenceTransformer model loaded with `backend="onnx"`.
* `quantization_config`: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from [AutoQuantizationConfig](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig), or a [QuantizationConfig](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.QuantizationConfig) instance.
* `model_name_or_path`: The directory or model repository where the quantized model will be saved.
* `push_to_hub`: Whether to push the exported model to the hub with `model_name_or_path` as the repository name. If False, the model will be saved in the directory specified with `model_name_or_path`.
* `create_pr`: If `push_to_hub`, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
* `file_suffix`: The suffix to add to the quantized model file name. Will use the `quantization_config` string or e.g. `"int8_quantized"` if not set.


The usage is like this:
```python
from sentence_transformers import SentenceTransformer, export_dynamic_quantized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_dynamic_quantized_onnx_model(
    model=onnx_model,
    quantization_config="avx512",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)
```

After which you can load the model with:
```python
from sentence_transformers import SentenceTransformer

pull_request_nr = 2  # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
    revision=f"refs/pr/{pull_request_nr}",
)
```

or when it gets merged:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
)
```


</details>

Lightning-Fast Static Embeddings via Model2Vec (2961)
If ONNX or OpenVINO isn't fast enough for you yet, then perhaps you'll enjoy Static Embeddings. These embeddings are a bit akin to [GloVe](https://nlp.stanford.edu/projects/glove/) or [Word2vec](https://en.wikipedia.org/wiki/Word2vec), i.e. they're bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks.

However, these Static Embeddings are created in different ways. For example:
1. Distillation via the [Model2Vec](https://github.com/MinishLab/model2vec) technique. This project allows you to distill any Sentence Transformer model into Static Embeddings. For example, distilling [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) resulted in a Static Embeddings Sentence Transformer model that reaches 87.5% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on MTEB (+ PEARL & WordSim) and 97.4% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on [various classification benchmarks](https://github.com/MinishLab/model2vec?tab=readme-ov-file#classification-and-speed-benchmarks).
You can initialize Static Embeddings via Model2Vec in two ways:
* [`from_model2vec`](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding.from_model2vec): You can load one of the pretrained [Model2Vec models](https://huggingface.co/models?library=model2vec):
```python
# note: `pip install model2vec` is needed, but not for inference
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding from a pretrained model2vec model
static_embedding = StaticEmbedding.from_model2vec("minishlab/M2V_multilingual_output")
model = SentenceTransformer(modules=[static_embedding])

# Encode some texts
queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute similarities
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8170, 0.3843],
        [0.3929, 0.5818]])
"""
```
* [`from_distillation`](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding.from_distillation): You can use the name of any Sentence Transformer model alongside some parameters (see [these docs](https://github.com/MinishLab/model2vec#distilling-a-model2vec-model) for more information) to perform the distillation yourself, without needing any dataset. On my device, this takes ~4s on a GPU and ~2 minutes on a CPU:
```python
# note: `pip install model2vec` is needed, but not for inference
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding by distilling via model2vec
static_embedding = StaticEmbedding.from_distillation(
    "mixedbread-ai/mxbai-embed-large-v1",
    device="cuda",
    pca_dims=256,
    apply_zipf=True,
)
model = SentenceTransformer(modules=[static_embedding])

# Encode some texts
queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute similarities
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8430, 0.3271],
        [0.3213, 0.5861]])
"""
```


2. Random initialization: Although this initialization needs finetuning, finetuning a Sentence Transformers model backed by `StaticEmbedding` is extremely fast. For example, I was able to finetune [tomaarsen/static-bert-uncased-gooaq](https://huggingface.co/tomaarsen/static-bert-uncased-gooaq) with MatryoshkaLoss & MultipleNegativesRankingLoss on the entire (3 million pairs) [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) dataset in just 7 minutes. This model reaches an NDCG@10 of 79.33 on a hold-out set of 10k samples from gooaq, whereas e.g. [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) reaches an NDCG@10 of 85.01. In short, only 6.6% less performance for a model that's about 500x faster.
That's not a typo: I can compute embeddings for about 14000 [stsb](https://huggingface.co/datasets/sentence-transformers/stsb) sentences per second on *CPU*, compared to about 24 per second with [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5), a.k.a. 625x faster.

> [!NOTE]
> You can `save_pretrained` and load these models like any other Sentence Transformer model; the `StaticEmbedding` initialization is only necessary when you're creating a *new* model.
> * Creation:
>   ```python
>   from sentence_transformers import SentenceTransformer
>   from sentence_transformers.models import StaticEmbedding
>
>   # Initialize a Sentence Transformer model with a static embedding by distilling via model2vec
>   static_embedding = StaticEmbedding.from_distillation(
>       "mixedbread-ai/mxbai-embed-large-v1",
>       device="cuda",
>       pca_dims=256,
>       apply_zipf=True,
>   )
>   model = SentenceTransformer(modules=[static_embedding])
>   model.save_pretrained("static-mxbai-embed-large-v1")
>   # or
>   model.push_to_hub("tomaarsen/static-mxbai-embed-large-v1")
>   ```
> * Inference:
>   ```python
>   from sentence_transformers import SentenceTransformer
>
>   # Initialize a Sentence Transformer model with a static embedding
>   model = SentenceTransformer("static-mxbai-embed-large-v1")
>
>   model.encode([...])
>   ```

Small changes
* The [`InformationRetrievalEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#informationretrievalevaluator) now accepts `query_prompt`, `query_prompt_name`, `corpus_prompt`, and `corpus_prompt_name` arguments, useful if your model requires specific prompts for queries and/or documents for the best performance; see the sketch after this list. (2951)
* The [`mine_hard_negatives`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives) function now accepts `anchor_column_name` and `positive_column_name` for specifying which dataset columns will be used. If not specified, the first two columns are used, respectively. Additionally, the `min_score` parameter is added, ensuring that all mined negatives have a similarity score of at least `min_score` according to the chosen `SentenceTransformer` or `CrossEncoder` model. (2977)
* If you're using multiple evaluators during training via [SequentialEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sequentialevaluator), e.g. multiple evaluators for different Matryoshka dimensions, then the order is now preserved in the training logs in the model card. Previously, they were sorted by name, resulting in weird orderings (e.g. "gooaq-1024", "gooaq-128", "gooaq-256", "gooaq-32", "gooaq-512", "gooaq-64") (2963)
* [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) has been improved to support multiple negatives per sample, i.e. the loss now accepts data in the `(anchor, positive, negative_1, …, negative_n)` format. It is the third loss to support this format (see [docs](https://sbert.net/docs/sentence_transformer/loss_overview.html)):

![image](https://github.com/user-attachments/assets/758a2143-87f8-4d5e-9cf4-e887e50e8c73)
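
For the evaluator prompt arguments mentioned above, here is a minimal sketch with placeholder data (the queries, corpus, and relevance judgments are made up for illustration):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Tiny placeholder corpus; real evaluations use much larger query/corpus sets
queries = {"q1": "What are Pandas?"}
corpus = {
    "d1": "Pandas is a software library for data manipulation and analysis.",
    "d2": "The giant panda is a bear species endemic to China.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    query_prompt_name="query",  # use the "query" prompt stored in the model configuration
    name="toy-ir",
)
print(evaluator(model))
```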

All changes
* [`fix`] Only save first module in root if "save_in_root" is specified. by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2957
* [`feat`] Add query prompts to Information Retrieval Evaluator by ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2951
* [`model cards`] Keep evaluation order in training logs if there's multiple evaluators by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2963
* Add negatives in CachedGISTEmbedLoss by daegonYu in https://github.com/UKPLab/sentence-transformers/pull/2946
* [ENH] -- `CrossEncoder.rank` by it176131 in https://github.com/UKPLab/sentence-transformers/pull/2947
* [`feat`] Add lightning-fast StaticEmbedding module based on model2vec by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2961
* [`feat`] Add ONNX and OpenVINO backends by helena-intel and tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2712
* Refine mine_hard_negatives arguments by bakrianoo in https://github.com/UKPLab/sentence-transformers/pull/2977

New Contributors
* daegonYu made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2946
* it176131 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2947
* helena-intel made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2712
* bakrianoo made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2977

Special thanks to echarlaix for making the new backends possible with some last-minute changes in `optimum` and `optimum-intel`.

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.1.1...v3.2.0

3.1.1

Hard Negatives Mining Patch (2944)
The [`mine_hard_negatives`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives) utility introduced in the previous release would fail if `use_faiss=True` & the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with [all Sentence Transformer models](https://huggingface.co/models?library=sentence-transformers):
```python
from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████| 75/75 [00:21<00:00,  3.51it/s]
Batches: 100%|██████████| 79/79 [00:03<00:00, 25.77it/s]
Querying FAISS index: 100%|██████████| 1/1 [00:00<00:00,  3.98it/s]
Metric       Positive       Negative     Difference
Count          10,000         47,711
Mean           0.7600         0.5376         0.2299
Median         0.7673         0.5379         0.2274
Std            0.0658         0.0387         0.0629
Min            0.3858         0.3732         0.1044
25%            0.7219         0.5129         0.1833
50%            0.7673         0.5379         0.2274
75%            0.8058         0.5617         0.2724
Max            0.9341         0.7024         0.4780
Skipped 48770 potential negatives (9.56%) due to the margin of 0.1.
Could not find enough negatives for 2289 samples (4.58%). Consider adjusting the range_max, range_min, margin and max_score parameters if you'd like to find more valid negatives.
'''
print(dataset)
'''
Dataset({
    features: ['query', 'answer', 'negative'],
    num_rows: 47711
})
'''
print(dataset[0])
'''
{
    'query': 'where is the us navy base in japan located',
    'answer': 'United States Fleet Activities Yokosuka The United States Fleet Activities Yokosuka (横須賀海軍施設, Yokosuka kaigunshisetsu) or Commander Fleet Activities Yokosuka (司令官艦隊活動横須賀, Shirei-kan kantai katsudō Yokosuka) is a United States Navy base in Yokosuka, Japan. Its mission is to maintain and operate base facilities for the logistic, recreational, administrative support and service of the U.S. Naval Forces Japan, Seventh Fleet and other operating forces assigned in the Western Pacific. CFAY is the largest strategically important U.S. naval installation in the western Pacific.[1] As of August 2013[update], it was commanded by Captain David Glenister.',
    'negative': "2011 Tōhoku earthquake and tsunami The earthquake took place at 14:46 JST (UTC 05:46) around 67\xa0km (42\xa0mi) from the nearest point on Japan's coastline, and initial estimates indicated the tsunami would have taken 10 to 30\xa0minutes to reach the areas first affected, and then areas farther north and south based on the geography of the coastline.[127][128] Just over an hour after the earthquake at 15:55 JST, a tsunami was observed flooding Sendai Airport, which is located near the coast of Miyagi Prefecture,[129][130] with waves sweeping away cars and planes and flooding various buildings as they traveled inland.[131][132] The impact of the tsunami in and around Sendai Airport was filmed by an NHK News helicopter, showing a number of vehicles on local roads trying to escape the approaching wave and being engulfed by it.[133] A 4-metre-high (13\xa0ft) tsunami hit Iwate Prefecture.[134] Wakabayashi Ward in Sendai was also particularly hard hit.[135] At least 101 designated tsunami evacuation sites were hit by the wave.[136]"
}
'''
dataset.push_to_hub("natural-questions-hard-negatives", "triplet")
```


Thanks to omarnj-lab for pointing out the bug to me.

Numpy restriction lifted (2937)
The [v3.1.0 Sentence Transformers release](https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.0) required `numpy<2` to prevent crashes on Windows. However, various third-party packages (e.g. scipy) have now been recompiled & released, allowing the Windows tests to pass again.

If you encounter the following message:

> A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
> If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Then consider 1) upgrading the dependency from which the error occurred or 2) downgrading `numpy` to below v2:

```bash
pip install -U "numpy<2"
```


Thanks to kozlek for pointing this out to me and helping get it resolved.

All changes
* [`deps`] Attempt to remove numpy restrictions by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2937
* [`metadata`] Extend pyproject.toml metadata by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2943
* [`fix`] Ensure that the embeddings from hard negative mining are normalized by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2944

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.1.0...v3.1.1
