Sentence-transformers

Latest version: v3.4.1


3.2.0

pip install sentence-transformers[onnx-gpu]==3.2.0
pip install sentence-transformers[onnx]==3.2.0
pip install sentence-transformers[openvino]==3.2.0


Faster ONNX and OpenVINO Backends for SentenceTransformer (2712)
Introducing a new `backend` keyword argument to the `SentenceTransformer` initialization, allowing values of `"torch"` (default), `"onnx"`, and `"openvino"`.
These come with new installations:
```bash
pip install sentence-transformers[onnx-gpu]
# or ONNX for CPU only:
pip install sentence-transformers[onnx]
# or
pip install sentence-transformers[openvino]
```

It's as simple as:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
```

If you specify a `backend` and your model repository or directory contains an ONNX/OpenVINO model file, it will automatically be used! And if your model repository or directory doesn't have one already, an ONNX/OpenVINO model will be automatically exported. Just remember to `model.push_to_hub` or `model.save_pretrained` into the same model repository or directory to avoid having to re-export the model every time.
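For instance, here is a minimal sketch of that export-once, reuse-later workflow (the local directory name is just an example):
```python
from sentence_transformers import SentenceTransformer

# Uses an existing ONNX file if the repository has one, otherwise exports one on the fly
model = SentenceTransformer("all-MiniLM-L6-v2", backend="onnx")

# Save the model, including the (exported) ONNX file, to a local directory
model.save_pretrained("local/all-MiniLM-L6-v2-onnx")

# Later loads reuse the saved ONNX file, so no re-export is needed
model = SentenceTransformer("local/all-MiniLM-L6-v2-onnx", backend="onnx")
```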

All keyword arguments passed via `model_kwargs` will be passed on to [`ORTModel.from_pretrained`](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained) or [`OVBaseModel.from_pretrained`](https://huggingface.co/docs/optimum/intel/openvino/reference#optimum.intel.openvino.modeling_base.OVBaseModel.from_pretrained). The most useful arguments are:

* `provider`: (Only if `backend="onnx"`) The ONNX Runtime provider to use for loading the model, e.g. `"CPUExecutionProvider"`. See https://onnxruntime.ai/docs/execution-providers/ for possible providers. If not specified, the strongest available provider (e.g. `"CUDAExecutionProvider"`) will be used.
* `file_name`: The name of the ONNX file to load. If not specified, will default to "model.onnx" or otherwise "onnx/model.onnx" for ONNX, and "openvino_model.xml" or otherwise "openvino/openvino_model.xml" for OpenVINO. This argument is useful for specifying optimized or quantized models.
* `export`: A boolean flag specifying whether the model will be exported. If not provided, export will be set to True if the model repository or directory does not already contain an ONNX or OpenVINO model.

For example:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    backend="onnx",
    model_kwargs={
        "file_name": "model_O3.onnx",
        "provider": "CPUExecutionProvider",
    }
)

sentences = ["This is an example sentence", "Each sentence is converted"]
embeddings = model.encode(sentences)
```


Benchmarks
We ran [benchmarks](https://sbert.net/docs/sentence_transformer/usage/efficiency.html#benchmark) for CPU and GPU, averaging findings across 4 models of various sizes, 3 datasets, and numerous batch sizes. Here are the findings:

<p float="left">
<img src="https://github.com/user-attachments/assets/d3f423ff-ad4e-4c91-9beb-8217a062a61d" width="45%" />
<img src="https://github.com/user-attachments/assets/3b9ae402-1127-4152-a925-70c3d626b27d" width="45%" />
</p>

These findings resulted in these recommendations:
![image](https://github.com/user-attachments/assets/0ace85c5-622b-471a-8e20-9331a1ae12c7)

For GPU, you can expect **2x speedup with fp16 at no cost**, and for CPU you can expect **~2.5x speedup at a cost of 0.4% accuracy**.
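As a rough sketch of the GPU recommendation (assuming a CUDA device; `model_kwargs` is forwarded to the underlying Hugging Face model, so `torch_dtype` can request fp16):
```python
import torch
from sentence_transformers import SentenceTransformer

# Default torch backend, but with fp16 weights for roughly 2x faster GPU inference
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    device="cuda",
    model_kwargs={"torch_dtype": torch.float16},
)
embeddings = model.encode(["This is an example sentence"])
```
The CPU speedup refers to the quantized ONNX models covered in the section below.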

<details><summary>ONNX Optimization and Quantization</summary>

In addition to exporting default ONNX and OpenVINO models, we also introduce 2 helper methods for optimizing and quantizing ONNX models:

Optimization

[`export_optimized_onnx_model`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_optimized_onnx_model): This function uses Optimum to implement several optimizations in the ONNX model, ranging from basic optimizations to approximations and mixed precision. Read about the 4 default options [here](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/optimization#optimizing-a-model-during-the-onnx-export). This function accepts:
* `model`: A SentenceTransformer model loaded with `backend="onnx"`.
* `optimization_config`: ["O1", "O2", "O3", or "O4" from 🤗 Optimum](https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/optimization) or a custom [`OptimizationConfig`](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.OptimizationConfig) instance.
* `model_name_or_path`: The directory or model repository where the optimized model will be saved.
* `push_to_hub`: Whether to push the exported model to the Hub with `model_name_or_path` as the repository name. If False, the model will be saved in the directory specified with `model_name_or_path`.
* `create_pr`: If `push_to_hub`, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for optimizing models of repositories that you don't have write access to.
* `file_suffix`: The suffix to add to the optimized model file name. Will use the `optimization_config` string or `"optimized"` if not set.

The usage is like this:
```python
from sentence_transformers import SentenceTransformer, export_optimized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_optimized_onnx_model(
    model=onnx_model,
    optimization_config="O4",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)
```

After which you can load the model with:
```python
from sentence_transformers import SentenceTransformer

pull_request_nr = 2  # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_O4.onnx"},
    revision=f"refs/pr/{pull_request_nr}",
)
```

or when it gets merged:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_O4.onnx"},
)
```


Quantization
[`export_dynamic_quantized_onnx_model`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.backend.export_dynamic_quantized_onnx_model): This function uses Optimum to quantize the ONNX model to int8, also allowing for hardware-specific optimizations. This results in impressive speedups for CPUs. In my findings, each of the default quantization configuration options gave approximately the same performance improvements. This function accepts:
* `model`: A SentenceTransformer model loaded with `backend="onnx"`.
* `quantization_config`: "arm64", "avx2", "avx512", or "avx512_vnni" representing quantization configurations from [AutoQuantizationConfig](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.AutoQuantizationConfig), or a [QuantizationConfig](https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/configuration#optimum.onnxruntime.QuantizationConfig) instance.
* `model_name_or_path`: The directory or model repository where the quantized model will be saved.
* `push_to_hub`: Whether to push the exported model to the Hub with `model_name_or_path` as the repository name. If False, the model will be saved in the directory specified with `model_name_or_path`.
* `create_pr`: If `push_to_hub`, then this denotes whether a pull request is created rather than pushing the model directly to the repository. Very useful for quantizing models of repositories that you don't have write access to.
* `file_suffix`: The suffix to add to the quantized model file name. Will use the `quantization_config` string or e.g. `"int8_quantized"` if not set.


The usage is like this:
```python
from sentence_transformers import SentenceTransformer, export_dynamic_quantized_onnx_model

onnx_model = SentenceTransformer("BAAI/bge-large-en-v1.5", backend="onnx")
export_dynamic_quantized_onnx_model(
    model=onnx_model,
    quantization_config="avx512",
    model_name_or_path="BAAI/bge-large-en-v1.5",
    push_to_hub=True,
    create_pr=True,
)
```

After which you can load the model with:
```python
from sentence_transformers import SentenceTransformer

pull_request_nr = 2  # TODO: Update this to the number of your pull request
model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
    revision=f"refs/pr/{pull_request_nr}",
)
```

or when it gets merged:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "BAAI/bge-large-en-v1.5",
    backend="onnx",
    model_kwargs={"file_name": "onnx/model_qint8_avx512.onnx"},
)
```


</details>

Lightning-Fast Static Embeddings via Model2Vec (2961)
If ONNX or OpenVINO isn't fast enough for you yet, then perhaps you'll enjoy Static Embeddings. These embeddings are a bit akin to [GloVe](https://nlp.stanford.edu/projects/glove/) or [Word2vec](https://en.wikipedia.org/wiki/Word2vec), i.e. they're bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks.

However, these Static Embeddings are created in different ways. For example:
1. Distillation via the [Model2Vec](https://github.com/MinishLab/model2vec) technique. This project allows you to distill any Sentence Transformer model into Static Embeddings. For example, distilling [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) resulted in a Static Embeddings Sentence Transformer model that reaches 87.5% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on MTEB (+ PEARL & WordSim) and 97.4% of the performance of [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on [various classification benchmarks](https://github.com/MinishLab/model2vec?tab=readme-ov-file#classification-and-speed-benchmarks).
You can initialize Static Embeddings via Model2Vec in two ways:
* [`from_model2vec`](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding.from_model2vec): You can load one of the pretrained [Model2Vec models](https://huggingface.co/models?library=model2vec):
```python
# note: `pip install model2vec` is needed, but not for inference
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding from a pretrained model2vec model
static_embedding = StaticEmbedding.from_model2vec("minishlab/M2V_multilingual_output")
model = SentenceTransformer(modules=[static_embedding])

# Encode some texts
queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute similarities
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8170, 0.3843],
        [0.3929, 0.5818]])
"""
```

* [`from_distillation`](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.StaticEmbedding.from_distillation): You can use the name of any Sentence Transformer model alongside some parameters (see [these docs](https://github.com/MinishLab/model2vec#distilling-a-model2vec-model) for more information) to perform the distillation yourself, without needing any dataset. On my device, this takes ~4s on a GPU and ~2 minutes on a CPU:
```python
# note: `pip install model2vec` is needed, but not for inference
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Initialize a Sentence Transformer model with a static embedding by distilling via model2vec
static_embedding = StaticEmbedding.from_distillation(
    "mixedbread-ai/mxbai-embed-large-v1",
    device="cuda",
    pca_dims=256,
    apply_zipf=True,
)
model = SentenceTransformer(modules=[static_embedding])

# Encode some texts
queries = ["What is the capital of France?", "How many people live in the Netherlands?"]
documents = ["Paris is the capital of France", "The Netherlands has 17 million inhabitants"]
query_embeddings = model.encode(queries)
document_embeddings = model.encode(documents)

# Compute similarities
scores = model.similarity(query_embeddings, document_embeddings)
print(scores)
"""
tensor([[0.8430, 0.3271],
        [0.3213, 0.5861]])
"""
```


2. Random initialization: Although this initialization needs finetuning, finetuning a Sentence Transformers model backed by StaticEmbedding is extremely fast. For example, I was able to finetune [tomaarsen/static-bert-uncased-gooaq](https://huggingface.co/tomaarsen/static-bert-uncased-gooaq) with MatryoshkaLoss & MultipleNegativesRankingLoss on the entire (3 million pairs) [gooaq](https://huggingface.co/datasets/sentence-transformers/gooaq) dataset in just 7 minutes. This model reaches an NDCG@10 of 79.33 on a hold-out set of 10k samples from gooaq, whereas e.g. [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) reaches 85.01 NDCG@10. In short, only 6.6% less performance for a model that's about 500x faster.
That's not a typo: I can compute embeddings for about 14000 [stsb](https://huggingface.co/datasets/sentence-transformers/stsb) sentences per second on *CPU*, compared to about 24 with [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5), a.k.a. 625x faster. A minimal sketch of this random initialization is shown below.
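Here is that sketch, assuming `StaticEmbedding` can be constructed from a `tokenizers.Tokenizer` plus an embedding dimension (the tokenizer choice and dimension are just examples):
```python
from tokenizers import Tokenizer
from sentence_transformers import SentenceTransformer
from sentence_transformers.models import StaticEmbedding

# Randomly initialized static embeddings on top of an existing tokenizer (assumed API)
tokenizer = Tokenizer.from_pretrained("google-bert/bert-base-uncased")
static_embedding = StaticEmbedding(tokenizer, embedding_dim=1024)
model = SentenceTransformer(modules=[static_embedding])

# This model still needs finetuning (e.g. with MultipleNegativesRankingLoss) before it is useful
```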

> [!NOTE]
> You can `save_pretrained` and load these models like any other Sentence Transformer models; the `StaticEmbedding` initialization is only necessary when you're creating a *new* model.
> * Creation:
> ```python
> from sentence_transformers import SentenceTransformer
> from sentence_transformers.models import StaticEmbedding
>
> # Initialize a Sentence Transformer model with a static embedding from a pretrained model2vec model
> static_embedding = StaticEmbedding.from_distillation(
>     "mixedbread-ai/mxbai-embed-large-v1",
>     device="cuda",
>     pca_dims=256,
>     apply_zipf=True,
> )
> model = SentenceTransformer(modules=[static_embedding])
> model.save_pretrained("static-mxbai-embed-large-v1")
> # or
> model.push_to_hub("tomaarsen/static-mxbai-embed-large-v1")
> ```
> * Inference:
> ```python
> from sentence_transformers import SentenceTransformer
>
> # Initialize a Sentence Transformer model with a static embedding
> model = SentenceTransformer("static-mxbai-embed-large-v1")
>
> model.encode([...])
> ```

Small changes
* The [`InformationRetrievalEvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#informationretrievalevaluator) now accepts `query_prompt`, `query_prompt_name`, `corpus_prompt`, and `corpus_prompt_name` arguments, useful if your model requires specific prompts for queries and/or documents for the best performance; a brief sketch follows this list. (2951)
* The [`mine_hard_negatives`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives) function now accepts `anchor_column_name` and `positive_column_name` for specifying which dataset columns will be used. If not specified, the first two columns are used, respectively. Additionally, the `min_score` parameter is added, ensuring that all mined negatives have a similarity score of at least `min_score` according to the chosen `SentenceTransformer` or `CrossEncoder` model. (2977)
* If you're using multiple evaluators during training via [SequentialEvaluator](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sequentialevaluator), e.g. multiple evaluators for different Matryoshka dimensions, then the order is now preserved in the training logs in the model card. Previously, they were sorted by name, resulting in weird orderings (e.g. "gooaq-1024", "gooaq-128", "gooaq-256", "gooaq-32", "gooaq-512", "gooaq-64") (2963)
* [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedgistembedloss) has been improved to support multiple negatives per sample, i.e. the loss now accepts data in the `(anchor, positive, negative_1, โ€ฆ, negative_n)` format. It is the third loss to support this format (see [docs](https://sbert.net/docs/sentence_transformer/loss_overview.html)):

![image](https://github.com/user-attachments/assets/758a2143-87f8-4d5e-9cf4-e887e50e8c73)
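As a brief illustration of the new evaluator prompts, here is a sketch with made-up toy data; only the `query_prompt` and `corpus_prompt` arguments are the new part, and the prompt strings depend on your model:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

# Toy data: ids mapped to texts, plus the relevant document ids per query
queries = {"q1": "how many people live in berlin"}
corpus = {"d1": "Berlin has around 3.7 million inhabitants.", "d2": "Paris is the capital of France."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="toy-ir",
    query_prompt="query: ",     # prepended to every query before encoding
    corpus_prompt="passage: ",  # prepended to every corpus text before encoding
)
results = evaluator(model)
```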

All changes
* [`fix`] Only save first module in root if "save_in_root" is specified. by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2957
* [`feat`] Add query prompts to Information Retrieval Evaluator by ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2951
* [`model cards`] Keep evaluation order in training logs if there's multiple evaluators by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2963
* Add negatives in CachedGISTEmbedLoss by daegonYu in https://github.com/UKPLab/sentence-transformers/pull/2946
* [ENH] -- `CrossEncoder.rank` by it176131 in https://github.com/UKPLab/sentence-transformers/pull/2947
* [`feat`] Add lightning-fast StaticEmbedding module based on model2vec by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2961
* [`feat`] Add ONNX and OpenVINO backends by helena-intel and tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2712
* Refine mine_hard_negatives arguments by bakrianoo in https://github.com/UKPLab/sentence-transformers/pull/2977

New Contributors
* daegonYu made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2946
* it176131 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2947
* helena-intel made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2712
* bakrianoo made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2977

Special thanks to echarlaix for making the new backends possible due to some last-minute changes in `optimum` and `optimum-intel`.

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.1.1...v3.2.0

3.1.1

Hard Negatives Mining Patch (2944)
The [`mine_hard_negatives`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives) utility introduced in the previous release would fail if `use_faiss=True` & the model does not automatically normalize its embeddings. This release patches that, allowing the utility to work with [all Sentence Transformer models](https://huggingface.co/models?library=sentence-transformers):
```python
from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1").bfloat16()

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train[:10000]")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 10000
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████| 75/75 [00:21<00:00,  3.51it/s]
Batches: 100%|██████████| 79/79 [00:03<00:00, 25.77it/s]
Querying FAISS index: 100%|██████████| 1/1 [00:00<00:00,  3.98it/s]
Metric       Positive       Negative     Difference
Count          10,000         47,711
Mean           0.7600         0.5376         0.2299
Median         0.7673         0.5379         0.2274
Std            0.0658         0.0387         0.0629
Min            0.3858         0.3732         0.1044
25%            0.7219         0.5129         0.1833
50%            0.7673         0.5379         0.2274
75%            0.8058         0.5617         0.2724
Max            0.9341         0.7024         0.4780
Skipped 48770 potential negatives (9.56%) due to the margin of 0.1.
Could not find enough negatives for 2289 samples (4.58%). Consider adjusting the range_max, range_min, margin and max_score parameters if you'd like to find more valid negatives.
'''
print(dataset)
'''
Dataset({
    features: ['query', 'answer', 'negative'],
    num_rows: 47711
})
'''
print(dataset[0])
'''
{
    'query': 'where is the us navy base in japan located',
    'answer': 'United States Fleet Activities Yokosuka The United States Fleet Activities Yokosuka (横須賀海軍施設, Yokosuka kaigunshisetsu) or Commander Fleet Activities Yokosuka (司令官艦隊活動横須賀, Shirei-kan kantai katsudō Yokosuka) is a United States Navy base in Yokosuka, Japan. Its mission is to maintain and operate base facilities for the logistic, recreational, administrative support and service of the U.S. Naval Forces Japan, Seventh Fleet and other operating forces assigned in the Western Pacific. CFAY is the largest strategically important U.S. naval installation in the western Pacific.[1] As of August 2013[update], it was commanded by Captain David Glenister.',
    'negative': "2011 Tōhoku earthquake and tsunami The earthquake took place at 14:46 JST (UTC 05:46) around 67\xa0km (42\xa0mi) from the nearest point on Japan's coastline, and initial estimates indicated the tsunami would have taken 10 to 30\xa0minutes to reach the areas first affected, and then areas farther north and south based on the geography of the coastline.[127][128] Just over an hour after the earthquake at 15:55 JST, a tsunami was observed flooding Sendai Airport, which is located near the coast of Miyagi Prefecture,[129][130] with waves sweeping away cars and planes and flooding various buildings as they traveled inland.[131][132] The impact of the tsunami in and around Sendai Airport was filmed by an NHK News helicopter, showing a number of vehicles on local roads trying to escape the approaching wave and being engulfed by it.[133] A 4-metre-high (13\xa0ft) tsunami hit Iwate Prefecture.[134] Wakabayashi Ward in Sendai was also particularly hard hit.[135] At least 101 designated tsunami evacuation sites were hit by the wave.[136]"
}
'''
dataset.push_to_hub("natural-questions-hard-negatives", "triplet")
```


Thanks to omarnj-lab for pointing out the bug to me.

Numpy restriction lifted (2937)
The [v3.1.0 Sentence Transformers release](https://github.com/UKPLab/sentence-transformers/releases/tag/v3.1.0) required `numpy<2` to prevent crashes on Windows. However, various third-parties (e.g. scipy) have now been recompiled & released, allowing the Windows tests to pass again.

If you encounter the following warning:

> A module that was compiled using NumPy 1.x cannot be run in NumPy 2.0.0 as it may crash. To support both 1.x and 2.x versions of NumPy, modules must be compiled with NumPy 2.0. Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.
> If you are a user of the module, the easiest solution will be to downgrade to 'numpy<2' or try to upgrade the affected module. We expect that some modules will need time to support NumPy 2.

Then consider 1) upgrading the dependency from which the error occurred or 2) downgrading `numpy` to below v2:

```bash
pip install -U "numpy<2"
```


Thanks to kozlek for pointing this out to me and helping get it resolved.

All changes
* [`deps`] Attempt to remove numpy restrictions by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2937
* [`metadata`] Extend pyproject.toml metadata by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2943
* [`fix`] Ensure that the embeddings from hard negative mining are normalized by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2944

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.1.0...v3.1.1

3.1.0

> [!WARNING]
> Due to incompatibilities with Windows, we have set `numpy<2` in the Sentence Transformers requirements. If you're not on Windows, you can still install `numpy>=2` and everything should work as expected.

Hard Negatives Mining utility (2768, 2848)

Hard negatives are texts that are rather similar to some anchor text (e.g. a question), but are not the correct match. For example:

* Anchor: "are red pandas actually pandas?"
* Positive: "Red pandas, like giant pandas, are bamboo eaters native to Asia's high forests. Despite these similarities and their shared name, the two species are not closely related. Red pandas are much smaller than giant pandas and are the only living member of their taxonomic family."
* Hard negative: "The giant panda (Ailuropoda melanoleuca; Chinese: ๅคง็†Š็Œซ; pinyin: dร xiรณngmฤo), also known as the panda bear or simply the panda, is a bear native to south central China."

These negatives are more difficult for a model to distinguish from the correct answer, leading to a stronger training signal and a stronger overall model when used with one of the [Loss Functions](https://sbert.net/docs/sentence_transformer/loss_overview.html) that accepts (anchor, positive, negative) pairs such as the one above.

This release introduces a utility function called [`mine_hard_negatives`](https://sbert.net/docs/package_reference/util.html#sentence_transformers.util.mine_hard_negatives) that allows you to mine for these hard negatives given an (anchor, positive) dataset (and optionally a corpus of negative candidate texts).

It boasts the following features to give you fine-grained control over the similarity of the mined negatives relative to the anchor:

* [CrossEncoder](https://sbert.net/docs/quickstart.html#cross-encoder) rescoring for higher quality negative selection.
* Skip the top $n$ negative candidates as these might be true positives.
* Consider only the top $n$ negative candidates.
* Skip negative candidates that are within some `margin` of the true similarity between anchor and positive.
* Skip negative candidates whose similarity is larger than some `max_score`.
* Two sampling strategies: pick the top negative candidates that satisfy the requirements, or pick them randomly.
* FAISS index for searching for negative candidates.
* Option to return data as triplets only, or as `2 + num_negatives`-tuples.

```python
from sentence_transformers.util import mine_hard_negatives
from sentence_transformers import SentenceTransformer
from datasets import load_dataset

# Load a Sentence Transformer model
model = SentenceTransformer("all-MiniLM-L6-v2")

# Load a dataset to mine hard negatives from
dataset = load_dataset("sentence-transformers/natural-questions", split="train")
print(dataset)
"""
Dataset({
    features: ['query', 'answer'],
    num_rows: 100231
})
"""

# Mine hard negatives
dataset = mine_hard_negatives(
    dataset=dataset,
    model=model,
    range_min=10,
    range_max=50,
    max_score=0.8,
    margin=0.1,
    num_negatives=5,
    sampling_strategy="random",
    batch_size=128,
    use_faiss=True,
)
'''
Batches: 100%|██████████| 588/588 [00:33<00:00, 17.37it/s]
Batches: 100%|██████████| 784/784 [00:07<00:00, 101.55it/s]
Querying FAISS index: 100%|██████████| 7/7 [00:07<00:00,  1.06s/it]
Metric       Positive       Negative     Difference
Count         100,231        460,725        460,725
Mean           0.6866         0.4133         0.2917
Median         0.7010         0.4059         0.2873
Std            0.1125         0.0673         0.1006
Min            0.0303         0.1638         0.1029
25%            0.6221         0.3649         0.2112
50%            0.7010         0.4059         0.2873
75%            0.7667         0.4561         0.3647
Max            0.9584         0.7362         0.7073
Skipped 882722 potential negatives (17.27%) due to the margin of 0.1.
Skipped 27 potential negatives (0.00%) due to the maximum score of 0.8.
Could not find enough negatives for 40430 samples (8.07%). Consider adjusting the range_max, range_min, margin and max_score parameters if you'd like to find more valid negatives.
'''
print(dataset)
'''
Dataset({
    features: ['query', 'answer', 'negative'],
    num_rows: 460725
})
'''
print(dataset[0])
'''
{
    'query': 'the first person to use the word geography was',
    'answer': 'History of geography The history of geography includes many histories of geography which have differed over time and between different cultural and political groups. In more recent developments, geography has become a distinct academic discipline. \'Geography\' derives from the Greek γεωγραφία – geographia,[1] a literal translation of which would be "to describe or write about the Earth". The first person to use the word "geography" was Eratosthenes (276–194 BC). However, there is evidence for recognizable practices of geography, such as cartography (or map-making) prior to the use of the term geography.',
    'negative': 'Terminology of the British Isles The word "Great" means "larger", in comparison with Brittany in modern-day France. One historical term for the peninsula in France that largely corresponds to the modern French province is Lesser or Little Britain. That region was settled by many British immigrants during the period of Anglo-Saxon migration into Britain, and named "Little Britain" by them. The French term "Bretagne" now refers to the French "Little Britain", not to the British "Great Britain", which in French is called Grande-Bretagne. In classical times, the Graeco-Roman geographer Ptolemy in his Almagest also called the larger island megale Brettania (great Britain). At that time, it was in contrast to the smaller island of Ireland, which he called mikra Brettania (little Britain).[62] In his later work Geography, Ptolemy refers to Great Britain as Albion and to Ireland as Iwernia. These "new" names were likely to have been the native names for the islands at the time. The earlier names, in contrast, were likely to have been coined before direct contact with local peoples was made.[63]'
}
'''
dataset.push_to_hub("natural-questions-hard-negatives", "triplet")
```


This dataset can immediately be used in conjunction with [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss), likely resulting in a stronger model than if you had just used the [natural-questions](https://huggingface.co/datasets/sentence-transformers/natural-questions) dataset outright.
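For instance, a minimal sketch of picking the mined dataset back up for training (the `your-username` namespace is a placeholder for wherever you pushed the triplets):
```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# Placeholder repository: the dataset pushed at the end of the snippet above
train_dataset = load_dataset("your-username/natural-questions-hard-negatives", "triplet", split="train")

model = SentenceTransformer("microsoft/mpnet-base")
loss = MultipleNegativesRankingLoss(model)
# Pass `train_dataset` and `loss` to a SentenceTransformerTrainer as usual
```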

Here are some example datasets that I created using this new function:
* https://huggingface.co/datasets/tomaarsen/gooaq-hard-negatives
* https://huggingface.co/datasets/tomaarsen/natural-questions-hard-negatives

Big thanks to ChrisGeishauser and ArthurCamara for assisting with this feature.

Add CachedMultipleNegativesSymmetricRankingLoss loss function (2879)

Let's break this down:
* [MultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) (MNRL): Given (anchor, positive) text pairs or (anchor, positive, negative) text triplets, this loss trains for "Given an anchor (e.g. a query), which text out of a big lineup (all positives and negatives in the batch) is the true positive (e.g. the answer)?".
* [MultipleNegativesSymmetricRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativessymmetricrankingloss) (MNSRL): Adaptation of MNRL that adds a second loss term which means: "Given a positive (e.g. a summary), which text out of a big lineup (all anchors) is the true anchor (e.g. the full article)?". This is useful for symmetric tasks, such as clustering, classification, finding similar texts, and a bit less useful for asymmetric tasks such as question-answer retrieval.
* [CachedMultipleNegativesRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativesrankingloss) (CMNRL): Adaptation of MNRL such that the batch size can be increased to an arbitrary size at a flat 10-20% training speed cost. A higher batch size means a larger lineup for the model to find the true positive in, often resulting in a better training signal and model.

The v3.1 Sentence Transformers release now introduces a new loss: [CachedMultipleNegativesSymmetricRankingLoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cachedmultiplenegativessymmetricrankingloss) (CMNSRL), which combines both of the previous adaptations. The result is a loss adept at symmetric training tasks for which you can pick an arbitrarily large batch size. It is likely the strongest loss for Semantic Textual Similarity (STS) tasks in Sentence Transformers now.
Big thanks to madhavthaker1 for working to include it.
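As a rough usage sketch (the base model and `mini_batch_size` are arbitrary; the mini-batch size only trades speed for memory, while the effective batch size is set via the training arguments):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesSymmetricRankingLoss

model = SentenceTransformer("microsoft/mpnet-base")

# Embeddings are cached and processed in mini-batches, so the per-device
# training batch size can be made arbitrarily large
loss = CachedMultipleNegativesSymmetricRankingLoss(model, mini_batch_size=32)
```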

Streaming Dataset support (2792)
The v3.1 release introduces support for training with [`datasets.IterableDataset`](https://huggingface.co/docs/datasets/v2.21.0/en/package_reference/main_classes#datasets.IterableDataset) ([*Differences between Dataset and IterableDataset* docs](https://huggingface.co/docs/datasets/en/about_mapstyle_vs_iterable)). This means that you can train without first downloading the full dataset to disk. For example:

```python
from datasets import load_dataset

# Load a streaming dataset to finetune on
train_dataset = load_dataset("sentence-transformers/gooaq", split="train", streaming=True)
# IterableDataset({
#     features: ['question', 'answer'],
#     n_shards: 2
# })
```

or
```python
from datasets import IterableDataset, Value, Features

def dataset_generator_fn():
    # Gather, fetch, load, or generate data here
    for ... in ...:
        yield ...

train_dataset = IterableDataset.from_generator(dataset_generator_fn)
train_dataset = train_dataset.cast(Features({'question': Value(dtype='string', id=None), 'answer': Value(dtype='string', id=None)}))
```

(*Read more about Dataset features [here](https://huggingface.co/docs/datasets/en/about_dataset_features)*)

For a full example of training with a streaming dataset, consider this script:
```python
import logging
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    SentenceTransformerModelCardData,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

# 1. Load a model to finetune with 2. (Optional) model card data
model = SentenceTransformer(
    "microsoft/mpnet-base",
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="MPNet base trained on GooAQ pairs",
    ),
)

name = "mpnet-base-gooaq-streaming"

# 2. Load a streaming dataset to finetune on
train_dataset = load_dataset("sentence-transformers/gooaq", split="train", streaming=True)

# 3. Define a loss function
loss = MultipleNegativesRankingLoss(model)

# 4. (Optional) Specify training arguments
train_batch_size = 64
args = SentenceTransformerTrainingArguments(
    # Required parameter:
    output_dir=f"models/{name}",
    # Optional training parameters:
    num_train_epochs=1,
    per_device_train_batch_size=train_batch_size,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    fp16=False,  # Set to False if you get an error that your GPU can't run on FP16
    bf16=True,  # Set to True if you have a GPU that supports BF16
    batch_sampler=BatchSamplers.NO_DUPLICATES,  # MultipleNegativesRankingLoss benefits from no duplicate samples in a batch
    # Optional tracking/debugging parameters:
    save_strategy="steps",
    save_steps=100,
    save_total_limit=2,
    logging_steps=250,
    logging_first_step=True,
    run_name=name,  # Will be used in W&B if `wandb` is installed
)

# 5. Create a trainer & train
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()

# 6. Save the trained model
model.save_pretrained(f"models/{name}/final")

# 7. (Optional) Push it to the Hugging Face Hub
model.push_to_hub(name)
```


Advanced: Allow for Custom Modules (2773)
Sentence Transformer models consist of several modules that are executed sequentially. Most models consist of a [Transformer](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.Transformer) module, a [Pooling](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.Pooling) module, and perhaps a [Dense](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.Dense) and/or [Normalize](https://sbert.net/docs/package_reference/sentence_transformer/models.html#sentence_transformers.models.Normalize) module. However, as of the v3.1 release, model authors can create their own modules by writing some custom modeling code. This code can be uploaded to the Hugging Face Hub alongside the model itself, after which users can load the model like normal.

This allows for authors to replace the `Transformer` module with one that includes model-specific quirks, or replace the `Pooling` module with an all-new pooling method. This even allows for multi-modal models as authors can customize the preprocessing of the first module.
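To give a flavour of what such a module can look like, here is a minimal sketch; the class and pooling strategy are made up for illustration and only rely on the existing convention that modules pass a feature dictionary along the pipeline:
```python
import torch

class MaxPooling(torch.nn.Module):
    """Hypothetical custom pooling module: max-pools token embeddings into one vector."""

    def forward(self, features: dict) -> dict:
        token_embeddings = features["token_embeddings"]            # (batch, seq_len, dim)
        attention_mask = features["attention_mask"].unsqueeze(-1)  # (batch, seq_len, 1)
        # Ignore padding tokens when taking the maximum over the sequence axis
        masked = token_embeddings.masked_fill(attention_mask == 0, float("-inf"))
        features["sentence_embedding"] = masked.max(dim=1).values
        return features
```
A module like this could replace the usual Pooling module via `SentenceTransformer(modules=[transformer, MaxPooling()])`; what v3.1 adds is that such code can also live on the Hub next to the model and be loaded with `trust_remote_code=True`.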

[jinaai/jina-clip-v1](https://huggingface.co/jinaai/jina-clip-v1) is the first model to take advantage of this new feature, allowing you to encode both texts and images (via paths to local images or URLs) due to their custom preprocessing. Try it out yourself:
```python
from sentence_transformers import SentenceTransformer

# Load the model; must use trust_remote_code=True to run the custom module
model = SentenceTransformer("jinaai/jina-clip-v1", trust_remote_code=True)

# Texts and images of blue and red cats to embed
sentences = ['A blue cat', 'A red cat']
image_urls = [
    'https://i.pinimg.com/600x315/21/48/7e/21487e8e0970dd366dafaed6ab25d8d8.jpg',
    'https://i.pinimg.com/736x/c9/f2/3e/c9f23e212529f13f19bad5602d84b78b.jpg'
]

# Embed the texts and images like normal
text_embeddings = model.encode(sentences)
image_embeddings = model.encode(image_urls)

# Compute similarity between text embeddings:
print(model.similarity(text_embeddings[0], text_embeddings[1]))
# tensor([[0.5636]])

# or cross-modal text and image embeddings:
print(model.similarity(text_embeddings, image_embeddings))
```

3.0.1

Not secure
SentenceTransformerTrainer improvements
* Implement gradient checkpointing for lower memory usage during training (2717)
* Implement support for the `push_to_hub=True` Training Argument, also implement `trainer.push_to_hub(...)` (2718); a brief usage sketch of both follows below
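A rough sketch of how these two additions are used together (values are illustrative; `gradient_checkpointing`, `push_to_hub`, and `hub_model_id` come from the underlying `transformers.TrainingArguments`):
```python
from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="models/mpnet-base-example",
    gradient_checkpointing=True,  # trade a little compute for much lower memory usage
    push_to_hub=True,             # push checkpoints to the Hugging Face Hub during training
    hub_model_id="your-username/mpnet-base-example",  # placeholder repository name
)
```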

Model Cards
This patch release improves on the automatically generated model cards in several ways:
* Your training datasets are now automatically linked if they're on Hugging Face (2711)
* A new `generated_from_trainer` tag is now also added (2710)
* The automatically included widget examples are now improved, especially for question-answering. Previously, the widget could give examples of comparing two questions with each other (2713)
* If you save a model locally, then load it again and upload it, the model card would previously still show
  ```python
  ...
  # Download from the 🤗 Hub
  model = SentenceTransformer("sentence_transformers_model_id")
  ...
  ```
  This now gets replaced with your new model ID on Hugging Face (2714)
* The exact training dataset size is now included in the model metadata, rather than as a bucket of e.g. 1K<n<10K (2728)

Evaluators fixes
* The primary metric of evaluators in `SequentialEvaluator` would be ignored in the `scores` calculation (2700)
* Fix confusing print statement in TranslationEvaluator when using `print_wrong_matches=True` (1894)
* Fix bug that prevents you from customizing the `primary_metric` in `InformationRetrievalEvaluator` (2701)
* Allow passing a list of evaluators to the STTrainer rather than a `SequentialEvaluator` (2716)

Losses fixes
* Fix `MatryoshkaLoss` crash if the first dimension is not the biggest (2719)

Security
* Integrate safetensors with all modules, including Dense, LSTM, CNN, etc. to prevent needing pickled `pytorch_model.bin` anymore (2722)

All changes
* updating to evaluation_strategy by higorsilvaa in https://github.com/UKPLab/sentence-transformers/pull/2686
* fix loss link by Samoed in https://github.com/UKPLab/sentence-transformers/pull/2690
* Fix bug that restricts users from specifying custom primary_function in InformationRetrievalEvaluator by hetulvp in https://github.com/UKPLab/sentence-transformers/pull/2701
* Fix a bug in SequentialEvaluator to use primary_metric if defined in evaluator. by hetulvp in https://github.com/UKPLab/sentence-transformers/pull/2700
* [`fix`] Always override the originally saved `__version__` in the ST config by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2709
* [`model cards`] Also include HF datasets in the model card metadata by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2711
* Add "generated_from_trainer" tag to auto-generated model cards by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2710
* Fix confusing print statement in TranslationEvaluator by NathanS-Git in https://github.com/UKPLab/sentence-transformers/pull/1894
* [`model cards`] Improve the widget example selection: not based on embeddings, better for QA by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2713
* [`model cards`] Replace 'sentence_transformers_model_id' from reused model if possible by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2714
* [`feat`] Allow passing a list of evaluators to the Trainer by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2716
* [`fix`] Fix gradient checkpointing to allow for much lower memory usage by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2717
* [`fix`] Implement `create_model_card` on the Trainer, allowing args.push_to_hub=True by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2718
* [`fix`] Fix `MatryoshkaLoss` crash if the first dimension is not the biggest by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2719
* Update models_en_sentence_embeddings.html by saikartheekb in https://github.com/UKPLab/sentence-transformers/pull/2720
* [`typing`] Improve typing for many functions & add `py.typed` to satisfy `mypy` by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2724
* [`fix`] Fix edge case with evaluator being None by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2726
* [`simplify`] Set can_return_loss=True globally, instead of via the data collator by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2727
* [`feat`] Integrate safetensors with Dense, etc. modules too. by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2722
* [`model cards`] Specify the exact dataset size as a tag, will be bucketized by HF by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2728

New Contributors
* higorsilvaa made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2686
* hetulvp made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2701
* NathanS-Git made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/1894
* saikartheekb made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2720

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.0.0...v3.0.1

3.0.0

Not secure
Sentence Transformer training refactor (2449)
The v3.0 release centers around this huge modernization of the training approach for `SentenceTransformer` models. Whereas training before v3.0 used to be all about `InputExample`, `DataLoader` and `model.fit`, the new training approach relies on 5 new components. You can learn more about these components in our [Training and Finetuning Embedding Models with Sentence Transformers v3](https://huggingface.co/blog/train-sentence-transformers) blogpost. Additionally, you can read the new [Training Overview](https://sbert.net/docs/sentence_transformer/training_overview.html), check out the [Training Examples](https://sbert.net/docs/sentence_transformer/training/examples.html), or read this summary:

1. [Dataset](https://sbert.net/docs/sentence_transformer/training_overview.html#dataset)
A training [`Dataset`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset) or [`DatasetDict`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.DatasetDict). This class is much more suited for sharing & efficient modifications than lists/DataLoaders of `InputExample` instances. A `Dataset` can contain multiple text columns that will be fed in order to the corresponding loss function. So, if the loss expects (anchor, positive, negative) triplets, then your dataset should also have 3 columns. The names of these columns are irrelevant. If there is a "label" or "score" column, it is treated separately, and used as the labels during training.
A `DatasetDict` can be used to train with multiple datasets at once, e.g.:
```python
DatasetDict({
    multi_nli: Dataset({
        features: ['premise', 'hypothesis', 'label'],
        num_rows: 392702
    })
    snli: Dataset({
        features: ['snli_premise', 'hypothesis', 'label'],
        num_rows: 549367
    })
    stsb: Dataset({
        features: ['sentence1', 'sentence2', 'label'],
        num_rows: 5749
    })
})
```

When a `DatasetDict` is used, the `loss` parameter to the `SentenceTransformerTrainer` must also be a dictionary with these dataset keys, e.g.:
```python
{
    'multi_nli': SoftmaxLoss(...),
    'snli': SoftmaxLoss(...),
    'stsb': CosineSimilarityLoss(...),
}
```

2. [Loss Function](https://sbert.net/docs/sentence_transformer/training_overview.html#loss-function)
A loss function, or a dictionary of loss functions like described above. These loss functions do not require changes compared to before this PR.
3. [Training Arguments](https://sbert.net/docs/sentence_transformer/training_overview.html#training-arguments)
A SentenceTransformerTrainingArguments instance, a subclass of a [TrainingArguments](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments) instance. This powerful class controls the specific details of the training.
4. [Evaluator](https://sbert.net/docs/sentence_transformer/training_overview.html#evaluator)
An optional [`SentenceEvaluator`](https://sbert.net/docs/package_reference/evaluation.html) instance. Unlike before, models can now be evaluated both on an evaluation dataset with some loss function and/or a `SentenceEvaluator` instance.
5. [Trainer](https://sbert.net/docs/sentence_transformer/training_overview.html#trainer)
The new `SentenceTransformerTrainer` instance based on the `transformers` `Trainer`. This instance is provided with a SentenceTransformer model, a SentenceTransformerTrainingArguments class, a SentenceEvaluator, a training and evaluation Dataset/DatasetDict and a loss function/dict of loss functions. Most of these parameters are optional. Once provided, all you have to do is call `trainer.train()`.

Some of the major features that are now implemented include:
* MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
* bf16 training support
* Loss logging
* Evaluation datasets + evaluation loss
* Improved callback support (built-in via Weights and Biases, TensorBoard, CodeCarbon, etc., as well as custom callbacks)
* Gradient checkpointing
* Gradient accumulation
* Improved model card generation
* Warmup ratio
* Pushing to the Hugging Face Hub on every model checkpoint
* Resuming from a training checkpoint
* Hyperparameter Optimization

This script is a minimal example (no evaluator, no training arguments) of training [`mpnet-base`](https://huggingface.co/microsoft/mpnet-base) on a part of the [`all-nli` dataset](https://huggingface.co/datasets/sentence-transformers/all-nli) using [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MultipleNegativesRankingLoss

# 1. Load a model to finetune
model = SentenceTransformer("microsoft/mpnet-base")

# 2. Load a dataset to finetune on
dataset = load_dataset("sentence-transformers/all-nli", "triplet")
train_dataset = dataset["train"].select(range(10_000))
eval_dataset = dataset["dev"].select(range(1_000))

# 3. Define a loss function
loss = MultipleNegativesRankingLoss(model)

# 4. Create a trainer & train
trainer = SentenceTransformerTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()

# 5. Save the trained model
model.save_pretrained("models/mpnet-base-all-nli")
```


Additionally, trained models now automatically produce extensive model cards. Each of the following models were trained using some script from the [Training Examples](https://sbert.net/docs/sentence_transformer/training/examples.html), and the model cards were not edited manually whatsoever:
* [tomaarsen/mpnet-base-all-nli-triplet](https://huggingface.co/tomaarsen/mpnet-base-all-nli-triplet)
* [tomaarsen/stsb-distilbert-base-mnrl-cl-multi](https://huggingface.co/tomaarsen/stsb-distilbert-base-mnrl-cl-multi)
* [tomaarsen/distilroberta-base-paraphrases-multi](https://huggingface.co/tomaarsen/distilroberta-base-paraphrases-multi)

Prior to the Sentence Transformer v3 release, all models would be trained using the [`SentenceTransformer.fit`](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.fit) method. Rather than deprecating this method, starting from v3.0, this method will use the [`SentenceTransformerTrainer`](https://sbert.net/docs/package_reference/sentence_transformer/trainer.html#sentence_transformers.trainer.SentenceTransformerTrainer) behind the scenes. This means that your old training code should still work, and should even be upgraded with the new features such as multi-gpu training, loss logging, etc. That said, the new training approach is much more powerful, so it is **recommended** to write new training scripts using the new approach.

Many of the old training scripts were updated to use the new Trainer-based approach, but not all have been updated yet. We accept help via Pull Requests to assist in updating the scripts.

Similarity Score (2615, 2490)

Sentence Transformers v3.0 introduces two new useful methods:
* [similarity](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity)
* [similarity_pairwise](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity_pairwise)

and one property:
* [similarity_fn_name](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.similarity_fn_name)

These can be used to calculate the similarity between embeddings, and to specify which similarity function should be used, for example:

```python
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("all-mpnet-base-v2")
>>> sentences = [
...     "The weather is so nice!",
...     "It's so sunny outside.",
...     "He's driving to the movie theater.",
...     "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences, normalize_embeddings=True)
>>> model.similarity(embeddings, embeddings)
```
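A short sketch of the accompanying `similarity_fn_name` property (choosing dot product here is just an example; cosine similarity is the default):
```python
from sentence_transformers import SentenceTransformer, SimilarityFunction

# Choose a non-default similarity function at load time
model = SentenceTransformer("all-mpnet-base-v2", similarity_fn_name=SimilarityFunction.DOT_PRODUCT)
print(model.similarity_fn_name)

# similarity() and similarity_pairwise() now use dot product scores
embeddings = model.encode(["The weather is so nice!", "It's so sunny outside."])
print(model.similarity_pairwise(embeddings[:1], embeddings[1:]))
```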

2.7.0

Not secure
New loss function: CachedGISTEmbedLoss (2592)
For a number of years, [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) (also known as SimCSE, InfoNCE, in-batch negatives loss) has been the state of the art in embedding model training. Notably, this loss function performs better with a larger batch size.

Recently, various improvements have been introduced:
1. [`CachedMultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/losses.html#cachedmultiplenegativesrankingloss) was introduced, which allows you to pick much higher batch sizes (e.g. 65536) with constant memory.
2. [`GISTEmbedLoss`](https://sbert.net/docs/package_reference/losses.html#gistembedloss) takes a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.

Now, JacksonCakes has combined these two approaches to produce the best of both worlds: [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/losses.html#cachedgistembedloss). This loss function allows for high batch sizes with constant memory usage, while also using a guide model to assist with the in-batch negative sample selection.

As can be seen in our [Loss Overview](https://sbert.net/docs/training/loss_overview.html), this model should be used with `(anchor, positive)` pairs or `(anchor, positive, negative)` triplets, much like `MultipleNegativesRankingLoss`, `CachedMultipleNegativesRankingLoss`, and `GISTEmbedLoss`. In short, any example using those loss functions can be updated to use `CachedGISTEmbedLoss`! Feel free to experiment, e.g. with [this training script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v3.py).
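A rough setup sketch (the model choices and `mini_batch_size` are arbitrary):
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedGISTEmbedLoss

model = SentenceTransformer("microsoft/mpnet-base")
guide = SentenceTransformer("all-MiniLM-L6-v2")  # small guide model for in-batch negative selection

# High effective batch sizes with constant memory, plus guided filtering of false negatives
loss = CachedGISTEmbedLoss(model, guide=guide, mini_batch_size=32)
```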

Automatic Matryoshka model truncation (2573)
Sentence Transformers v2.4.0 introduced Matryoshka models: models whose embeddings are still useful after truncation. Since then, [many](https://huggingface.co/BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k) [useful](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) [Matryoshka](https://huggingface.co/NeuML/pubmedbert-base-embeddings-matryoshka) [models](https://huggingface.co/mixedbread-ai/mxbai-embed-2d-large-v1) have been trained.

As of this release, the truncation for these Matryoshka embedding models can be done automatically via a new `truncate_dim` constructor argument:
```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

matryoshka_dim = 64
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True, truncate_dim=matryoshka_dim)

embeddings = model.encode(
    [
        "search_query: What is TSNE?",
        "search_document: t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.",
        "search_document: Amelia Mary Earhart was an American aviation pioneer and writer.",
    ]
)
print(embeddings.shape)
# => [3, 64]

similarities = cos_sim(embeddings[0], embeddings[1:])
```
