Sentence-transformers

Latest version: v2.7.0

Safety actively analyzes 622080 Python packages for vulnerabilities to keep your Python projects secure.

Page 1 of 9

2.7.0

New loss function: CachedGISTEmbedLoss (2592)
For a number of years, [`MultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) (also known as SimCSE, InfoNCE, in-batch negatives loss) has been the state of the art in embedding model training. Notably, this loss function performs better with a larger batch size.

Recently, various improvements have been introduced:
1. [`CachedMultipleNegativesRankingLoss`](https://sbert.net/docs/package_reference/losses.html#cachedmultiplenegativesrankingloss) was introduced, which allows you to pick much higher batch sizes (e.g. 65536) with constant memory.
2. [`GISTEmbedLoss`](https://sbert.net/docs/package_reference/losses.html#gistembedloss) takes a guide model to guide the in-batch negative sample selection. This prevents false negatives, resulting in a stronger training signal.

Now, JacksonCakes has combined these two approaches to produce the best of both worlds: [`CachedGISTEmbedLoss`](https://sbert.net/docs/package_reference/losses.html#cachedgistembedloss). This loss function allows for high batch sizes with constant memory usage, while also using a guide model to assist with the in-batch negative sample selection.

As can be seen in our [Loss Overview](https://sbert.net/docs/training/loss_overview.html), this model should be used with `(anchor, positive)` pairs or `(anchor, positive, negative)` triplets, much like `MultipleNegativesRankingLoss`, `CachedMultipleNegativesRankingLoss`, and `GISTEmbedLoss`. In short, any example using those loss functions can be updated to use `CachedGISTEmbedLoss`! Feel free to experiment, e.g. with [this training script](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v3.py).

Automatic Matryoshka model truncation (2573)
Sentence Transformers v2.4.0 introduced Matryoshka models: models whose embeddings are still useful after truncation. Since then, [many](https://huggingface.co/BEE-spoke-data/bert-plus-L8-v1.0-syntheticSTS-4k) [useful](https://huggingface.co/nomic-ai/nomic-embed-text-v1.5) [Matryoshka](https://huggingface.co/NeuML/pubmedbert-base-embeddings-matryoshka) [models](https://huggingface.co/mixedbread-ai/mxbai-embed-2d-large-v1) have been trained.

As of this release, the truncation for these Matryoshka embedding models can be done automatically via a new `truncate_dim` constructor argument:
python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

matryoshka_dim = 64
model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True, truncate_dim=matryoshka_dim)

embeddings = model.encode(
[
"search_query: What is TSNE?",
"search_document: t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.",
"search_document: Amelia Mary Earhart was an American aviation pioneer and writer.",
]
)
print(embeddings.shape)
=> [3, 64]

similarities = cos_sim(embeddings[0], embeddings[1:])

2.6.1

This is a patch release to fix a bug in [`semantic_search_faiss`](https://sbert.net/docs/package_reference/quantization.html#sentence_transformers.quantization.semantic_search_faiss) and [`semantic_search_usearch`](https://sbert.net/docs/package_reference/quantization.html#sentence_transformers.quantization.semantic_search_usearch) that caused the scores to not correspond to the returned corpus indices. Additionally, you can now evaluate embedding models after quantizing their embeddings.

Precision support in EmbeddingSimilarityEvaluator
You can now pass `precision` to the `EmbeddingSimilarityEvaluator` to evaluate the performance after quantization:

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction
import datasets

model = SentenceTransformer("all-mpnet-base-v2")

stsb = datasets.load_dataset("mteb/stsbenchmark-sts", split="test")

print("Spearman correlation based on Cosine Similarity on the STS Benchmark test set:")
for precision in ["float32", "uint8", "int8", "ubinary", "binary"]:
evaluator = EmbeddingSimilarityEvaluator(
stsb["sentence1"],
stsb["sentence2"],
[score / 5 for score in stsb["score"]],
main_similarity=SimilarityFunction.COSINE,
name="sts-test",
precision=precision,
)
print(precision, evaluator(model))

Spearman correlation based on Cosine Similarity on the STS Benchmark test set:

2.6.0

Embedding Quantization
Embeddings may be challenging to scale up, which leads to expensive solutions and high latencies. However, there is a new approach to counter this problem; it entails reducing the size of each of the individual values in the embedding: **Quantization**. Experiments on quantization have shown that we can maintain a large amount of performance while significantly speeding up computation and saving on memory, storage, and costs.

To be specific, using binary quantization may result in retaining 96% of the retrieval performance, while speeding up retrieval by **25x** and saving on memory & disk space with **32x**. Do not underestimate this approach! Read more about Embedding Quantization in our extensive [blogpost](https://huggingface.co/blog/embedding-quantization).

Binary and Scalar Quantization
Two forms of quantization exist at this time: binary and scalar (int8). These quantize embedding values from `float32` into `binary` and `int8`, respectively. For Binary quantization, you can use the following snippet:

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

1. Load an embedding model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

2.5.1

This is a patch release to fix a bug in `CrossEncoder.rank` that caused the last value to be discarded when using the default `top_k=-1`.

`CrossEncoder.rank` patch:

python
from sentence_transformers.cross_encoder import CrossEncoder

Pre-trained cross encoder
model = CrossEncoder("cross-encoder/stsb-distilroberta-base")

We want to compute the similarity between the query sentence
query = "A man is eating pasta."

With all sentences in the corpus
corpus = [
"A man is eating food.",
"A man is eating a piece of bread.",
"The girl is carrying a baby.",
"A man is riding a horse.",
"A woman is playing violin.",
"Two men pushed carts through the woods.",
"A man is riding a white horse on an enclosed ground.",
"A monkey is playing drums.",
"A cheetah is running behind its prey.",
]

We rank all sentences in the corpus for the query
ranks = model.rank(query, corpus)

Print the scores
print("Query:", query)
for rank in ranks:
print(f"{rank['score']:.2f}\t{corpus[rank['corpus_id']]}")

Query: A man is eating pasta.
0.67 A man is eating food.
0.34 A man is eating a piece of bread.
0.08 A man is riding a horse.
0.07 A man is riding a white horse on an enclosed ground.
0.01 The girl is carrying a baby.
0.01 Two men pushed carts through the woods.
0.01 A monkey is playing drums.
0.01 A woman is playing violin.
0.01 A cheetah is running behind its prey.

Previously, the lowest score document would be removed from the output.

All changes
* [`examples`] Update model repo_id in 2dMatryoshka example by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2515
* [`feat`] Add get_config_dict to new Matryoshka2dLoss & AdaptiveLayerLoss by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2516
* [`chore`] Update to ruff 0.3.0; update ruff.toml by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2517
* [`example`] Don't always normalize the embeddings in clustering example by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2520
* Fix CrossEncoder.rank default value for `top_k` by xenova in https://github.com/UKPLab/sentence-transformers/pull/2518

New Contributors
* xenova made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2518

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v2.5.0...v2.5.1

2.5.0

2D Matryoshka & Adaptive Layer models (2506)
Embedding models are often encoder models with numerous layers, such as 12 (e.g. [all-mpnet-base-v2](https://huggingface.co/sentence-transformers/all-mpnet-base-v2)) or 6 (e.g. [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)). To get embeddings, every single one of these layers must be traversed. [2D Matryoshka Sentence Embeddings](https://arxiv.org/abs/2402.14776) (2DMSE) revisits this concept by proposing an approach to train embedding models that will perform well when only using a selection of all layers. This results in faster inference speeds at relatively low performance costs.

For example, using Sentence Transformers, you can train an Adaptive Layer model that can be sped up by 2x at a 15% reduction in performance, or 5x on GPU & 10x on CPU for a 20% reduction in performance. The 2DMSE paper highlights scenarios where this is superior to using a smaller model.

Training

Training with Adaptive Layer support is quite elementary: rather than applying some loss function on only the last layer, we also apply that same loss function on the pooled embeddings from previous layers. Additionally, we employ a KL-divergence loss that aims to make the embeddings of the non-last layers match that of the last layer. This can be seen as a fascinating approach of [knowledge distillation](https://sbert.net/examples/training/distillation/README.html#knowledge-distillation), but with the last layer as the teacher model and the prior layers as the student models.

For example, with the 12-layer [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base), it will now be trained such that the model produces meaningful embeddings after each of the 12 layers.

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, AdaptiveLayerLoss

model = SentenceTransformer("microsoft/mpnet-base")

base_loss = CoSENTLoss(model=model)
loss = AdaptiveLayerLoss(model=model, loss=base_loss)

* **Reference**: <a href="https://sbert.net/docs/package_reference/losses.html#adaptivelayerloss"><code>AdaptiveLayerLoss</code></a>

Additionally, this can be combined with the `MatryoshkaLoss` such that the resulting model can be reduced both in the number of layers, but also in the size of the output dimensions. See also the [Matryoshka Embeddings](https://sbert.net/examples/training/matryoshka/README.html) for more information on reducing output dimensions. In Sentence Transformers, the combination of these two losses is called `Matryoshka2dLoss`, and a shorthand is provided for simpler training.

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, Matryoshka2dLoss

model = SentenceTransformer("microsoft/mpnet-base")

base_loss = CoSENTLoss(model=model)
loss = Matryoshka2dLoss(model=model, loss=base_loss, matryoshka_dims=[768, 512, 256, 128, 64])

* **Reference**: <a href="https://sbert.net/docs/package_reference/losses.html#matryoshka2dloss"><code>Matryoshka2dLoss</code></a>

<details><summary>Performance Results</summary>

Results

Let's look at the performance that we may be able to expect from an Adaptive Layer embedding model versus a regular embedding model. For this experiment, I have trained two models:

* [tomaarsen/mpnet-base-nli-adaptive-layer](https://huggingface.co/tomaarsen/mpnet-base-nli-adaptive-layer): Trained by running [adaptive_layer_nli.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/adaptive_layer/adaptive_layer_nli.py) with [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base).
* [tomaarsen/mpnet-base-nli](https://huggingface.co/tomaarsen/mpnet-base-nli): A near identical model as the former, but using only `MultipleNegativesRankingLoss` rather than `AdaptiveLayerLoss` on top of `MultipleNegativesRankingLoss`. I also use [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) as the base model.

Both of these models were trained on the AllNLI dataset, which is a concatenation of the [SNLI](https://huggingface.co/datasets/snli) and [MultiNLI](https://huggingface.co/datasets/multi_nli) datasets. I have evaluated these models on the [STSBenchmark](https://huggingface.co/datasets/mteb/stsbenchmark-sts) test set using multiple different embedding dimensions. The results are plotted in the following figure:

![adaptive_layer_results](https://huggingface.co/tomaarsen/mpnet-base-nli-adaptive-layer/resolve/main/adaptive_layer_results.png)

The first figure shows that the Adaptive Layer model stays much more performant when reducing the number of layers in the model. This is also clearly shown in the second figure, which displays that 80% of the performance is preserved when the number of layers is reduced all the way to 1.

Lastly, the third figure shows the expected speedup ratio for GPU & CPU devices in my tests. As you can see, removing half of the layers results in roughly a 2x speedup, at a cost of ~15% performance on STSB (~86 -> ~75 Spearman correlation). When removing even more layers, the performance benefit gets larger for CPUs, and between 5x and 10x speedups are very feasible with a 20% loss in performance.

</details>

<details><summary>Inference</summary>

Inference

After a model has been trained using the Adaptive Layer loss, you can then truncate the model layers to your desired layer count. Note that this requires doing a bit of surgery on the model itself, and each model is structured a bit differently, so the steps are slightly different depending on the model.

First of all, we will load the model & access the underlying `transformers` model like so:

python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("tomaarsen/mpnet-base-nli-adaptive-layer")

We can access the underlying model with `model[0].auto_model`
print(model[0].auto_model)

MPNetModel(
(embeddings): MPNetEmbeddings(
(word_embeddings): Embedding(30527, 768, padding_idx=1)
(position_embeddings): Embedding(514, 768, padding_idx=1)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(encoder): MPNetEncoder(
(layer): ModuleList(
(0-11): 12 x MPNetLayer(
(attention): MPNetAttention(
(attn): MPNetSelfAttention(
(q): Linear(in_features=768, out_features=768, bias=True)
(k): Linear(in_features=768, out_features=768, bias=True)
(v): Linear(in_features=768, out_features=768, bias=True)
(o): Linear(in_features=768, out_features=768, bias=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(intermediate): MPNetIntermediate(
(dense): Linear(in_features=768, out_features=3072, bias=True)
(intermediate_act_fn): GELUActivation()
)
(output): MPNetOutput(
(dense): Linear(in_features=3072, out_features=768, bias=True)
(LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
)
)
(relative_attention_bias): Embedding(32, 12)
)
(pooler): MPNetPooler(
(dense): Linear(in_features=768, out_features=768, bias=True)
(activation): Tanh()
)
)

This output will differ depending on the model. We will look for the repeated layers in the encoder. For this MPNet model, this is stored under `model[0].auto_model.encoder.layer`. Then we can slice the model to only keep the first few layers to speed up the model:

python

new_num_layers = 3
model[0].auto_model.encoder.layer = model[0].auto_model.encoder.layer[:new_num_layers]

Then we can run inference with it using <a href="https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode"><code>SentenceTransformers.encode</code></a>.

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("tomaarsen/mpnet-base-nli-adaptive-layer")
new_num_layers = 3
model[0].auto_model.encoder.layer = model[0].auto_model.encoder.layer[:new_num_layers]

embeddings = model.encode(
[
"The weather is so nice!",
"It's so sunny outside!",
"He drove to the stadium.",
]
)
Similarity of the first sentence with the other two
similarities = cos_sim(embeddings[0], embeddings[1:])

2.4.0

MatryoshkaLoss (2485)
Dense embedding models typically produce embeddings with a fixed size, such as 768 or 1024. All further computations (clustering, classification, semantic search, retrieval, reranking, etc.) must then be done on these full embeddings. [Matryoshka Representation Learning](https://arxiv.org/abs/2205.13147) revisits this idea, and proposes a solution to train embedding models whose embeddings are still useful after truncation to much smaller sizes. This allows for considerably faster (bulk) processing.

Training

Training using Matryoshka Representation Learning (MRL) is quite elementary: rather than applying some loss function on only the full-size embeddings, we also apply that same loss function on truncated portions of the embeddings. For example, if a model has an embedding dimension of 768 by default, it can now be trained on 768, 512, 256, 128, 64 and 32. Each of these losses will be added together, optionally with some weight:

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CoSENTLoss, MatryoshkaLoss

model = SentenceTransformer("microsoft/mpnet-base")

base_loss = CoSENTLoss(model=model)
loss = MatryoshkaLoss(model=model, loss=base_loss, matryoshka_dims=[768, 512, 256, 128, 64])

* **Reference**: <a href="https://sbert.net/docs/package_reference/losses.html#matryoshkaloss"><code>MatryoshkaLoss</code></a>

<details><summary>Inference</summary>

Inference

After a model has been trained using a Matryoshka loss, you can then run inference with it using <a href="https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode"><code>SentenceTransformers.encode</code></a>. You must then truncate the resulting embeddings, and it is recommended to renormalize the embeddings.

python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim
import torch.nn.functional as F

model = SentenceTransformer("nomic-ai/nomic-embed-text-v1.5", trust_remote_code=True)

matryoshka_dim = 64
embeddings = model.encode(
[
"search_query: What is TSNE?",
"search_document: t-distributed stochastic neighbor embedding (t-SNE) is a statistical method for visualizing high-dimensional data by giving each datapoint a location in a two or three-dimensional map.",
"search_document: Amelia Mary Earhart was an American aviation pioneer and writer.",
]
)
embeddings = embeddings[..., :matryoshka_dim] Shrink the embedding dimensions

similarities = cos_sim(embeddings[0], embeddings[1:])

Page 1 of 9

Releases

Has known vulnerabilities

Sentence-transformers

Page 1 of 9

2.7.0

2.6.1

2.6.0

2.5.1

2.5.0

2.4.0

Page 1 of 9

Links

Releases