```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Load an embedding model, then quantize its float32 embeddings to binary
model = SentenceTransformer("all-mpnet-base-v2")
embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
```
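The same `quantize_embeddings` helper also accepts scalar precisions such as `"int8"`. A minimal sketch with random stand-in arrays, assuming the `calibration_embeddings` parameter described in the quantization docs is used to estimate the per-dimension value ranges:

```python
import numpy as np

from sentence_transformers.quantization import quantize_embeddings

# Stand-in data: in practice these would be real model embeddings
calibration_embeddings = np.random.rand(1000, 768).astype(np.float32)
embeddings = np.random.rand(2, 768).astype(np.float32)

# Scalar quantization maps each float32 value to an 8-bit integer, using
# ranges estimated from the calibration embeddings
int8_embeddings = quantize_embeddings(
    embeddings,
    precision="int8",
    calibration_embeddings=calibration_embeddings,
)
print(int8_embeddings.dtype)  # int8
```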
References:
* [SentenceTransformer.encode](https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode)
* [quantize_embeddings](https://sbert.net/docs/package_reference/quantization.html#sentence_transformers.quantization.quantize_embeddings)
## GISTEmbedLoss
GISTEmbedLoss, as introduced in [Solatorio (2024)](https://arxiv.org/pdf/2402.16829.pdf), is a guided variant of the more standard in-batch negatives (`MultipleNegativesRankingLoss`) loss. Both loss functions are provided with a list of (anchor, positive) pairs, but while `MultipleNegativesRankingLoss` uses `anchor_i` and `positive_i` as a positive pair and all `positive_j` with `i != j` as negative pairs, `GISTEmbedLoss` uses a second model to guide the in-batch negative sample selection.
This can be very useful, because it is plausible that `anchor_i` and `positive_j` are actually quite semantically similar. In this case, `GISTEmbedLoss` would not consider them a negative pair, while `MultipleNegativesRankingLoss` would. When finetuning MPNet-base on the AllNLI dataset, these are the Spearman correlations based on cosine similarity on the STS Benchmark dev set (higher is better):
![Spearman correlation on the STS Benchmark dev set: MultipleNegativesRankingLoss vs. GISTEmbedLoss](https://github.com/UKPLab/sentence-transformers/assets/37621491/ae99e809-4cc9-4db3-8b00-94cc74d2fe3b)
The blue line is `MultipleNegativesRankingLoss`, whereas the grey line is `GISTEmbedLoss` with the small `all-MiniLM-L6-v2` as the guide model. Note that `all-MiniLM-L6-v2` by itself does not reach a Spearman correlation of 88 on this dataset, so this is really the effect of two models (`mpnet-base` and `all-MiniLM-L6-v2`) together reaching a performance that neither could reach separately.
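A minimal training sketch with the v2.x `fit` API; the pair data here is illustrative only:

```python
from torch.utils.data import DataLoader

from sentence_transformers import InputExample, SentenceTransformer, losses

# Model to finetune, plus a small guide model that filters false negatives
model = SentenceTransformer("microsoft/mpnet-base")
guide = SentenceTransformer("all-MiniLM-L6-v2")

# (anchor, positive) pairs; illustrative examples only
train_examples = [
    InputExample(texts=["A man is eating food.", "A man eats something."]),
    InputExample(texts=["A woman is riding a horse.", "A lady rides a horse."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# The guide model vetoes in-batch negatives that it considers too similar
# to the anchor to be true negatives
train_loss = losses.GISTEmbedLoss(model=model, guide=guide)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```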
## Soft `save_to_hub` Deprecation
Most codebases that allow for pushing models to the [Hugging Face Hub](https://huggingface.co/) adopt a `push_to_hub` method instead of a `save_to_hub` method, and now Sentence Transformers will follow that convention. The [`push_to_hub`](https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.push_to_hub) method will now be the recommended approach, although `save_to_hub` will continue to exist for the time being: it will simply call `push_to_hub` internally.
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

...

# Train the model
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=dev_evaluator,
    epochs=num_epochs,
    evaluation_steps=1000,
    warmup_steps=warmup_steps,
)

# Push the model to Hugging Face
model.push_to_hub("tomaarsen/mpnet-base-nli-stsb")
```
## All changes
* Add GISTEmbedLoss by avsolatorio in https://github.com/UKPLab/sentence-transformers/pull/2535
* [`feat`] Add 'get_config_dict' method to GISTEmbedLoss for better model cards by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2543
* Enable saving modules as pytorch_model.bin by CKeibel in https://github.com/UKPLab/sentence-transformers/pull/2542
* [`deprecation`] Deprecate `save_to_hub` in favor of `push_to_hub`; add safe_serialization support to `push_to_hub` (see the sketch after this list) by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2544
* Fix SentenceTransformer encode documentation return type default (numpy vectors) by CKeibel in https://github.com/UKPLab/sentence-transformers/pull/2546
* [`docs`] Update return docstring of encode_multi_process by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2548
* [`feat`] Add binary & scalar embedding quantization support to Sentence Transformers by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2549
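For the safe_serialization addition above, a minimal sketch; the repo id is a placeholder, and `safe_serialization=True` is assumed to be the default:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

# With safe_serialization=True (the default), weights are uploaded as
# model.safetensors; False falls back to pytorch_model.bin.
# "my-username/my-model" is a hypothetical repository name.
model.push_to_hub("my-username/my-model", safe_serialization=False)
```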
## New Contributors
* avsolatorio made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2535
* CKeibel made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2542
**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v2.5.1...v2.6.0