Sentence-transformers

Latest version: v3.3.1


* API Reference: [`NanoBEIREvaluator`](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#nanobeirevaluator)
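For a quick sense of the default usage, the evaluator can be constructed without arguments and called directly on a model. A minimal sketch (the model name is just an example, not taken from this changelog):

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")

# With no arguments, the evaluator runs on every NanoBEIR dataset and returns
# a dict of retrieval metrics (accuracy@k, MRR@10, NDCG@10, ...)
evaluator = NanoBEIREvaluator()
results = evaluator(model)
```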

PEFT compatibility (https://github.com/UKPLab/sentence-transformers/pull/3000, https://github.com/UKPLab/sentence-transformers/pull/2980, https://github.com/UKPLab/sentence-transformers/pull/3046)
Sentence Transformers has been integrated much more closely with PEFT. Notably, we introduce new methods:
* [active_adapters](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.active_adapters)
* [add_adapter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.add_adapter)
* [disable_adapters](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.disable_adapters)
* [enable_adapters](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.enable_adapters)
* [get_adapter_state_dict](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.get_adapter_state_dict)
* [load_adapter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.load_adapter)
* [set_adapter](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.set_adapter)

These methods allow you to add new PEFT adapters or load pretrained ones, for example:

Adding an adapter

```python
from sentence_transformers import SentenceTransformer, SentenceTransformerModelCardData
from peft import LoraConfig, TaskType

# 1. Load a model to finetune with 2. (Optional) model card data
model = SentenceTransformer(
    "all-MiniLM-L6-v2",
    model_card_data=SentenceTransformerModelCardData(
        language="en",
        license="apache-2.0",
        model_name="all-MiniLM-L6-v2 adapter finetuned on GooAQ pairs",
    ),
)

# 2. Create a LoRA adapter for the model & add it
peft_config = LoraConfig(
    task_type=TaskType.FEATURE_EXTRACTION,
    inference_mode=False,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
)
model.add_adapter(peft_config)

# Proceed as usual... See https://sbert.net/docs/sentence_transformer/training_overview.html
```


Loading a pretrained adapter
Given [sentence-transformers-testing/stsb-bert-tiny-lora](https://huggingface.co/sentence-transformers-testing/stsb-bert-tiny-lora) as a small adapter model (the `adapter_model.safetensors` file is only 33.8kB!) on top of [sentence-transformers-testing/stsb-bert-tiny-safetensors](https://huggingface.co/sentence-transformers-testing/stsb-bert-tiny-safetensors), you can either load this adapter directly:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)
# (2, 128)
```

Or you can load the original model and load the adapter into it:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers-testing/stsb-bert-tiny-safetensors")
model.load_adapter("sentence-transformers-testing/stsb-bert-tiny-lora")
embeddings = model.encode(["This is an example sentence", "Each sentence is converted"])
print(embeddings.shape)
# (2, 128)
```


Transformers v4.46.0 compatibility (https://github.com/UKPLab/sentence-transformers/pull/3026, https://github.com/UKPLab/sentence-transformers/pull/3035, https://github.com/UKPLab/sentence-transformers/pull/3037, https://github.com/UKPLab/sentence-transformers/pull/3038)
The recent `transformers` v4.46.0 update introduced a few changes that were incompatible with Sentence Transformers. For example:
* Use "processing_class" argument instead of "tokenizers"
* Add a `num_items_in_batch` argument to the `compute_loss` method in the Trainer
* Adding a `ValueError` if `eval_dataset` is None while `eval_strategy` is not `"no"` (this should be possible in Sentence Transformers, as we accept evaluating with just an `evaluator` as well)

These issues and deprecation warnings have been resolved.
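For reference, below is a minimal sketch of the evaluator-only setup mentioned in the last bullet, where no `eval_dataset` is passed and evaluation is driven entirely by an evaluator. The model name, toy dataset, output directory, and evaluation interval are placeholder assumptions, not values from this release:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.evaluation import NanoBEIREvaluator
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("all-MiniLM-L6-v2")

# Tiny toy training set of (anchor, positive) pairs
train_dataset = Dataset.from_dict({
    "anchor": ["What is the capital of France?", "How many people live in Berlin?"],
    "positive": ["Paris is the capital of France.", "Berlin has about 3.7 million inhabitants."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="models/evaluator-only-demo",
    eval_strategy="steps",
    eval_steps=50,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
    evaluator=NanoBEIREvaluator(dataset_names=["MSMARCO"]),  # evaluation without an eval_dataset
)
trainer.train()
```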

Drop Python 3.8 support (https://github.com/UKPLab/sentence-transformers/pull/3033)
Given that Python 3.8 has now reached its end of life, Sentence Transformers no longer supports it.

All Changes
* [`peft`] If AutoModel is wrapped with PEFT for prompt learning, then extend the attention mask by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3000
* [`integration`] Add support for Transformers v4.46.0 by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3026
* add an ImportError to tell the user that `datasets` must be installed to fit a model by h4c5 in https://github.com/UKPLab/sentence-transformers/pull/3020
* [`feat`] Integrate NanoBeIR datasets; use `model.similarity` by default in evaluators by ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2966
* Fix model name typo in example by programmer-ke in https://github.com/UKPLab/sentence-transformers/pull/3028
* Support OpenVINO int8 static quantization by l-bat in https://github.com/UKPLab/sentence-transformers/pull/3025
* [`fix`] Avoid passing eval_dataset=None to transformers due to >=v4.46.0 crash by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3035
* [`docs`] Update the dated example in the NanoBEIREvaluator by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3034
* [`deprecate`] Drop Python 3.8 support due to EOL by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3033
* [`tests`] Remove evaluation_steps from model.fit test without evaluator by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3037
* [`fix`] Fix loading pre-exported OV/ONNX model if export=False by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3036
* [`chore`] If Transformers 4.46.0, use processing_class instead of tokenizer when saving by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3038
* [`docs`] Add some missing docs for include_prompt in Pooling by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3042
* [`feat`] Trainer with prompts and prompt masking by ArthurCamara in https://github.com/UKPLab/sentence-transformers/pull/2964
* [fix] Fix model loading inconsistency after Peft training by using PeftModel by pesuchin in https://github.com/UKPLab/sentence-transformers/pull/2980
* [`enh`] Add Support for multiple adapters on Transformers-based models by carlesonielfa in https://github.com/UKPLab/sentence-transformers/pull/3046 & https://github.com/UKPLab/sentence-transformers/pull/2993
* Moved Model Card Callback init in Trainer to a separate function by tRosenflanz in https://github.com/UKPLab/sentence-transformers/pull/3047

New Contributors
* h4c5 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3020
* programmer-ke made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3028
* l-bat made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3025
* carlesonielfa made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3046
* tRosenflanz made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3047

Special Thanks
Big thanks to ArthurCamara for leading the work on both 1) training with prompts and 2) NanoBEIR.

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.2.1...v3.3.0


<details><summary>Advanced Usage</summary>

You can also specify a subset of datasets, and you can specify query and/or corpus prompts, if your model uses them. For example:

```python
import logging

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import NanoBEIREvaluator

# Optional, but nice to get human-readable results in the terminal
logging.basicConfig(
    format="%(asctime)s - %(message)s", datefmt="%Y-%m-%d %H:%M:%S", level=logging.INFO
)

model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")

datasets = ["QuoraRetrieval", "MSMARCO"]
query_prompts = {
    "QuoraRetrieval": "Instruct: Given a question, retrieve questions that are semantically equivalent to the given question\nQuery: ",
    "MSMARCO": "Instruct: Given a web search query, retrieve relevant passages that answer the query\nQuery: ",
}

evaluator = NanoBEIREvaluator(
    dataset_names=datasets,
    query_prompts=query_prompts,
)

results = evaluator(model)
'''
NanoBEIR Evaluation of the model on ['QuoraRetrieval', 'MSMARCO'] dataset:
Evaluating NanoQuoraRetrieval
Information Retrieval Evaluation of the model on the NanoQuoraRetrieval dataset:
Queries: 50
Corpus: 5046

Score-Function: cosine
...
'''
```

The `similarity_fn_name` can now be specified when initializing a [`SentenceTransformer`](https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer), like so:
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1", similarity_fn_name="dot")
```

Valid options include "cosine" (default), "dot", "euclidean", "manhattan". The chosen `similarity_fn_name` will also be saved into the model configuration, and loaded automatically. For example, the [`msmarco-distilbert-dot-v5`](https://huggingface.co/sentence-transformers/msmarco-distilbert-dot-v5) model was trained to work best with `dot`, so we've configured it to use that `similarity_fn_name` in its [configuration](https://huggingface.co/sentence-transformers/msmarco-distilbert-dot-v5/blob/main/config_sentence_transformers.json#L9):

```python
>>> from sentence_transformers import SentenceTransformer
>>> model = SentenceTransformer("sentence-transformers/msmarco-distilbert-dot-v5")
>>> model.similarity_fn_name
'dot'
```


* Docs: [Semantic Textual Similarity > Similarity Calculation](https://sbert.net/docs/sentence_transformer/usage/semantic_textual_similarity.html)

Big thanks to ir2718 for helping set up this major feature.
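As a quick illustration of how the configured function is used downstream, the `similarity` method (added in this release) applies whatever `similarity_fn_name` the model was loaded with; the sentences below are placeholder examples:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/msmarco-distilbert-dot-v5")
embeddings = model.encode([
    "How many people live in Berlin?",
    "Berlin has about 3.7 million inhabitants.",
])
# similarity() uses the model's configured function, "dot" for this model
print(model.similarity(embeddings, embeddings))
```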

Allow passing `model_kwargs`, `tokenizer_kwargs`, and `config_kwargs` to `SentenceTransformer` (#2578)

Those familiar with the internals of Sentence Transformers might know that internally, we call [`AutoModel.from_pretrained`](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained), [`AutoTokenizer.from_pretrained`](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoTokenizer.from_pretrained) and [`AutoConfig.from_pretrained`](https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoConfig.from_pretrained) from `transformers`.
Each of these is rather powerful, and they are constantly improved with new features. For example, the `AutoModel` keyword arguments include:
* [`torch_dtype`](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained.torch_dtype) - this allows you to immediately load a model in `bfloat16` or `float16` (or `"auto"`, i.e. whatever the model was stored in), which can speed up inference a lot.
* [`quantization_config`](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained.quantization_config)
* [`attn_implementation`](https://huggingface.co/docs/transformers/en/main_classes/model#transformers.PreTrainedModel.from_pretrained.attn_implementation) - all models support "eager", but some also support the much faster "flash_attention_2" (Flash Attention 2) and "sdpa" (Scaled Dot Product Attention).

These options allow for speeding up the model inference. Additionally, via `AutoConfig` you can update the model configuration, e.g. updating the dropout probability during training, and with `AutoTokenizer` you can disable the fast Rust-based tokenizer if you're having issues with it via `use_fast=False`.

Due to how useful these options can be, the following arguments are added to `SentenceTransformer`:
* `model_kwargs` for `AutoModel.from_pretrained` keyword arguments
* `tokenizer_kwargs` for `AutoTokenizer.from_pretrained` keyword arguments
* `config_kwargs` for `AutoConfig.from_pretrained` keyword arguments

You can use it like so:

```python
from sentence_transformers import SentenceTransformer
import torch

model = SentenceTransformer(
    "mixedbread-ai/mxbai-embed-large-v1",
    model_kwargs={"torch_dtype": torch.bfloat16, "attn_implementation": "sdpa"},
    config_kwargs={"hidden_dropout_prob": 0.3},
)
embeddings = model.encode(["He drove his yellow car to the beach.", "He played football with his friends."])
print(embeddings.shape)
```


Big thanks to satyamk7054 for starting this work.

Hyperparameter Optimization (#2655)

Sentence Transformers v3.0 introduces Hyperparameter Optimization (HPO) by extending the `transformers` HPO support. We recommend reading the new [Hyperparameter Optimization](https://sbert.net/examples/training/hpo/README.html) documentation for many more details.

Datasets Release

Alongside Sentence Transformers v3.0, we reformat and release 50+ useful datasets in our [Embedding Model Datasets](https://huggingface.co/collections/sentence-transformers/embedding-model-datasets-6644d7a3673a511914aa7552) Collection on Hugging Face. These can be used with at least one loss function in Sentence Transformers v3.0 out of the box. We recommend browsing through these to see if there are datasets akin to your use cases - training a model on them might just produce large gains on your task(s).
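For example, one way to pull one of these datasets into the new training flow; the dataset name and subset below are just one pick from the collection:

```python
from datasets import load_dataset

# The "triplet" subset of all-nli yields (anchor, positive, negative) columns,
# usable out of the box with e.g. MultipleNegativesRankingLoss or TripletLoss
train_dataset = load_dataset("sentence-transformers/all-nli", "triplet", split="train")
print(train_dataset.column_names)
```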

MSELoss extension (#2641)

The [MSELoss](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss) now accepts multiple text columns for each label (where each label is a target/gold embedding), rather than only accepting one text column. This is extremely powerful for following the excellent [Multilingual Models](https://sbert.net/examples/training/multilingual/README.html) strategy to convert a monolingual model into a multilingual one. You can now conveniently train both English and (identical but translated) non-English texts to represent the same embedding (that was generated by a powerful English embedding model).
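A minimal sketch of that setup, assuming a 384-dimensional student so that its outputs match the teacher's embedding size (`microsoft/Multilingual-MiniLM-L12-H384` is just an illustrative choice) and hypothetical column names `english` and `non_english`:

```python
from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import MSELoss

teacher = SentenceTransformer("all-MiniLM-L6-v2")                        # 384-dim teacher
student = SentenceTransformer("microsoft/Multilingual-MiniLM-L12-H384")  # 384-dim student

english = ["The weather is lovely today.", "He drove to the stadium."]
german = ["Das Wetter ist heute schön.", "Er fuhr zum Stadion."]

# Each label is the teacher's embedding of the English text; both text columns
# are trained to reproduce that same target embedding
train_dataset = Dataset.from_dict({
    "english": english,
    "non_english": german,
    "label": teacher.encode(english).tolist(),
})

trainer = SentenceTransformerTrainer(
    model=student,
    train_dataset=train_dataset,
    loss=MSELoss(student),
)
trainer.train()
```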

Add `local_files_only` argument to SentenceTransformer & CrossEncoder (#2603)

You can now initialize a `SentenceTransformer` and `CrossEncoder` with `local_files_only`. If `True`, it will not try to download a model from the Hugging Face Hub; it will only look for the model in the local filesystem or load it from the local cache.
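A minimal sketch; the model name is just an example and must already be present on disk or in your local cache:

```python
from sentence_transformers import SentenceTransformer

# Loads exclusively from a local path or the local Hugging Face cache;
# raises an error instead of downloading if the files are not available offline
model = SentenceTransformer("all-MiniLM-L6-v2", local_files_only=True)
```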

Thanks debanjum for this change.

All changes
* Minor grammar fix in GPL paragraph by mauricesvp in https://github.com/UKPLab/sentence-transformers/pull/2604
* [feat] Add local_files_only argument to load model from cache by debanjum in https://github.com/UKPLab/sentence-transformers/pull/2603
* Fix broken links by mauricesvp in https://github.com/UKPLab/sentence-transformers/pull/2611
* Updated urls for msmarco dataset by j-dominguez9 in https://github.com/UKPLab/sentence-transformers/pull/2609
* [`v3`] Training refactor - MultiGPU, loss logging, bf16, etc. by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2449
* [`v3`] Add `similarity` and `similarity_pairwise` methods to Sentence Transformers by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2615
* [`v3`] Fix various model card errors by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2616
* [`v3`] Fix trainer `compute_loss` when evaluating/predicting if the `loss` updated the inputs in-place by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2617
* [`v3`] Never return None in infer_datasets, could result in crash by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2620
* [`v3`] Trainer: Implement resume from checkpoint support by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2621
* Fall back to CPU device in case there are no PyTorch parameters by maxfriedrich in https://github.com/UKPLab/sentence-transformers/pull/2614
* Add `trust_remote_code` to `CrossEncoder.tokenizer` by michaelfeil in https://github.com/UKPLab/sentence-transformers/pull/2623
* [`v3`] Update example scripts to the new v3 training format by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2622
* Bug in DenoisingAutoEncoderLoss.py by arun477 in https://github.com/UKPLab/sentence-transformers/pull/2619
* [`v3`] Remove "return_outputs" as it's not strictly necessary. Avoids OOM & speeds up training by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2633
* [`v3`] Fix crash from inferring the dataset_id from a local dataset by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2636
* Enable Sentence Transformer Inference with Intel Gaudi2 GPU Supported ( 'hpu' ) - Follow up for 2557 by ZhengHongming888 in https://github.com/UKPLab/sentence-transformers/pull/2630
* [`v3`] Fix multilingual conversion script; extend MSELoss to multi-column by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2641
* [`v3`] Update evaluation scripts to use HF Datasets by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2642
* Use `b1` quantization for USearch by ashvardanian in https://github.com/UKPLab/sentence-transformers/pull/2644
* [`v3`] Fix `resume_from_checkpoint` by also updating the loss model by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2648
* [`v3`] Fix backwards pass on MSELoss due to in-place update by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2647
* [`v3`] Simplify `load_from_checkpoint` using `load_state_dict` by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2650
* [`v3`] Use `torch.arange` instead of `torch.tensor(range(...))` by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2651
* [`v3`] Resolve inplace modification error in DDP by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2654
* [`v3`] Add hyperparameter optimization support by letting `loss` be a Callable that accepts a `model` by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2655
* [`v3`] Add tag hinting at the number of training samples by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2660
* Allow passing 'precision' when using 'encode_multi_process' to SentenceTransformer by ariel-talent-fabric in https://github.com/UKPLab/sentence-transformers/pull/2659
* Allow passing model_args to ST by satyamk7054 in https://github.com/UKPLab/sentence-transformers/pull/2578
* Fix smart_batching_collate Inefficiency by PrithivirajDamodaran in https://github.com/UKPLab/sentence-transformers/pull/2556
* [`v3`] For the Cached losses; ignore gradients if grad is disabled (e.g. eval) by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2668
* [`docs`] Rewrite the https://sbert.net documentation by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2632
* [`v3`] Chore - include import sorting in ruff by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2672
* [`v3`] Prevent warning with 'model.fit' with transformers >= 4.41.0 due to evaluation_strategy by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2673
* [`v3`] Add various useful Sphinx packages (copy code, link to code, nicer tabs) by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2674
* [`v3`] Make the "primary_metric" for evaluators a bit more robust by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2675
* [`v3`] Set `broadcast_buffers = False` when training with DDP by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2663
* [`v3`] Warn about using DP instead of DDP + set dataloader_drop_last with DDP by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2677
* [`v3`] Add warning that Evaluators only run on 1 GPU when multi-GPU training by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2678
* [`v3`] Move training dependencies into a "train" extra by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2676
* [`v3`] Docs: update references to the API reference by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2679
* [`v3`] Add "dataset_size:" to the tag denoting the number of training samples by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2680

New Contributors
* mauricesvp made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2604
* debanjum made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2603
* j-dominguez9 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2609
* michaelfeil made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2623
* arun477 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2619
* ashvardanian made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2644
* ariel-talent-fabric made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2659
* satyamk7054 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2578
* PrithivirajDamodaran made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2556

A special shoutout to Jakobhenningjensen, smerrill, b5y, ScottishFold007, pszemraj, bwanglzu, and igorkurinnyi for experimenting with v3.0 prior to its release, and to matthewfranglen for the initial work on the training refactor back in October 2022 in #1733.

cc AlexJonesNLP as I know you are interested in this release!

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v2.7.0...v3.0.0


Evaluating NanoMSMARCO
Information Retrieval Evaluation of the model on the NanoMSMARCO dataset:
Queries: 50
Corpus: 5043

Score-Function: cosine
