Sentence-transformers

Latest version: v3.3.1

```python
tensor([[-0.0000, -0.7437, -1.3935, -1.3184],
        [-0.7437, -0.0000, -1.3702, -1.3320],
        [-1.3935, -1.3702, -0.0000, -0.9973],
        [-1.3184, -1.3320, -0.9973, -0.0000]])
```

Additionally, you can compute the similarity between pairs of embeddings, resulting in a 1-dimensional vector of similarities rather than a 2-dimensional matrix:
```python
>>> model = SentenceTransformer("all-mpnet-base-v2")
>>> sentences = [
...     "The weather is so nice!",
...     "It's so sunny outside.",
...     "He's driving to the movie theater.",
...     "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences, normalize_embeddings=True)
>>> model.similarity_pairwise(embeddings[::2], embeddings[1::2])
```

1.2.1

Not secure
Final release of version 1: Makes v1 of sentence-transformers forward compatible with models from version 2 of sentence-transformers.

1.2.0

Not secure
Unsupervised Sentence Embedding Learning

New methods have been integrated to train sentence embedding models without labeled data. See [Unsupervised Learning](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning) for an overview of all existing methods.

New methods:
- **[CT](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/CT)**: Integration of [Semantic Re-Tuning With Contrastive Tension (CT)](https://openreview.net/pdf?id=Ov_sMNau-PF) to tune models without labeled data
- **[CT_In-Batch_Negatives](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/CT_In-Batch_Negatives)**: A modification of CT using in-batch negatives
- **[SimCSE](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/SimCSE)**: An unsupervised sentence embedding learning method by [Gao et al.](https://arxiv.org/abs/2104.08821)
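
As an illustration of the SimCSE idea, here is a minimal sketch along the lines of the example scripts: each sentence is paired with itself and dropout acts as the noise, so training can use `MultipleNegativesRankingLoss`. The base model and the placeholder sentences are illustrative assumptions, not part of the release.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses, models

# Build a SentenceTransformer from a plain transformer plus mean pooling
word_embedding_model = models.Transformer("distilroberta-base", max_seq_length=64)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# SimCSE: every unlabeled sentence is its own positive pair; dropout provides the perturbation
sentences = ["Replace these with your own unlabeled sentences.", "Any in-domain text works."]
train_examples = [InputExample(texts=[s, s]) for s in sentences]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, show_progress_bar=True)
```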

Pre-Training Methods
- **[MLM](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/MLM):** An example script to run Masked Language Modeling (MLM). Running MLM on your custom data before supervised training can significantly improve performance. MLM also works well for domain transfer: you first train on your custom data, and then train with e.g. NLI or STS data. A minimal sketch is shown below.
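
A minimal MLM pre-training sketch using Hugging Face `transformers`; this is an assumption-laden outline rather than the exact example script, and `my_domain_corpus.txt` is a hypothetical text file with one sentence per line.

```python
from transformers import (
    AutoModelForMaskedLM, AutoTokenizer, DataCollatorForLanguageModeling,
    LineByLineTextDataset, Trainer, TrainingArguments,
)

model_name = "distilbert-base-uncased"  # any masked-LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# One sentence per line; the file name is a placeholder for your own corpus
dataset = LineByLineTextDataset(tokenizer=tokenizer, file_path="my_domain_corpus.txt", block_size=256)
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mlm-output", num_train_epochs=1, per_device_train_batch_size=16),
    data_collator=data_collator,
    train_dataset=dataset,
)
trainer.train()
trainer.save_model("mlm-output")
tokenizer.save_pretrained("mlm-output")  # load "mlm-output" afterwards for supervised sentence-embedding training
```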


Training Examples
- **[Paraphrase Data](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/paraphrases):** In our paper [Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation](https://arxiv.org/abs/2004.09813) we have shown that training on paraphrase data is powerful. That folder provides collections of different paraphrase datasets and scripts to train on them.
- **[NLI with MultipleNegativesRankingLoss](https://www.sbert.net/examples/training/nli/README.html#multiplenegativesrankingloss)**: A dedicated example of how to use MultipleNegativesRankingLoss for training with NLI data, which leads to a significant performance boost.




New models
- **[New NLI & STS models](https://www.sbert.net/docs/pretrained_models.html#semantic-textual-similarity):** Following the [Paraphrase Data training example](https://github.com/UKPLab/sentence-transformers/tree/master/examples/training/paraphrases) we published new models trained on NLI and NLI+STS data. Training code is available: [training_nli_v2.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/nli/training_nli_v2.py).

| Model-Name | STSb-test performance |
| --- | :---: |
| *Previous best models* | |
| nli-bert-large | 79.19 |
| stsb-roberta-large | 86.39 |
| *New v2 models* | |
| nli-mpnet-base-v2 | 86.53 |
| stsb-mpnet-base-v2 | 88.57 |

- **[New MS MARCO model for Semantic Search](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html)**: [Hofstätter et al.](https://arxiv.org/abs/2104.06967) optimized the training procedure on the [MS MARCO dataset](https://www.sbert.net/examples/training/ms_marco/README.html). The resulting model is integrated as **msmarco-distilbert-base-tas-b** and improves the performance on the MS MARCO dataset from 33.13 to 34.43 MRR@10.

New Functions
- `SentenceTransformer.fit()` **Checkpoints**: The fit() method can now save checkpoints during training at a fixed number of steps (see the sketch after this list). [More info](https://www.sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.fit)
- **Pooling-mode as string**: You can now pass the pooling mode to `models.Pooling()` as a string:
```python
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode='mean')
```
Valid values are mean/max/cls.
- **[NoDuplicatesDataLoader](https://www.sbert.net/docs/package_reference/datasets.html#noduplicatesdataloader)**: When using [MultipleNegativesRankingLoss](https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss), one should avoid having duplicate sentences in the same batch. This data loader simplifies that task and ensures that no duplicate entries end up in the same batch (see the sketch below).
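
A minimal sketch combining both additions, the `NoDuplicatesDataLoader` and the new checkpointing arguments of `fit()`. The base model and the two toy training pairs are placeholders chosen for illustration.

```python
from sentence_transformers import SentenceTransformer, InputExample, losses
from sentence_transformers.datasets import NoDuplicatesDataLoader

model = SentenceTransformer("nli-mpnet-base-v2")

# Toy (anchor, positive) pairs for MultipleNegativesRankingLoss
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."]),
    InputExample(texts=["A woman plays guitar.", "A woman is playing an instrument."]),
]

# Ensures that no duplicate sentences appear within one batch
train_dataloader = NoDuplicatesDataLoader(train_examples, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

# fit() can now write intermediate checkpoints every `checkpoint_save_steps` training steps
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    checkpoint_path="checkpoints/",
    checkpoint_save_steps=500,
)
```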

1.1.0

Not secure
Unsupervised Sentence Embedding Learning
This release integrates methods that allow learning sentence embeddings without labeled data:
- **[TSDAE](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/TSDAE)**: TSDAE uses a denoising auto-encoder to learn sentence embeddings. The method was presented in our [recent paper](https://arxiv.org/abs/2104.06979) and achieves state-of-the-art performance for several tasks.
- **[GenQ](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/query_generation)**: GenQ uses a pre-trained T5 model to generate queries for a given passage. It was presented in our [recent BEIR paper](https://arxiv.org/abs/2104.08663) and works well for domain adaptation for [semantic search](https://www.sbert.net/examples/applications/semantic-search/README.html).
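
To make the GenQ step concrete, here is a sketch of synthetic query generation with a T5 query-generation checkpoint published alongside the BEIR work; the passage text is a placeholder.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("BeIR/query-gen-msmarco-t5-base-v1")
model = T5ForConditionalGeneration.from_pretrained("BeIR/query-gen-msmarco-t5-base-v1")

passage = "Python is a high-level, general-purpose programming language."
inputs = tokenizer(passage, return_tensors="pt")

# Sample a few synthetic queries for this passage; the resulting (query, passage) pairs
# can then be used to train a dense retriever, e.g. with MultipleNegativesRankingLoss
outputs = model.generate(**inputs, max_length=64, do_sample=True, top_p=0.95, num_return_sequences=3)
for ids in outputs:
    print(tokenizer.decode(ids, skip_special_tokens=True))
```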

New Models - SentenceTransformer
- [MSMARCO Dot-Product Models](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html): We trained models using the dot-product instead of cosine similarity as the similarity function. As shown in our [recent BEIR paper](https://arxiv.org/abs/2104.08663), models with cosine similarity prefer retrieving short documents, while models with dot-product prefer retrieving longer documents. Now you can choose whichever is most suitable for your task (see the sketch after this list).
- [MSMARCO MiniLM Models](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html): We uploaded models based on [MiniLM](https://huggingface.co/microsoft/MiniLM-L12-H384-uncased): they use just 384 dimensions, are faster than previous models, and achieve nearly the same performance.
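
A minimal semantic-search sketch with a dot-product model; the model name is the TAS-B checkpoint mentioned earlier in these notes, and the query and documents are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

# Dot-product model: score with util.dot_score instead of cosine similarity
model = SentenceTransformer("msmarco-distilbert-base-tas-b")

query_emb = model.encode("How many people live in London?")
doc_emb = model.encode([
    "Around 9 million people live in London.",
    "London is known for its museums and parks.",
])

scores = util.dot_score(query_emb, doc_emb)
print(scores)  # higher score = better match
```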

New Models - CrossEncoder
- [MSMARCO Re-ranking Models v2](https://www.sbert.net/docs/pretrained-models/ce-msmarco.html): We trained new CrossEncoder re-ranking models on the MS MARCO dataset that are significantly faster and significantly better. They outperform BERT-large models in terms of accuracy while being 18 times faster. [Training code is available](https://www.sbert.net/examples/training/ms_marco/README.html#cross-encoder).

New Features
- You can now pass a `default_activation_function` to the CrossEncoder class, which is applied on top of the output logits generated by the model (see the sketch after this list).
- You can now pre-process images for the [CLIP Model](https://www.sbert.net/examples/applications/image-search/README.html). Soon I will release a tutorial on how to fine-tune the CLIP model with your data.
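
A sketch of the `default_activation_function` option; the cross-encoder checkpoint and the query/passage pair are illustrative.

```python
import torch
from sentence_transformers import CrossEncoder

# Apply a sigmoid on top of the raw logits so predict() returns scores in [0, 1]
model = CrossEncoder(
    "cross-encoder/ms-marco-MiniLM-L-6-v2",
    default_activation_function=torch.nn.Sigmoid(),
)
scores = model.predict([
    ("How many people live in Berlin?", "Berlin has a population of around 3.6 million."),
])
print(scores)
```

And a sketch of encoding an image with the CLIP model; `dog_photo.jpg` is a hypothetical local file.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip_model = SentenceTransformer("clip-ViT-B-32")

# Images and texts are embedded into the same vector space
img_emb = clip_model.encode(Image.open("dog_photo.jpg"))
text_emb = clip_model.encode(["A photo of a dog", "A photo of a cat"])
print(util.pytorch_cos_sim(img_emb, text_emb))
```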

1.0.4

Not secure
Previously it was not possible to fine-tune and save the CLIPModel; this release fixes that. CLIPModel can now be saved like any other model by calling `model.save(path)`.

1.0.3

Not secure
