This release brings many improvements and new features. The version number scheme has also been updated: we now use the format x.y.z, where x denotes major releases, y releases with new features, and z bugfix releases.
## Text-Image-Model CLIP
You can now encode text and images into the same vector space using the OpenAI CLIP model. You can use the model like this:
```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Load CLIP model
model = SentenceTransformer('clip-ViT-B-32')

# Encode an image
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))

# Encode text descriptions
text_emb = model.encode(['Two dogs in the snow', 'A cat on a table', 'A picture of London at night'])

# Compute cosine similarities
cos_scores = util.cos_sim(img_emb, text_emb)
print(cos_scores)
```
[More Information](https://www.sbert.net/examples/applications/image-search/README.html)
[IPython Demo](https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/image-search/Image_Search.ipynb)
[Colab Demo](https://colab.research.google.com/drive/16OdADinjAg3w3ceZy3-cOR9A-5ZW9BYr#scrollTo=xTFNbzmG3erx)
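The `cos_scores` matrix above has one row per image and one column per text. A small sketch of picking the best-matching caption from it, in plain NumPy with made-up scores standing in for real model output:

```python
import numpy as np

# Hypothetical similarity scores for 1 image vs. 3 captions
# (real values come from util.cos_sim(img_emb, text_emb))
cos_scores = np.array([[0.71, 0.12, 0.18]])

captions = ['Two dogs in the snow', 'A cat on a table', 'A picture of London at night']

# The column with the highest score is the best-matching caption
best = int(np.argmax(cos_scores[0]))
print(captions[best])  # -> 'Two dogs in the snow'
```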
Examples of how to train the CLIP model on your own data will be added soon.
## New Models
- Add v3 models trained for semantic search on MS MARCO: [MS MARCO Models v3](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/msmarco-v3.md)
- First models trained on Natural Questions dataset for Q&A Retrieval: [Natural Questions Models v1](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/nq-v1.md)
- Add DPR Models from Facebook for Q&A Retrieval: [DPR-Models](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/dpr.md)
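Retrieval models like these are typically used by embedding a corpus once and ranking it against each query, which is what `util.semantic_search` does. A minimal NumPy sketch of that ranking step (random vectors stand in for real model embeddings; the `top_k` name mirrors the library's parameter, but this function is an illustration, not the library implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for embeddings produced by an MS MARCO / DPR model
corpus_emb = rng.standard_normal((1000, 128))
query_emb = rng.standard_normal((1, 128))

def semantic_search(query, corpus, top_k=5):
    # Score every corpus entry against the query (dot-product scoring)
    scores = corpus @ query.ravel()
    # Keep the top_k highest-scoring corpus indices, best first
    top = np.argpartition(-scores, top_k)[:top_k]
    top = top[np.argsort(-scores[top])]
    return [(int(i), float(scores[i])) for i in top]

hits = semantic_search(query_emb, corpus_emb, top_k=5)
```

Each hit is an `(index, score)` pair into the corpus, sorted by descending score.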
## New Features
- The [Asym Model](https://github.com/UKPLab/sentence-transformers/releases/tag/v0.4.1) can now be used as the first model in a SentenceTransformer modules list.
- Sorting when encoding changed: previously, we encoded from short to long sentences; now we encode from long to short. Out-of-memory errors thus occur at the start of encoding, and the estimated duration of the encode process is more precise.
- Improved `util.semantic_search` method: it now uses the much faster `torch.topk` function. Further, you can define which scoring function should be used.
- New util methods: `util.dot_score` computes the dot product of two embedding matrices. `util.normalize_embeddings` will normalize embeddings to unit length
- New parameter for the `SentenceTransformer.encode` method: `normalize_embeddings`. If set to `True`, embeddings are normalized to unit length; the faster `util.dot_score` can then be used instead of `util.cos_sim` to compute cosine similarity scores.
- If you specify `models.Transformer(do_lower_case=True)` when creating a new SentenceTransformer, all input will be lower-cased.
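The interaction between `normalize_embeddings` and `util.dot_score` comes down to a small identity: on unit-length vectors, the dot product equals cosine similarity. A NumPy sketch of that equivalence (the functions below mirror, but are not, the torch-based `util` helpers):

```python
import numpy as np

def cos_sim(a, b):
    # Cosine similarity between all rows of a and all rows of b
    return (a @ b.T) / (np.linalg.norm(a, axis=1, keepdims=True) * np.linalg.norm(b, axis=1))

def dot_score(a, b):
    # Plain dot product between all rows of a and all rows of b
    return a @ b.T

def normalize_embeddings(emb):
    # Scale each row to unit length, as encode(..., normalize_embeddings=True) would
    return emb / np.linalg.norm(emb, axis=1, keepdims=True)

a = np.array([[3.0, 4.0], [1.0, 0.0]])
b = np.array([[4.0, 3.0], [0.0, 2.0]])

# The cheaper dot product on normalized embeddings reproduces cosine similarity
assert np.allclose(cos_sim(a, b), dot_score(normalize_embeddings(a), normalize_embeddings(b)))
```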
## New Examples
- Added an example for model quantization on CPUs (smaller models, faster run-time): [model_quantization.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py)
- Started adding examples of how to train SBERT models without training data: [unsupervised learning](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning). We start with an example for [Query Generation](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/query_generation) to train a semantic search model.
## Bugfixes
- The encode method now correctly returns token embeddings if `output_value='token_embeddings'` is specified
- Bugfix of the `LabelAccuracyEvaluator`
- Bugfix for `encode(sent, convert_to_tensor=True)`: tensors are no longer moved off the GPU to the CPU; they now stay on the GPU
## Breaking Changes
- `SentenceTransformer.encode` method: removed the deprecated parameters `is_pretokenized` and `num_workers`