Sentence-transformers

Latest version: v3.3.1


1.0.2

- Bugfix in CLIPModel: overly long inputs raised a RuntimeError; they are now truncated.
- New util function: util.paraphrase_mining_embeddings, to find the most similar embeddings in a matrix (see the sketch after this list)
- **Image Clustering** and **Duplicate Image Detection** examples added: [more info](https://www.sbert.net/examples/applications/image-search/README.html#examples)
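
A minimal sketch of the new paraphrase mining utility; the model name and sentences here are illustrative placeholders, not part of the release notes:

```python
from sentence_transformers import SentenceTransformer, util

sentences = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
]

# Any sentence embedding model works; this paraphrase model is just an example
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
embeddings = model.encode(sentences, convert_to_tensor=True)

# Returns [score, i, j] triplets, sorted by decreasing cosine similarity
pairs = util.paraphrase_mining_embeddings(embeddings)
for score, i, j in pairs:
    print(f'{score:.4f}  {sentences[i]}  <->  {sentences[j]}')
```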

1.0.0

This release brings many improvements and new features. Also, the version number scheme is updated: we now use the format x.y.z, where x marks major releases, y marks smaller releases with new features, and z marks bugfixes.

Text-Image Model CLIP
You can now encode text and images in the same vector space using the OpenAI CLIP Model. You can use the model like this:
```python
from sentence_transformers import SentenceTransformer, util
from PIL import Image

# Load CLIP model
model = SentenceTransformer('clip-ViT-B-32')

# Encode an image
img_emb = model.encode(Image.open('two_dogs_in_snow.jpg'))

# Encode text descriptions
text_emb = model.encode(['Two dogs in the snow', 'A cat on a table', 'A picture of London at night'])

# Compute cosine similarities
cos_scores = util.cos_sim(img_emb, text_emb)
print(cos_scores)
```

[More Information](https://www.sbert.net/examples/applications/image-search/README.html)
[IPython Demo](https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/image-search/Image_Search.ipynb)
[Colab Demo](https://colab.research.google.com/drive/16OdADinjAg3w3ceZy3-cOR9A-5ZW9BYr#scrollTo=xTFNbzmG3erx)

Examples of how to train the CLIP model on your own data will be added soon.

New Models
- Add v3 models trained for semantic search on MS MARCO: [MS MARCO Models v3](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/msmarco-v3.md) (see the sketch after this list)
- First models trained on Natural Questions dataset for Q&A Retrieval: [Natural Questions Models v1](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/nq-v1.md)
- Add DPR Models from Facebook for Q&A Retrieval: [DPR-Models](https://github.com/UKPLab/sentence-transformers/blob/master/docs/pretrained-models/dpr.md)
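
A quick sketch of Q&A-style retrieval with one of these models; 'msmarco-distilbert-base-v3' is one of the MS MARCO v3 checkpoints, and the query and passages are toy examples:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('msmarco-distilbert-base-v3')

query_emb = model.encode('How many people live in London?', convert_to_tensor=True)
passage_embs = model.encode([
    'London has 9,787,426 inhabitants at the 2011 census.',
    'London is known for its financial district.',
], convert_to_tensor=True)

# Ranked hits per query: a list of {'corpus_id': ..., 'score': ...} dicts
hits = util.semantic_search(query_emb, passage_embs, top_k=2)
print(hits[0])
```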

New Features
- The [Asym Model](https://github.com/UKPLab/sentence-transformers/releases/tag/v0.4.1) can now be used as the first model in a SentenceTransformer modules list.
- Changed sorting when encoding: previously, sentences were encoded from shortest to longest; now they are encoded from longest to shortest. Out-of-memory errors therefore occur at the start, and the estimate of the encoding duration is more precise
- Improvement of the util.semantic_search method: it now uses the much faster torch.topk function. Further, you can now define which scoring function should be used
- New util methods: `util.dot_score` computes the dot product of two embedding matrices. `util.normalize_embeddings` normalizes embeddings to unit length
- New parameter for the `SentenceTransformer.encode` method: `normalize_embeddings`. If set to True, embeddings are normalized to unit length; the faster `util.dot_score` can then be used instead of `util.cos_sim` to compute cosine similarity scores (see the sketch after this list)
- If you pass `do_lower_case=True` to `models.Transformer` when creating a new SentenceTransformer, all input is lowercased
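
A short sketch tying these utilities together (the model name is an illustrative placeholder): with `normalize_embeddings=True`, the dot product equals cosine similarity, so the faster `util.dot_score` can be plugged into `util.semantic_search`:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('paraphrase-distilroberta-base-v1')
sentences = ['The cat sits outside', 'A man is playing guitar', 'The new movie is awesome']

# Unit-length embeddings: dot product == cosine similarity
emb = model.encode(sentences, convert_to_tensor=True, normalize_embeddings=True)
print(util.dot_score(emb, emb))  # same values as util.cos_sim(emb, emb)

# semantic_search now accepts a custom scoring function
hits = util.semantic_search(emb[:1], emb, score_function=util.dot_score)
print(hits[0])
```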


New Examples
- Add example for model quantization on CPUs (smaller models, faster run-time): [model_quantization.py](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/distillation/model_quantization.py)
- Started adding examples of how to train SBERT models without training data: [unsupervised learning](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning). We begin with an example of [Query Generation](https://github.com/UKPLab/sentence-transformers/tree/master/examples/unsupervised_learning/query_generation) to train a semantic search model.

Bugfixes
- The encode method now correctly returns token embeddings if `output_value='token_embeddings'` is specified (see the sketch after this list)
- Bugfix of the `LabelAccuracyEvaluator`
- Bugfix: tensors returned by `encode(sent, convert_to_tensor=True)` are no longer moved to the CPU; they now stay on the GPU
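
A minimal sketch of the fixed `token_embeddings` output (model name illustrative):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('paraphrase-distilroberta-base-v1')

# One tensor of shape (num_tokens, hidden_dim) for the input sentence
token_embs = model.encode('Hello world', output_value='token_embeddings')
print(token_embs.shape)
```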

Breaking changes:
- SentenceTransformer.encode method: removed the deprecated parameters `is_pretokenized` and `num_workers`


2.6.1

All changes
* Add 'precision' support to the EmbeddingSimilarityEvaluator by tomaarsen in 2559 (see the sketch after this list)
* [hotfix] Quantization patch; fix semantic_search_faiss/semantic_search_usearch rescoring by tomaarsen in 2558
* Fix a typo in a docstring in CosineSimilarityLoss.py by bryant1410 in 2553
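
A minimal sketch of evaluating with quantized embeddings via the new `precision` parameter; the model name, sentences, and scores are toy placeholders, and real usage would pass an STS-style dev set:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer('all-MiniLM-L6-v2')

evaluator = EmbeddingSimilarityEvaluator(
    sentences1=['A man is eating food.', 'A plane is taking off.'],
    sentences2=['A man eats something.', 'An air plane departs.'],
    scores=[0.9, 0.95],   # gold similarity scores in [0, 1]
    precision='int8',     # evaluate with int8-quantized embeddings
)
print(evaluator(model))
```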

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v2.6.0...v2.6.1
