Sentence-transformers

Latest version: v3.3.1


2.2.1

Not secure
Version `0.8.1` of `huggingface_hub` introduced several changes that resulted in errors and warnings. This version of `sentence-transformers` fixes these issues.

Further, several improvements have been added / merged:
- `util.community_detection` was improved: 1) it now works in a batched mode to save memory, 2) overlapping clusters are no longer dropped entirely; instead, only the overlapping items are removed, 3) the parameter `init_max_size` was removed and replaced by a heuristic that estimates the maximum cluster size (a usage sketch follows this list)
- #1581: training dataset names can now be saved in the model card
- #1426: fixed the text summarization example
- #1487: recursive sentence-transformers models are now possible
- #1522: private models can now be loaded
- #1551: DataLoaders can now use multiple workers
- #1565: models are only checked on the hub if they are not found in the local cache; this fixes problems when connectivity is limited
- #1591: added an example showing how to stream-encode larger datasets
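
As a rough sketch of the batched `util.community_detection` (the model name, sentences, and parameter values here are illustrative placeholders, not taken from the release notes):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
sentences = [
    "A man is eating food.",
    "A man is eating a piece of bread.",
    "The girl is carrying a baby.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Similarities are now computed in batches internally to save memory
clusters = util.community_detection(embeddings, threshold=0.75, min_community_size=2)
for cluster in clusters:
    print([sentences[idx] for idx in cluster])
```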

2.2.0

Not secure
T5
You can now use the encoder from T5 to learn text embeddings. You can use it like any other transformer model:
```python
from sentence_transformers import SentenceTransformer, models

word_embedding_model = models.Transformer('t5-base', max_seq_length=256)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
```


See the [T5 benchmark results](https://www.sbert.net/docs/training/overview.html#best-transformer-model): the T5 encoder is not the best model for learning text embeddings. It requires quite a lot of training data and training steps. Other models perform much better, at least in the given experiment with 560k training triplets.

New Models
The models from the papers [Sentence-T5: Scalable sentence encoders from pre-trained text-to-text models](https://arxiv.org/abs/2108.08877) and [Large Dual Encoders Are Generalizable Retrievers](https://arxiv.org/abs/2112.07899) have been added:
- [gtr-t5-base](https://huggingface.co/sentence-transformers/gtr-t5-base)
- [gtr-t5-large](https://huggingface.co/sentence-transformers/gtr-t5-large)
- [gtr-t5-xl](https://huggingface.co/sentence-transformers/gtr-t5-xl)
- [gtr-t5-xxl](https://huggingface.co/sentence-transformers/gtr-t5-xxl)
- [sentence-t5-base](https://huggingface.co/sentence-transformers/sentence-t5-base)
- [sentence-t5-large](https://huggingface.co/sentence-transformers/sentence-t5-large)
- [sentence-t5-xl](https://huggingface.co/sentence-transformers/sentence-t5-xl)
- [sentence-t5-xxl](https://huggingface.co/sentence-transformers/sentence-t5-xxl)

For benchmark results, see [https://seb.sbert.net](https://seb.sbert.net?model_name=sentence-transformers/gtr-t5-base)

Private Models

Thanks to #1406, you can now load private models from the hub:
```python
model = SentenceTransformer("your-username/your-model", use_auth_token=True)
```

2.1.0

Not secure
This is a smaller release with some new features.

MarginMSELoss
[MarginMSELoss](https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/losses/MarginMSELoss.py) is a great method to train embedding models with the help of a cross-encoder model. The details are explained here: [MSMARCO - MarginMSE Training](https://www.sbert.net/examples/training/ms_marco/README.html#marginmse)

You pass your training data in the format:
```python
InputExample(
    texts=[query, positive, negative],
    label=cross_encoder.predict([query, positive]) - cross_encoder.predict([query, negative]),
)
```
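
For illustration, this is how such a training example might be built with a cross-encoder (the model name and texts below are placeholders, not from the release notes):

```python
from sentence_transformers import InputExample
from sentence_transformers.cross_encoder import CrossEncoder

# Placeholder cross-encoder and texts, purely for illustration
cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "what is python"
positive = "Python is a high-level programming language."
negative = "Pythons are large, non-venomous snakes."

# The label is the margin between the cross-encoder scores of the positive and the negative passage
margin = float(
    cross_encoder.predict([(query, positive)])[0] - cross_encoder.predict([(query, negative)])[0]
)
example = InputExample(texts=[query, positive, negative], label=margin)
```

The bi-encoder is then trained with `losses.MarginMSELoss(model)` on a `DataLoader` of such examples.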

MultipleNegativesSymmetricRankingLoss
MultipleNegativesRankingLoss computes the loss in just one direction: find the correct answer for a given question.

[MultipleNegativesSymmetricRankingLoss](https://github.com/UKPLab/sentence-transformers/blob/master/sentence_transformers/losses/MultipleNegativesSymmetricRankingLoss.py) also computes the loss in the other direction: find the correct question for a given answer.
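
A minimal training sketch, assuming (question, answer) pairs; the model name and data below are placeholders:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
train_examples = [
    InputExample(texts=["What is the capital of France?", "Paris is the capital of France."]),
    InputExample(texts=["How many moons does Mars have?", "Mars has two moons, Phobos and Deimos."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# Scores both directions: question -> answer and answer -> question
train_loss = losses.MultipleNegativesSymmetricRankingLoss(model=model)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```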

Breaking Change: CLIPModel
The `CLIPModel` is now based on the CLIP implementation from the `transformers` library.

You can still load it like this:
```python
model = SentenceTransformer('clip-ViT-B-32')
```


Older sentence-transformers versions are no longer able to load and use the 'clip-ViT-B-32' model.


Added files on the hub are automatically downloaded
PR #1116 checks whether you have all files in your local cache or whether additional files have been added on the hub. If new files are found, they are downloaded automatically.

`SentenceTransformer.encode()` can return all values

When you set `output_value=None` for the `encode` method, all values (token_ids, token_embeddings, sentence_embedding) will be returned.
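
A minimal sketch (the model name and sentence are illustrative placeholders):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # placeholder model
# convert_to_numpy=False keeps the raw tensors, as the result is a dictionary rather than a single embedding
output = model.encode("Sentence embeddings are easy to compute.", output_value=None, convert_to_numpy=False)

# output holds all computed values for the sentence,
# e.g. the token ids, the token embeddings and the pooled sentence embedding
print(output.keys())
print(output["sentence_embedding"].shape)
```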

2.0.0

Not secure
Models hosted on the hub
All pre-trained models are now hosted on the [Huggingface Models hub](https://huggingface.co/models).

Our pre-trained models can be found here: [https://huggingface.co/sentence-transformers](https://huggingface.co/sentence-transformers)

You can also easily share your own sentence-transformers model on the hub so that other people can access it. Simply upload the folder and let people load it via:

```python
model = SentenceTransformer('[your_username]/[model_name]')
```


For more information, see: [Sentence Transformers in the Hugging Face Hub](https://huggingface.co/blog/sentence-transformers-in-the-hub)

Breaking changes

There should be no breaking changes. Old models can still be loaded from disk. However, if you use one of the provided pre-trained models, it will be downloaded again in version 2 of sentence-transformers, as the cache path has changed slightly.

Find sentence-transformer models on the Hub

You can filter the hub for sentence-transformers models: [https://huggingface.co/models?filter=sentence-transformers](https://huggingface.co/models?filter=sentence-transformers)

Add the `sentence-transformers` tag to your model card so that others can find your model.

Widget & Inference API
A widget was added to sentence-transformers models on the hub that lets you interact with the model directly on its page, for example:
https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2

Further, models can now be used with the [Accelerated Inference API](https://api-inference.huggingface.co/docs/python/html/index.html): send your sentences to the API and get back the embeddings from the respective model.

Save Model to Hub

A new method was added to the `SentenceTransformer` class: `save_to_hub`.

Provide the model name and the model is saved on the hub.

Here you can find the explanation from transformers of how the hub works: [Model sharing and uploading](https://huggingface.co/transformers/model_sharing.html)
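
A minimal sketch (the model and repository names are placeholders; you need to be logged in to the Hugging Face Hub, e.g. via `huggingface-cli login`):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")  # placeholder model
# Uploads the model (including the automatically generated model card) to the hub
model.save_to_hub("my-new-model")
```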

Automatic Model Card

When you save a model with `save` or `save_to_hub`, a `README.md` (also known as model card) is automatically generated with basic information about the respective SentenceTransformer model.


New Models
- Several new sentence embedding models have been added, which are much better than the previous models: [Sentence Embedding Models](https://www.sbert.net/docs/pretrained_models.html#sentence-embedding-models)
- Some new models for semantic search based on MS MARCO have been added: [MSMARCO Models](https://www.sbert.net/docs/pretrained-models/msmarco-v3.html)
- The training script for these MS MARCO models has been released as well: [Train MS MARCO Bi-Encoder v3](https://github.com/UKPLab/sentence-transformers/blob/master/examples/training/ms_marco/train_bi-encoder-v3.py)

2a. You can quantize directly while encoding, via the `precision` argument of `encode`:

```python
binary_embeddings = model.encode(
    ["I am driving to the lake.", "It is a beautiful day."],
    precision="binary",
)
```

2b. Or you can encode first and quantize the embeddings afterwards with `quantize_embeddings`:

```python
from sentence_transformers.quantization import quantize_embeddings

embeddings = model.encode(["I am driving to the lake.", "It is a beautiful day."])
binary_embeddings = quantize_embeddings(embeddings, precision="binary")
```

References:
* [SentenceTransformer.encode](https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode)
* [quantize_embeddings](https://sbert.net/docs/package_reference/quantization.html#sentence_transformers.quantization.quantize_embeddings)

GISTEmbedLoss

GISTEmbedLoss, as introduced in [Solatorio (2024)](https://arxiv.org/pdf/2402.16829.pdf), is a guided variant of the more standard in-batch negatives (`MultipleNegativesRankingLoss`) loss. Both loss functions are provided with a list of (anchor, positive) pairs, but while `MultipleNegativesRankingLoss` uses `anchor_i` and `positive_i` as positive pair and all `positive_j` with `i != j` as negative pairs, `GISTEmbedLoss` uses a second model to guide the in-batch negative sample selection.

This can be very useful, because it is plausible that `anchor_i` and `positive_j` are actually quite semantically similar. In this case, `GISTEmbedLoss` would not consider them a negative pair, while `MultipleNegativesRankingLoss` would. When finetuning MPNet-base on the AllNLI dataset, these are the Spearman correlations based on cosine similarity on the STS Benchmark development set (higher is better):

![Spearman correlation on the STS Benchmark dev set during training: MultipleNegativesRankingLoss vs. GISTEmbedLoss](https://github.com/UKPLab/sentence-transformers/assets/37621491/ae99e809-4cc9-4db3-8b00-94cc74d2fe3b)
The blue line is `MultipleNegativesRankingLoss`, whereas the grey line is `GISTEmbedLoss` with the small `all-MiniLM-L6-v2` as the guide model. Note that `all-MiniLM-L6-v2` by itself does not reach 88 Spearman correlation on this dataset, so this is really the effect of two models (`mpnet-base` and `all-MiniLM-L6-v2`) reaching a performance that they could not reach separately.
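
A minimal training sketch under these assumptions (the toy pairs below are placeholders; the model/guide combination mirrors the experiment described above):

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

model = SentenceTransformer("microsoft/mpnet-base")  # model being finetuned
guide = SentenceTransformer("all-MiniLM-L6-v2")      # small guide model for negative selection

train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a piece of bread."]),
    InputExample(texts=["The girl is carrying a baby.", "A woman holds an infant."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)

# In-batch negatives are used, but pairs that the guide model considers too similar
# to the anchor are not treated as negatives
train_loss = losses.GISTEmbedLoss(model=model, guide=guide)
model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1)
```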

Soft `save_to_hub` Deprecation
Most codebases that allow for pushing models to the [Hugging Face Hub](https://huggingface.co/) adopt a `push_to_hub` method instead of a `save_to_hub` method, and now Sentence Transformers will follow that convention. The [`push_to_hub`](https://sbert.net/docs/package_reference/SentenceTransformer.html#sentence_transformers.SentenceTransformer.push_to_hub) method will now be the recommended approach, although `save_to_hub` will continue to exist for the time being: it will simply call `push_to_hub` internally.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

...

# Train the model
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    evaluator=dev_evaluator,
    epochs=num_epochs,
    evaluation_steps=1000,
    warmup_steps=warmup_steps,
)

# Push the model to Hugging Face
model.push_to_hub("tomaarsen/mpnet-base-nli-stsb")
```


All changes
* Add GISTEmbedLoss by avsolatorio in https://github.com/UKPLab/sentence-transformers/pull/2535
* [`feat`] Add 'get_config_dict' method to GISTEmbedLoss for better model cards by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2543
* Enable saving modules as pytorch_model.bin by CKeibel in https://github.com/UKPLab/sentence-transformers/pull/2542
* [`deprecation`] Deprecate `save_to_hub` in favor of `push_to_hub`; add safe_serialization support to `push_to_hub` by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2544
* Fix SentenceTransformer encode documentation return type default (numpy vectors) by CKeibel in https://github.com/UKPLab/sentence-transformers/pull/2546
* [`docs`] Update return docstring of encode_multi_process by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2548
* [`feat`] Add binary & scalar embedding quantization support to Sentence Transformers by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/2549

New Contributors
* avsolatorio made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2535
* CKeibel made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2542

**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v2.5.1...v2.6.0

