Sentence-transformers

Latest version: v3.3.1

0.3.2

Not secure
This is a minor release. There should be no breaking changes.

- **ParallelSentencesDataset**: Datasets are tokenized on-the-fly, saving some start-up time
- **util.pytorch_cos_sim**: New method to compute cosine similarity with PyTorch. It is about 100 times faster than scipy's cdist. The semantic_search.py example has been updated accordingly.
- **SentenceTransformer.encode**: New parameter: *convert_to_tensor*. If set to True, encode returns one large PyTorch tensor with your embeddings (see the sketch after this list).
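
A minimal sketch of how these two additions fit together; the model name and the toy sentences are only illustrative and not prescribed by this release:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('bert-base-nli-mean-tokens')  # example model name

# convert_to_tensor=True returns one large PyTorch tensor instead of a list of numpy arrays
corpus_embeddings = model.encode(['A man is eating food.',
                                  'A man is riding a horse.'],
                                 convert_to_tensor=True)
query_embedding = model.encode(['Someone is eating.'], convert_to_tensor=True)

# Cosine-similarity matrix computed with PyTorch (replaces the slower scipy cdist call)
cos_scores = util.pytorch_cos_sim(query_embedding, corpus_embeddings)
print(cos_scores)  # shape: 1 x 2
```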

0.3.1

Not secure
This is a minor update that changes some classes for training & evaluating multilingual sentence embedding methods.

The examples for training multi-lingual sentence embeddings models have been significantly extended. See [docs/training/multilingual-models.md](https://github.com/UKPLab/sentence-transformers/blob/master/docs/training/multilingual-models.md) for details. An automatic script that downloads suitable data and extends sentence embeddings to multiple languages has been added.

The following classes/files have been changed:
- datasets/ParallelSentencesDataset.py: The dataset with parallel sentences is encoded on-the-fly, reducing the start-up time when extending a sentence embedding model to new languages. An embedding cache can be configured to store previously computed sentence embeddings during training (see the sketch below).
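
A hedged sketch of using this dataset for multilingual distillation; the model names, the file name, and the use_embedding_cache argument are assumptions based on the description above and on the multilingual training example, not guaranteed by this release:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses
from sentence_transformers.datasets import ParallelSentencesDataset

# Mono-lingual teacher and multi-lingual student (example model names)
teacher_model = SentenceTransformer('bert-base-nli-stsb-mean-tokens')
word_emb = models.Transformer('xlm-roberta-base')
pooling = models.Pooling(word_emb.get_word_embedding_dimension())
student_model = SentenceTransformer(modules=[word_emb, pooling])

# Parallel sentences (e.g. tab-separated "source\ttranslation" lines) are encoded on-the-fly
# by the teacher; use_embedding_cache (assumed parameter name) caches repeated sentences.
train_data = ParallelSentencesDataset(student_model=student_model,
                                      teacher_model=teacher_model,
                                      use_embedding_cache=True)
train_data.load_data('parallel-sentences.tsv.gz')  # hypothetical file path

train_dataloader = DataLoader(train_data, shuffle=True, batch_size=32)
train_loss = losses.MSELoss(model=student_model)  # student mimics the teacher embeddings
student_model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=100)
```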

New evaluation files:
- evaluation/MSEEvaluator.py - **breaking change**. This class now expects lists of strings with parallel (translated) sentences (see the sketch after this list). The old class has been renamed to MSEEvaluatorFromDataLoader.py
- evaluation/EmbeddingSimilarityEvaluatorFromList.py - Semantic Textual Similarity data can be passed as lists of strings & scores
- evaluation/MSEEvaluatorFromDataFrame.py - MSE Evaluation of teacher and student embeddings based on data in a data frame
- evaluation/MSEEvaluatorFromDataLoader.py - MSE Evaluation if data is passed as a data loader
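
A small hedged sketch of the new MSEEvaluator interface, reusing the teacher and student models from the sketch above; the sentence lists are made up and the exact keyword arguments are assumptions based on the description:

```python
from sentence_transformers.evaluation import MSEEvaluator

# Hypothetical parallel (translated) sentence lists
src_sentences = ['How are you?', 'The weather is nice today.']
trg_sentences = ['Wie geht es dir?', 'Das Wetter ist heute schön.']

# Measures the MSE between the teacher embeddings of the source sentences
# and the student embeddings of the target sentences
dev_evaluator = MSEEvaluator(src_sentences, trg_sentences, teacher_model=teacher_model)
student_model.evaluate(dev_evaluator)
```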


**Bugfixes:**
- model.encode() failed to sort sentences by length. This has been fixed, which boosts encoding speed by reducing the overhead from padding tokens.

0.3.0

Not secure
This release updates HuggingFace transformers to v3.0.2. Transformers made some breaking changes to the tokenization API. This (and future) versions will not be compatible with HuggingFace transformers v2.

There are no known breaking changes for existing models or existing code. Models trained with version 2 can be loaded without issues.

New Loss Functions
Thanks to PR 299 and 176, several new loss functions have been added: different triplet loss functions and ContrastiveLoss (see the sketch below).
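
A minimal hedged sketch of training with the new ContrastiveLoss; the base model name and the toy sentence pairs are only illustrative:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, SentencesDataset, losses
from sentence_transformers.readers import InputExample

model = SentenceTransformer('bert-base-nli-mean-tokens')  # example base model

# label=1 for similar pairs, label=0 for dissimilar pairs
train_examples = [InputExample(texts=['A man is eating food.', 'A man is eating a meal.'], label=1),
                  InputExample(texts=['A man is eating food.', 'A plane is taking off.'], label=0)]

train_dataset = SentencesDataset(train_examples, model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2)
train_loss = losses.ContrastiveLoss(model=model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```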

0.2.6

Not secure
This release updates huggingface/transformers to release v2.8.0.

New Features
- **models.Transformer**: The Transformer model can now load any huggingface transformers model, such as BERT, RoBERTa, XLNet, XLM-R, Electra, ... It is based on the AutoModel class from HuggingFace, so you no longer need the architecture-specific models (like models.BERT, models.RoBERTa). It also works with community models (see the sketch after this list).
- **Multilingual Training**: Code is released for making mono-lingual sentence embedding models multi-lingual. See training_multilingual.py for an example. More documentation and details will follow soon.
- **WKPooling**: Added a PyTorch implementation of SBERT-WK. Note: due to an inefficient QR decomposition implementation in PyTorch, WKPooling can only be run on the CPU, which makes it about 40 times slower than mean pooling. For some models WKPooling improves the performance, for others it doesn't.
- **WeightedLayerPooling**: A new pooling layer that uses representations from all transformer layers and learns a weighted sum of them. So far no improvement compared to only averaging the last layer.
- **New pre-trained models** released. Every available model is documented in a Google Spreadsheet for an easier overview.
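
A hedged sketch of building a model with the generic models.Transformer class; the checkpoint name is only an example of a huggingface model:

```python
from sentence_transformers import SentenceTransformer, models

# Any huggingface transformers checkpoint (including community models) can be passed here
word_embedding_model = models.Transformer('roberta-base', max_seq_length=128)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(),
                               pooling_mode_mean_tokens=True)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
embeddings = model.encode(['This sentence is embedded with a RoBERTa backbone.'])
```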

Minor changes
- Clean-up of the examples folder.
- Model and tokenizer arguments can now be passed through to the corresponding transformers classes.
- The previous version had some issues with RoBERTa and XLM-RoBERTa where the wrong special tokens were added. This is now fixed; the code relies on huggingface transformers for the correct addition of special tokens to the input sentences.

Breaking changes
- *STSDataReader*: The default parameter values have been changed so that it expects the sentences in the first two columns and the score in the third column. If you want to load the STS benchmark dataset, you can use the STSBenchmarkDataReader (see the sketch below).
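
A short hedged illustration of the new defaults; the column-index parameter names (s1_col_idx, s2_col_idx, score_col_idx), the folder, and the file name are assumptions based on the reader's interface:

```python
from sentence_transformers.readers import STSDataReader, STSBenchmarkDataReader

# New defaults: sentence A in column 0, sentence B in column 1, similarity score in column 2
sts_reader = STSDataReader('datasets/my-sts-data',            # hypothetical folder
                           s1_col_idx=0, s2_col_idx=1, score_col_idx=2)
examples = sts_reader.get_examples('train.tsv')               # hypothetical file name

# For the original STS benchmark file layout, use the dedicated reader instead
stsb_reader = STSBenchmarkDataReader('datasets/stsbenchmark')
```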

0.2.5

Not secure

0.2.4

Not secure
This version updates the underlying HuggingFace Transformers package to v2.2.1.

**Changes:**
- DistilBERT and ALBERT modules added
- Pre-trained models for RoBERTa and DistilBERT uploaded
- Some smaller bug-fixes
