Sentence-transformers

Latest version: v4.0.1


0.3.6

Not secure
Huggingface Transformers version 3.1.0 introduced a breaking change compared to the previous version 3.0.2.

This release fixes the issue so that Sentence-Transformers is compatible with Huggingface Transformers 3.1.0. Note that this and future versions will not be compatible with transformers < 3.1.0.

0.3.5

Not secure
- The old FP16 training code in model.fit() was replaced by PyTorch 1.6.0 automatic mixed precision (AMP). When calling `model.fit(use_amp=True)`, AMP will be used. On suitable GPUs, this leads to a significant speed-up while requiring less memory (a minimal usage sketch follows after this list).
- Performance improvements in paraphrase mining & semantic search by replacing np.argpartition with torch.topk
- If a sentence-transformer model is not found, it falls back to the Huggingface Transformers repository and creates the model with mean pooling.
- Pinned huggingface transformers to version 3.0.2. The next release will make sentence-transformers compatible with huggingface transformers 3.1.0
- Several bugfixes: downloading of files, multi-GPU encoding
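
To illustrate the new `use_amp` option, here is a minimal training sketch; the model name, toy sentence pairs, and loss are placeholders, and only the `use_amp=True` flag is the feature described above.

```python
from torch.utils.data import DataLoader

from sentence_transformers import InputExample, SentencesDataset, SentenceTransformer, losses

# Placeholder model and toy training pairs, for illustration only
model = SentenceTransformer("bert-base-nli-mean-tokens")
train_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "The girl is carrying a baby."], label=0.1),
]
train_dataset = SentencesDataset(train_examples, model)
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)

# use_amp=True switches training to PyTorch automatic mixed precision
model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    use_amp=True,
)
```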

0.3.4

Not secure
- The documentation is substantially improved and can be found at: [www.SBERT.net](https://www.sbert.net) - Feedback welcome
- The dataset to hold training InputExamples (dataset.SentencesDataset) now uses lazy tokenization, i.e., examples are tokenized once they are needed for a batch. If you set `num_workers` to a positive integer in your `DataLoader`, tokenization will happen in a background thread. This substantially reduces the start-up time for training.
- `model.encode()` also uses a PyTorch Dataset + DataLoader. If you set `num_workers` to a positive integer, tokenization will happen in a background thread, leading to faster encoding speed for large corpora.
- Added functions and an example for [multi-GPU encoding](https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/computing_embeddings_mutli_gpu.py) - This can be used to encode a corpus with multiple GPUs in parallel (a minimal sketch follows after this list). No multi-GPU support for training yet.
- Removed the parallel_tokenization parameters from encode & SentencesDataset - no longer needed with lazy tokenization and DataLoader worker threads.
- Smaller bugfixes
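
A minimal multi-GPU encoding sketch, loosely following the linked example script; the model name and the generated corpus are placeholders, and the pool helper methods shown are assumed from that example.

```python
from sentence_transformers import SentenceTransformer

# Hypothetical large corpus; for small lists the multi-process overhead is not worth it
sentences = ["This is sentence {}".format(i) for i in range(100000)]

if __name__ == "__main__":
    model = SentenceTransformer("bert-base-nli-mean-tokens")

    # Start one worker process per available GPU (falls back to CPU workers)
    pool = model.start_multi_process_pool()

    # The corpus is chunked and encoded in parallel by the worker processes
    embeddings = model.encode_multi_process(sentences, pool)
    print("Embeddings computed. Shape:", embeddings.shape)

    # Shut the worker processes down again
    model.stop_multi_process_pool(pool)
```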

Breaking changes:
- Renamed evaluation.BinaryEmbeddingSimilarityEvaluator to evaluation.BinaryClassificationEvaluator

0.3.3

Not secure
New Functions
- Multi-process tokenization (Linux only) for the model encode function. Significant speed-up when encoding large sets of sentences
- Tokenization of datasets for training can now run in parallel (Linux Only)
- New example for Quora Duplicate Questions Retrieval: See examples-folder
- Many small improvements for training better models for Information Retrieval
- Fixed LabelSampler (can be used to get batches with a certain number of matching labels; used for BatchHardTripletLoss). Moved it to the datasets folder
- Added new Evaluators for ParaphraseMining and InformationRetrieval
- evaluation.BinaryEmbeddingSimilarityEvaluator no longer assumes a 50-50 split of the dataset. It computes the optimal threshold and measures accuracy
- model.encode - When the convert_to_numpy parameter is set, the method returns a numpy matrix instead of a list of numpy vectors
- New function: util.paraphrase_mining to perform paraphrase mining in a corpus (see the sketch after this list). For an example, see examples/training_quora_duplicate_questions/
- New function: util.information_retrieval to perform information retrieval / semantic search in a corpus. For an example see examples/training_quora_duplicate_questions/
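
As a rough usage sketch of util.paraphrase_mining; the model name and toy corpus are placeholders, and the returned pair format is assumed from the current util module.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-mean-tokens")

# Toy corpus for illustration; paraphrase mining is intended for large collections
sentences = [
    "The cat sits outside",
    "A man is playing guitar",
    "The new movie is awesome",
    "The cat plays in the garden",
    "The new movie is so great",
]

# Returns a list of [cosine_score, index_a, index_b] pairs, best scores first
pairs = util.paraphrase_mining(model, sentences)
for score, i, j in pairs[:3]:
    print("{:.3f}\t{}\t{}".format(score, sentences[i], sentences[j]))
```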

Breaking Changes
- The evaluators (like EmbeddingSimilarityEvaluator) no longer accept a DataLoader as argument. Instead, the sentences and scores are passed directly. Old code that uses the previous evaluators needs to be changed; it can use the class method from_input_examples() (a minimal sketch follows below). See examples/training_transformers/training_nli.py for how to use the new evaluators.
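
A minimal sketch of building an evaluator from InputExamples; the model name and the toy development pairs with similarity labels in [0, 1] are placeholders.

```python
from sentence_transformers import InputExample, SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("bert-base-nli-mean-tokens")

# Toy development data: sentence pairs with gold similarity scores
dev_examples = [
    InputExample(texts=["A man is eating food.", "A man is eating a meal."], label=0.9),
    InputExample(texts=["A man is eating food.", "A man is riding a horse."], label=0.3),
    InputExample(texts=["A man is eating food.", "The girl is carrying a baby."], label=0.1),
]

# The evaluator is now built from InputExamples instead of a DataLoader
evaluator = EmbeddingSimilarityEvaluator.from_input_examples(dev_examples, name="toy-dev")
print(evaluator(model))
```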

0.3.2

Not secure
This is a minor release. There should be no breaking changes.

- **ParallelSentencesDataset**: Datasets are tokenized on-the-fly, saving some start-up time
- **util.pytorch_cos_sim** - New method to compute cosine similarity with PyTorch. About 100 times faster than scipy cdist. The semantic_search.py example has been updated accordingly.
- **SentenceTransformer.encode**: New parameter: *convert_to_tensor*. If set to True, encode returns one large PyTorch tensor with your embeddings (a combined sketch of both additions follows after this list)
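
A combined sketch of both additions; the model name, corpus, and query are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-mean-tokens")

corpus = ["A man is eating food.", "A man is riding a horse.", "A monkey is playing drums."]
queries = ["Someone is eating a meal."]

# convert_to_tensor=True returns one large PyTorch tensor instead of a list of numpy vectors
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embeddings = model.encode(queries, convert_to_tensor=True)

# Cosine similarities computed in PyTorch; shape is [len(queries), len(corpus)]
cos_scores = util.pytorch_cos_sim(query_embeddings, corpus_embeddings)
print(cos_scores)
```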

0.3.1

Not secure
This is a minor update that changes some classes for training & evaluating multilingual sentence embedding methods.

The examples for training multi-lingual sentence embeddings models have been significantly extended. See [docs/training/multilingual-models.md](https://github.com/UKPLab/sentence-transformers/blob/master/docs/training/multilingual-models.md) for details. An automatic script that downloads suitable data and extends sentence embeddings to multiple languages has been added.

The following classes/files have been changed:
- datasets/ParallelSentencesDataset.py: The dataset with parallel sentences is encoded on-the-fly, reducing the start-up time for extending a sentence embedding model to new languages. An embedding cache can be configured to store previously computed sentence embeddings during training.

New evaluation files:
- evaluation/MSEEvaluator.py - **breaking change**. This class now expects lists of strings with parallel (translated) sentences (a minimal sketch follows after this list). The old class has been renamed to MSEEvaluatorFromDataLoader.py
- evaluation/EmbeddingSimilarityEvaluatorFromList.py - Semantic Textual Similarity data can be passed as lists of strings & scores
- evaluation/MSEEvaluatorFromDataFrame.py - MSE Evaluation of teacher and student embeddings based on data in a data frame
- evaluation/MSEEvaluatorFromDataLoader.py - MSE Evaluation if data is passed as a data loader
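
A minimal sketch of constructing MSEEvaluator from lists of parallel sentences, assuming the constructor takes source sentences, target sentences, and a teacher model; the model names and sentence pairs are placeholders.

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import MSEEvaluator

# Placeholder models: a monolingual teacher and a multilingual student
teacher_model = SentenceTransformer("bert-base-nli-mean-tokens")
student_model = SentenceTransformer("distiluse-base-multilingual-cased")

# Parallel (translated) sentences passed as plain lists of strings
source_sentences = ["How are you?", "The weather is nice today."]
target_sentences = ["Wie geht es dir?", "Das Wetter ist heute schön."]

# MSE between teacher embeddings of the source sentences and
# student embeddings of the target sentences (lower is better)
evaluator = MSEEvaluator(source_sentences, target_sentences, teacher_model=teacher_model)
print(evaluator(student_model))
```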


**Bugfixes:**
- model.encode() failed to sort sentences by length. This has been fixed, which boosts encoding speed by reducing the overhead from padding tokens.
