Sentence-transformers

Latest version: v4.0.1


Page 22 of 24

0.3.6

Not secure

Huggingface Transformers version 3.1.0 introduced a breaking change relative to the previous version 3.0.2.

This release fixes the issue so that Sentence-Transformers is compatible with Huggingface Transformers 3.1.0. Note that this and future versions will not be compatible with transformers < 3.1.0.

0.3.5

Not secure

- The old FP16 training code in `model.fit()` was replaced by PyTorch 1.6.0 automatic mixed precision (AMP). Setting `model.fit(use_amp=True)` enables AMP. On suitable GPUs, this leads to a significant speed-up while requiring less memory.
- Performance improvements in paraphrase mining & semantic search by replacing `np.argpartition` with `torch.topk`.
- If a sentence-transformer model is not found, it will fall back to the Huggingface Transformers repository and create the model with mean pooling.
- Pinned Huggingface Transformers to version 3.0.2. The next release will make it compatible with Huggingface Transformers 3.1.0.
- Several bugfixes: downloading of files, multi-GPU encoding.
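
The top-k selection step behind paraphrase mining and semantic search can be illustrated without any dependencies. The sketch below is not the library's code (which, per the note above, uses `torch.topk`); it uses the standard-library `heapq.nlargest` as a stand-in, and the function name `top_k_hits` and the scores are hypothetical.

```python
import heapq


def top_k_hits(scores, k):
    """Return (index, score) pairs for the k highest scores, best first."""
    # nlargest keeps only k candidates at a time instead of sorting everything.
    return heapq.nlargest(k, enumerate(scores), key=lambda pair: pair[1])


# Hypothetical similarity scores between one query and five corpus sentences.
scores = [0.12, 0.87, 0.45, 0.91, 0.30]
hits = top_k_hits(scores, k=2)
print(hits)
```

Selecting only the best `k` entries is cheaper than a full sort when the corpus is large and `k` is small, which is the same motivation as using a dedicated top-k routine on tensors.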

0.3.4

Not secure

- The documentation is substantially improved and can be found at: [www.SBERT.net](https://www.sbert.net) - Feedback welcome
- The dataset to hold training InputExamples (dataset.SentencesDataset) now uses lazy tokenization, i.e., examples are tokenized only once they are needed for a batch. If you set `num_workers` to a positive integer in your `DataLoader`, tokenization will happen in a background thread. This substantially reduces the start-up time for training.
- `model.encode()` also uses a PyTorch Dataset + DataLoader. If you set `num_workers` to a positive integer, tokenization will happen in the background, leading to faster encoding speed for large corpora.
- Added functions and an example for [multi-GPU encoding](https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/computing_embeddings_mutli_gpu.py) - This method can be used to encode a corpus with multiple GPUs in parallel. No multi-GPU support for training yet.
- Removed parallel_tokenization parameters from encode & SentencesDatasets - No longer needed with lazy tokenization and DataLoader worker threads.
- Smaller bugfixes
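
The lazy tokenization described above can be sketched as a dataset that defers work until an item is requested. This is a simplified illustration, not the `SentencesDataset` implementation: the class `LazyTokenizedDataset`, its call counter, and the whitespace tokenizer are all hypothetical stand-ins.

```python
class LazyTokenizedDataset:
    """Hypothetical dataset that tokenizes on access, not at construction."""

    def __init__(self, sentences, tokenize):
        self.sentences = list(sentences)
        self.tokenize = tokenize
        self.tokenize_calls = 0  # counts how often tokenization actually ran

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, index):
        # Tokenization is deferred until a batch asks for this example.
        self.tokenize_calls += 1
        return self.tokenize(self.sentences[index])


def whitespace_tokenize(text):
    # Placeholder tokenizer; a real model would use its own subword tokenizer.
    return text.lower().split()


dataset = LazyTokenizedDataset(
    ["A man is eating food.", "A monkey is playing drums."],
    whitespace_tokenize,
)
calls_after_init = dataset.tokenize_calls  # nothing has been tokenized yet
first_tokens = dataset[0]                  # tokenized only now
calls_after_access = dataset.tokenize_calls
```

Because construction does no tokenization, training can start immediately. An object with `__len__` and `__getitem__` is also the shape a PyTorch `DataLoader` expects from a map-style dataset, so the per-item work can be moved to its workers.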

Breaking changes:
- Renamed evaluation.BinaryEmbeddingSimilarityEvaluator to evaluation.BinaryClassificationEvaluator

0.3.3

Not secure

New Functions
- Multi-process tokenization (Linux only) for the model encode function. Significant speed-up when encoding large sets
- Tokenization of datasets for training can now run in parallel (Linux Only)
- New example for Quora Duplicate Questions Retrieval: See examples-folder
- Many small improvements for training better models for Information Retrieval
- Fixed LabelSampler (can be used to get batches with certain number of matching labels. Used for BatchHardTripletLoss). Moved it to DatasetFolder
- Added new Evaluators for ParaphraseMining and InformationRetrieval
- evaluation.BinaryEmbeddingSimilarityEvaluator no longer assumes a 50-50 split of the dataset. It computes the optimal threshold and measures accuracy
- model.encode - When the convert_to_numpy parameter is set, the method returns a numpy matrix instead of a list of numpy vectors
- New function: util.paraphrase_mining to perform paraphrase mining in a corpus. For an example see examples/training_quora_duplicate_questions/
- New function: util.information_retrieval to perform information retrieval / semantic search in a corpus. For an example see examples/training_quora_duplicate_questions/
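
The idea of computing an optimal threshold instead of assuming a 50-50 split can be sketched as follows. This is an illustration of the general technique under simplifying assumptions (no tied scores, at least two examples), not the evaluator's actual code; the function name `best_accuracy_threshold` and the data are hypothetical.

```python
def best_accuracy_threshold(scores, labels):
    """Return (accuracy, threshold) maximizing accuracy when
    `score >= threshold` is taken as a prediction of label 1."""
    pairs = sorted(zip(scores, labels), reverse=True)
    total = len(pairs)
    negatives_remaining = sum(1 for _, label in pairs if label == 0)
    positives_seen = 0
    best_acc, best_threshold = 0.0, None

    # Try a cut after each position: everything above it is predicted positive.
    for i in range(total - 1):
        score, label = pairs[i]
        if label == 1:
            positives_seen += 1
        else:
            negatives_remaining -= 1
        acc = (positives_seen + negatives_remaining) / total
        if acc > best_acc:
            best_acc = acc
            # Place the threshold halfway between neighbouring scores.
            best_threshold = (score + pairs[i + 1][0]) / 2
    return best_acc, best_threshold


# Hypothetical cosine similarities and duplicate / non-duplicate labels.
scores = [0.9, 0.8, 0.75, 0.4, 0.35, 0.1]
labels = [1, 1, 0, 1, 0, 0]
accuracy, threshold = best_accuracy_threshold(scores, labels)
```

Sweeping the sorted scores once makes the search linear after sorting, and the result does not depend on how balanced the positive and negative classes are.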

Breaking Changes
- The evaluators (like EmbeddingSimilarityEvaluator) no longer accept a DataLoader as argument. Instead, the sentences and scores are passed directly. Old code that uses the previous evaluators needs to be changed; it can use the class method from_input_examples(). See examples/training_transformers/training_nli.py for how to use the new evaluators.

0.3.2

Not secure

This is a minor release. There should be no breaking changes.

- **ParallelSentencesDataset**: Datasets are tokenized on-the-fly, saving some start-up time
- **util.pytorch_cos_sim**: New method to compute cosine similarity with PyTorch. About 100 times faster than scipy cdist. The semantic_search.py example has been updated accordingly.
- **SentenceTransformer.encode**: New parameter: *convert_to_tensor*. If set to true, encode returns one large PyTorch tensor with your embeddings
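
For readers unfamiliar with the quantity involved, the sketch below shows what a pairwise cosine-similarity matrix contains. It is a plain-Python illustration only and assumes non-zero vectors; the function name `cos_sim_matrix` and the inputs are hypothetical, and the speed-up mentioned above comes from doing the same arithmetic as batched tensor operations rather than Python loops.

```python
import math


def cos_sim_matrix(a, b):
    """Cosine similarity between every row of `a` and every row of `b`."""

    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    a_norm = [normalize(row) for row in a]
    b_norm = [normalize(row) for row in b]
    # Entry [i][j] is the dot product of normalized a[i] and normalized b[j].
    return [[sum(x * y for x, y in zip(u, v)) for v in b_norm] for u in a_norm]


# Hypothetical 2-dimensional embeddings.
queries = [[1.0, 0.0], [1.0, 1.0]]
corpus = [[2.0, 0.0], [0.0, 3.0]]
sims = cos_sim_matrix(queries, corpus)
```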

0.3.1

Not secure

This is a minor update that changes some classes for training & evaluating multilingual sentence embedding methods.

The examples for training multi-lingual sentence embeddings models have been significantly extended. See [docs/training/multilingual-models.md](https://github.com/UKPLab/sentence-transformers/blob/master/docs/training/multilingual-models.md) for details. An automatic script that downloads suitable data and extends sentence embeddings to multiple languages has been added.

The following classes/files have been changed:
- datasets/ParallelSentencesDataset.py: The dataset with parallel sentences is encoded on-the-fly, reducing the start-up time for extending a sentence embedding model to new languages. An embedding cache can be configured to store previously computed sentence embeddings during training.

New evaluation files:
- evaluation/MSEEvaluator.py - **breaking change**. Now, this class expects lists of strings with parallel (translated) sentences. The old class has been renamed to MSEEvaluatorFromDataLoader.py
- evaluation/EmbeddingSimilarityEvaluatorFromList.py - Semantic Textual Similarity data can be passed as lists of strings & scores
- evaluation/MSEEvaluatorFromDataFrame.py - MSE Evaluation of teacher and student embeddings based on data in a data frame
- evaluation/MSEEvaluatorFromDataLoader.py - MSE Evaluation if data is passed as a data loader
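
The quantity these evaluators report can be sketched as a mean squared error between two sets of embeddings. The example assumes the teacher embeds the source sentences and the student embeds their translations, which is one reading of the notes above; the function name and the vectors are hypothetical, and any scaling the evaluators may apply is not shown.

```python
def mean_squared_error(teacher_embeddings, student_embeddings):
    """Average squared difference over all embedding dimensions."""
    squared = [
        (t - s) ** 2
        for t_vec, s_vec in zip(teacher_embeddings, student_embeddings)
        for t, s in zip(t_vec, s_vec)
    ]
    return sum(squared) / len(squared)


# Hypothetical embeddings for two parallel sentence pairs.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.5, 0.0], [0.0, 1.0]]
mse = mean_squared_error(teacher, student)
```

A lower value means the student's embeddings for translated sentences sit closer to the teacher's embeddings for the originals.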

**Bugfixes:**
- model.encode() failed to sort sentences by length. This function has been fixed to boost encoding speed by reducing overhead of padding tokens.
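
Why sorting by length reduces padding overhead can be shown with a small count. This is a sketch of the general batching idea, not the `model.encode()` implementation; the helper names and sentence lengths are hypothetical.

```python
def padding_tokens(batches):
    """Count pad positions when each batch is padded to its longest sequence."""
    return sum(
        max(map(len, batch)) * len(batch) - sum(map(len, batch))
        for batch in batches
    )


def make_batches(items, batch_size):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical tokenized sentences of very different lengths.
tokenized = [["a"] * n for n in (2, 9, 3, 10, 1, 8)]

unsorted_pad = padding_tokens(make_batches(tokenized, batch_size=2))

# Sort indices by length so each batch holds sentences of similar length.
order = sorted(range(len(tokenized)), key=lambda i: len(tokenized[i]))
sorted_batches = make_batches([tokenized[i] for i in order], batch_size=2)
sorted_pad = padding_tokens(sorted_batches)

# Results computed in sorted order can be mapped back to the input order.
lengths_sorted = [len(tokenized[i]) for i in order]
restored = [None] * len(tokenized)
for position, original_index in enumerate(order):
    restored[original_index] = lengths_sorted[position]
```

With these lengths, the unsorted batches need 21 pad positions and the length-sorted batches need 7, while `restored` recovers the per-sentence results in the original input order.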
