```
pip install sentence-transformers[onnx-gpu]==4.0.1
pip install sentence-transformers[onnx]==4.0.1
pip install sentence-transformers[openvino]==4.0.1
```
> [!TIP]
> My [Training and Finetuning Reranker Models with Sentence Transformers v4](https://huggingface.co/blog/train-reranker) blogpost is an excellent place to learn 1) why finetuning rerankers makes sense and 2) how you can do it, too!
Reranker (Cross Encoder) training refactor (3222)
The v4.0 release centers around this huge modernization of the training approach for `CrossEncoder` models, following v3.0, which introduced the same for `SentenceTransformer` models. Whereas training before v4.0 used to be all about `InputExample`, `DataLoader` and `model.fit`, the new training approach relies on 5 components. You can learn more about these components in our [Training and Finetuning Reranker Models with Sentence Transformers v4](https://huggingface.co/blog/train-reranker) blogpost. Additionally, you can read the new [Training Overview](https://sbert.net/docs/cross_encoder/training_overview.html), check out the [Training Examples](https://sbert.net/docs/cross_encoder/training/examples.html), or read this summary:
1. [Dataset](https://sbert.net/docs/cross_encoder/training_overview.html#dataset)
A training [`Dataset`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.Dataset) or [`DatasetDict`](https://huggingface.co/docs/datasets/main/en/package_reference/main_classes#datasets.DatasetDict). This class is much more suited for sharing & efficient modifications than lists/DataLoaders of `InputExample` instances. A `Dataset` can contain multiple text columns that will be fed in order to the corresponding loss function. So, if the loss expects (anchor, positive, negative) triplets, then your dataset should also have 3 columns. The names of these columns are irrelevant. If there is a "label" or "score" column, it is treated separately, and used as the labels during training.
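For instance, a tiny (anchor, positive, negative) dataset could be built like this; the texts and column names are made up purely for illustration:
```python
from datasets import Dataset

# Three text columns, fed to the loss in this order; the column names themselves don't matter
train_dataset = Dataset.from_dict({
    "anchor": ["How many people live in Berlin?"],
    "positive": ["Berlin has a population of roughly 3.7 million."],
    "negative": ["Paris is the capital of France."],
})
```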
A `DatasetDict` can be used to train with multiple datasets at once, e.g.:
```python
DatasetDict({
    natural_questions: Dataset({
        features: ['anchor', 'positive'],
        num_rows: 392702
    })
    gooaq: Dataset({
        features: ['anchor', 'positive', 'negative'],
        num_rows: 549367
    })
    stsb: Dataset({
        features: ['sentence1', 'sentence2', 'label'],
        num_rows: 5749
    })
})
```
When a `DatasetDict` is used, the `loss` parameter to the `CrossEncoderTrainer` must also be a dictionary with these dataset keys, e.g.:
```python
{
    'natural_questions': CachedMultipleNegativesRankingLoss(...),
    'gooaq': CachedMultipleNegativesRankingLoss(...),
    'stsb': BinaryCrossEntropyLoss(...),
}
```
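Putting the two together is then just a matter of handing both to the trainer. A minimal sketch, assuming the `DatasetDict` above is bound to `train_dataset`, the loss dictionary to `losses`, and a `CrossEncoder` model to `model`:
```python
from sentence_transformers import CrossEncoderTrainer

trainer = CrossEncoderTrainer(
    model=model,
    train_dataset=train_dataset,  # the DatasetDict shown above
    loss=losses,                  # the matching dict of loss functions
)
trainer.train()
```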
2. [Loss Function](https://sbert.net/docs/cross_encoder/training_overview.html#loss-function)
A loss function, or a dictionary of loss functions as described above.
3. [Training Arguments](https://sbert.net/docs/cross_encoder/training_overview.html#training-arguments)
A `CrossEncoderTrainingArguments` instance, a subclass of [`TrainingArguments`](https://huggingface.co/docs/transformers/main/en/main_classes/trainer#transformers.TrainingArguments). This powerful class controls the specific details of the training; see the sketch after this list.
4. [Evaluator](https://sbert.net/docs/cross_encoder/training_overview.html#evaluator)
An optional [`SentenceEvaluator`](https://sbert.net/docs/package_reference/evaluation.html) instance. Unlike before, models can now be evaluated on an evaluation dataset with some loss function, with a `SentenceEvaluator` instance, or with both.
5. [Trainer](https://sbert.net/docs/cross_encoder/training_overview.html#trainer)
The new `CrossEncoderTrainer` instance based on the `transformers` `Trainer`. This instance can be initialized with a `CrossEncoder` model, a `CrossEncoderTrainingArguments` instance, a `SentenceEvaluator`, a training and evaluation `Dataset`/`DatasetDict` and a loss function/dict of loss functions. Most of these parameters are optional. Once provided, all you have to do is call `trainer.train()`.
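For illustration, a `CrossEncoderTrainingArguments` instance could be configured as follows. This is only a sketch: the values are made up, and only standard `transformers` `TrainingArguments` fields are shown (assuming a recent `transformers` version that accepts `eval_strategy`).
```python
from sentence_transformers import CrossEncoderTrainingArguments

# Illustrative values only; any transformers.TrainingArguments field can be used here
args = CrossEncoderTrainingArguments(
    output_dir="models/reranker-mpnet-base",  # hypothetical output directory
    num_train_epochs=1,
    per_device_train_batch_size=64,
    learning_rate=2e-5,
    warmup_ratio=0.1,
    bf16=True,  # set to False if your GPU does not support bf16
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    logging_steps=100,
)
```
Such an `args` object can then be passed via the `args` parameter of the `CrossEncoderTrainer` shown in the minimal example below.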
Some of the major features that are now implemented include:
* MultiGPU Training (Data Parallelism (DP) and Distributed Data Parallelism (DDP))
* bf16 training support
* Loss logging
* Evaluation datasets + evaluation loss
* Improved callback support (built-in via Weights and Biases, TensorBoard, CodeCarbon, etc., as well as custom callbacks)
* Gradient checkpointing
* Gradient accumulation
* Improved model card generation
* Warmup ratio
* Pushing to the Hugging Face Hub on every model checkpoint
* Resuming from a training checkpoint
* Hyperparameter Optimization
This script is a minimal example (no evaluator, no training arguments) of training [`mpnet-base`](https://huggingface.co/microsoft/mpnet-base) on a part of the [`sentence-transformers/hotpotqa` dataset](https://huggingface.co/datasets/sentence-transformers/hotpotqa) using [`BinaryCrossEntropyLoss`](https://sbert.net/docs/package_reference/cross_encoder/losses.html#binarycrossentropyloss):
```python
from datasets import load_dataset
from sentence_transformers import CrossEncoder, CrossEncoderTrainer
from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

# 1. Define the model. Either from scratch or by loading a pre-trained model
model = CrossEncoder("microsoft/mpnet-base")

# 2. Load a dataset to finetune on
dataset = load_dataset("sentence-transformers/hotpotqa", "triplet", split="train")

# Convert each (anchor, positive, negative) triplet into two labeled (sentence_A, sentence_B) pairs
def triplet_to_labeled_pair(batch):
    anchors = batch["anchor"]
    positives = batch["positive"]
    negatives = batch["negative"]
    return {
        "sentence_A": anchors * 2,
        "sentence_B": positives + negatives,
        "labels": [1] * len(positives) + [0] * len(negatives),
    }

dataset = dataset.map(triplet_to_labeled_pair, batched=True, remove_columns=dataset.column_names)
train_dataset = dataset.select(range(10_000))
eval_dataset = dataset.select(range(10_000, 11_000))

# 3. Define a loss function
loss = BinaryCrossEntropyLoss(model)

# 4. Create a trainer & train
trainer = CrossEncoderTrainer(
    model=model,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()

# 5. Save the trained model
model.save_pretrained("models/mpnet-base-hotpotqa")
model.push_to_hub("mpnet-base-hotpotqa")
```
Additionally, trained models now automatically produce extensive model cards. Each of the following models was trained using some script from the [Training Examples](https://sbert.net/docs/cross_encoder/training/examples.html), and the model cards were not edited manually whatsoever:
* [tomaarsen/reranker-MiniLM-L12-gooaq-bce](https://huggingface.co/tomaarsen/reranker-MiniLM-L12-gooaq-bce)
* [tomaarsen/reranker-msmarco-MiniLM-L12-H384-uncased-lambdaloss](https://huggingface.co/tomaarsen/reranker-msmarco-MiniLM-L12-H384-uncased-lambdaloss)
* [tomaarsen/reranker-distilroberta-base-nli](https://huggingface.co/tomaarsen/reranker-distilroberta-base-nli)
Prior to the Sentence Transformers v4 release, all reranker models would be trained using the [`CrossEncoder.fit`](https://sbert.net/docs/package_reference/cross_encoder/cross_encoder.html#sentence_transformers.cross_encoder.CrossEncoder.fit) method. Rather than deprecating this method, starting from v4.0, it will use the [`CrossEncoderTrainer`](https://sbert.net/docs/package_reference/cross_encoder/trainer.html#sentence_transformers.cross_encoder.trainer.CrossEncoderTrainer) behind the scenes. This means that your old training code should still work, and it even benefits from the new features such as multi-GPU training, loss logging, etc. That said, the new training approach is much more powerful, so it is **recommended** to write new training scripts using the new approach.
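For reference, legacy training code along these lines should therefore keep working, now powered by the `CrossEncoderTrainer` internally. This is just a sketch of the old `InputExample`/`DataLoader` style, with made-up data:
```python
from torch.utils.data import DataLoader
from sentence_transformers import CrossEncoder, InputExample

model = CrossEncoder("microsoft/mpnet-base", num_labels=1)
train_samples = [
    InputExample(texts=["How many people live in Berlin?", "Berlin has roughly 3.7 million inhabitants."], label=1.0),
    InputExample(texts=["How many people live in Berlin?", "Paris is the capital of France."], label=0.0),
]
train_dataloader = DataLoader(train_samples, shuffle=True, batch_size=2)

# The old-style fit call now delegates to CrossEncoderTrainer behind the scenes
model.fit(train_dataloader=train_dataloader, epochs=1, warmup_steps=10)
```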
To help you out, all of the Cross Encoder (a.k.a. reranker) training scripts were updated to use the new Trainer-based approach.
Is finetuning worth it?
Finetuning reranker models on your data is very valuable. Consider, for example, these 2 models that I finetuned on 100k samples from the GooAQ dataset in 30 minutes and 1 hour, respectively. After finetuning, my models heavily outperformed general-purpose reranker models, even though GooAQ is a very generic dataset/domain!
* [tomaarsen/reranker-ModernBERT-base-gooaq-bce](https://huggingface.co/tomaarsen/reranker-ModernBERT-base-gooaq-bce)
* [tomaarsen/reranker-ModernBERT-large-gooaq-bce](https://huggingface.co/tomaarsen/reranker-ModernBERT-large-gooaq-bce)

Read my [Training and Finetuning Reranker Models with Sentence Transformers v4](https://huggingface.co/blog/train-reranker) blogpost for many more details on these models and how they were trained.
Resources:
* How to **use** Cross Encoder models? [Cross Encoder > Usage](https://sbert.net/docs/cross_encoder/usage/usage.html)
* What Cross Encoder **models** can I use? [Cross Encoder > Pretrained Models](https://sbert.net/docs/cross_encoder/pretrained_models.html)
* How do I **train/finetune** a Cross Encoder model? [Cross Encoder > Training Overview](https://sbert.net/docs/cross_encoder/training_overview.html)
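For the first point, inference with a pretrained or finetuned reranker roughly looks like the snippet below; the model name is only an example:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

# Score (query, passage) pairs directly
scores = model.predict([
    ("How many people live in Berlin?", "Berlin has a population of roughly 3.7 million."),
    ("How many people live in Berlin?", "Paris is the capital of France."),
])

# Or rank a list of passages for a single query
ranking = model.rank(
    "How many people live in Berlin?",
    ["Berlin has a population of roughly 3.7 million.", "Paris is the capital of France."],
)
```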
Refactor Stats
* Code:
  * New Trainer, Training Arguments, Data Collator, Model Card generation + template, with backwards compatibility
  * [11 new losses](https://sbert.net/docs/package_reference/cross_encoder/losses.html)
  * [1 new, 3 refactored, 6 deprecated evaluators](https://sbert.net/docs/package_reference/cross_encoder/evaluation.html)
* Tests:
  * [84 tests for CrossEncoder loading, inference, training, etc.](https://github.com/UKPLab/sentence-transformers/tree/master/tests/cross_encoder)
* Docs:
  * All new [Training Overview](https://sbert.net/docs/cross_encoder/training_overview.html), [Loss Overview](https://sbert.net/docs/cross_encoder/loss_overview.html), [API Reference](https://sbert.net/docs/package_reference/cross_encoder/index.html) docs
  * [5 new, 1 refactored training examples docs pages](https://sbert.net/docs/cross_encoder/training/examples.html)
  * [13 new, 6 refactored training scripts](https://github.com/UKPLab/sentence-transformers/tree/master/examples/cross_encoder/training)
  * Migration guide ([2.x -> 3.x](https://sbert.net/docs/migration_guide.html#migrating-from-v2-x-to-v3-x), [3.x -> 4.x](https://sbert.net/docs/migration_guide.html#migrating-from-v3-x-to-v4-x))
Small Features
* Introduce `show_progress_bar` for the `InformationRetrievalEvaluator` (3227)
* Replace `SubsetRandomSampler` with `RandomSampler` in the default batch sampler, should result in reduced memory usage and increased training speed! (3261)
* Allow resuming from checkpoint when training with the deprecated `SentenceTransformer.fit` (3269)
* Allow truncation and setting `model.max_seq_length` for CLIP models (2969)
Bug Fixes
* Fixed `MatryoshkaLoss` with `n_dims_per_step` and an unsorted `matryoshka_dims` crashing (3203)
* Fixed `GISTEmbedLoss` failing with some base models whose tokenizers don't have the `vocab` attribute (3219, 3226)
* Fixed support of `Asym`-based `SentenceTransformer` models (3220, 3244)
* Fixed some evaluator outputs not being converted to a Python float, i.e. staying as `numpy` or `torch` (3277)
Examples
* Improved TSDAE examples (3263, 3265)
Note
The `v4.0.0` version did not include the `model_card_template.md` file in the package; this has been resolved in `v4.0.1` via ba1260d58c8804e97989e5af06ac90a0f4be8594.
What's Changed
* fix MatryoshkaLoss bug: sort sampled dimension indices to maintain descending dimension order by emapco in https://github.com/UKPLab/sentence-transformers/pull/3203
* [`docs`] Resolve broken URL due to weird & behaviour in pretrained ST models by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3213
* Update Evaluation Script for Reranking by milistu in https://github.com/UKPLab/sentence-transformers/pull/3198
* [`docs`] Update incorrect name: pairwise_similarity -> similarity_pairwise by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3224
* [`fix`] Use .get_vocab() instead of .vocab for checking tokenizer vocabulary by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3226
* [`feat`] Add progress bar support for corpus in IR Evaluator by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3227
* Update CoSENTLoss.py documentation by johneckberg in https://github.com/UKPLab/sentence-transformers/pull/3230
* NoDuplicatesDataLoader Compatability with Asymmetric models by OsamaS99 in https://github.com/UKPLab/sentence-transformers/pull/3220
* [`fix`] Fix Syntax issue; move 'as fIn' to after the if-else in `STSDataReader` by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3235
* Model Card Compatability & BinaryClassificationEvaluator with Asymmetric Models by OsamaS99 in https://github.com/UKPLab/sentence-transformers/pull/3244
* [fix] Changed value error for missing model into FileNotFoundError by PhorstenkampFuzzy in https://github.com/UKPLab/sentence-transformers/pull/3238
* Add check for hpu and wrap_in_hpu_graph availability. by vshekhawat-hlab in https://github.com/UKPLab/sentence-transformers/pull/3249
* Replacing SubsetRandomSampler by RandomSampler in BATCH_SAMPLER by NohTow in https://github.com/UKPLab/sentence-transformers/pull/3261
* Fix: Reorder dataset columns for DenoisingAutoEncoderLoss in TSADE examples by HuangBugWei in https://github.com/UKPLab/sentence-transformers/pull/3263
* Update to fit_mixin.fit to allow fine tuning to resume from a checkpoint by NRamirez01 in https://github.com/UKPLab/sentence-transformers/pull/3269
* Fix: dynamic noise addition during training in TSADE examples by HuangBugWei in https://github.com/UKPLab/sentence-transformers/pull/3265
* [`typing`] Fix the type hints in CGISTEmbedLoss by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3272
* typing: fix typing on encode by stephantul in https://github.com/UKPLab/sentence-transformers/pull/3270
* feat: add 'Path' parameter for ModelCard template by sam-hey in https://github.com/UKPLab/sentence-transformers/pull/3253
* Always convert the evaluation metrics to float, also without a 'name' by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3277
* Add truncation to CLIP model by MrLoh in https://github.com/UKPLab/sentence-transformers/pull/2969
* [`v4`] CrossEncoder Training refactor - MultiGPU, loss logging, bf16, etc. by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3222
* Bump jinja2 from 3.1.5 to 3.1.6 in /docs by dependabot in https://github.com/UKPLab/sentence-transformers/pull/3282
* Update the core README in preparation for the v4.0 release by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3283
* Make minor updates to docs by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3285
* Add the .htaccess to git, automatically include it in builds by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3286
* Update main description by tomaarsen in https://github.com/UKPLab/sentence-transformers/pull/3287
New Contributors
* emapco made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3203
* OsamaS99 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3220
* PhorstenkampFuzzy made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3238
* vshekhawat-hlab made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3249
* NohTow made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3261
* HuangBugWei made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3263
* NRamirez01 made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3269
* stephantul made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3270
* sam-hey made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/3253
* MrLoh made their first contribution in https://github.com/UKPLab/sentence-transformers/pull/2969
A special shoutout to milistu for contributing the LambdaLoss & ListNetLoss and yjoonjang for contributing the ListMLELoss, PListMLELoss, and RankNetLoss. Much appreciated, you really helped improve this release!
**Full Changelog**: https://github.com/UKPLab/sentence-transformers/compare/v3.4.1...v4.0.1