BEIR

Latest version: v1.0.1


1.0.1

12.08.2021

I had fun speaking about BEIR and neural search at a recent OpenNLP event on benchmarking search. If you are interested, the talk was recorded and is available below:

YouTube: https://www.youtube.com/watch?v=e9nNr4ugNAo
Slides: https://drive.google.com/file/d/1gghRVv6nWWmMZRqkYvuCO0HWOTEVnPNz/view?usp=sharing

3. Added Splits for each dataset in the README datasets table
I plan to add the big new msmarco-v2 version of the passage collection soon; it contains 138,364,198 passages (13.5 GB) and comes with two dev splits (``dev1``, ``dev2``). Listing splits explicitly lets the benchmark incorporate datasets that don't follow the traditional convention of a single train, dev, and test split. A minimal loading sketch is shown below.
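For illustration, here is how a non-default split could be loaded with beir's ``GenericDataLoader``; the ``datasets/msmarco-v2`` path and the ``dev1`` split name are assumptions based on the planned layout described above:

```python
# Minimal sketch: load a custom split with BEIR's GenericDataLoader.
# Assumes the dataset folder ships a matching qrels/dev1.tsv file.
from beir.datasets.data_loader import GenericDataLoader

data_path = "datasets/msmarco-v2"  # hypothetical local path to the downloaded dataset
corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="dev1")
```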

1.0.0

This is a major release since the last version v0.2.3.

1. New BEIR organization: moving forward, BEIR will be a collaboration
The BEIR benchmark has moved from [UKPLab](https://github.com/UKPLab/beir) to [beir-cellar](https://github.com/beir-cellar/beir). Moving forward, the benchmark will be actively maintained and developed with the help of UKPLab, castorini, and huggingface.

2. ColBERT model evaluation on the BEIR benchmark released
Code for evaluating the ColBERT model on the BEIR benchmark has been released. The repository builds on the original ColBERT repository for evaluation and training, with a few tweaks.

Here is the repository for more details: https://github.com/NThakur20/beir-ColBERT

3. New Passage Expansion Model added: TILDE
Since docT5query is compute-intensive and slow at generating expansions, we added a faster passage expansion model: TILDE (https://arxiv.org/abs/2108.08513), which expands a document with relevant keywords from the BERT vocabulary. An easy example using TILDE can be found here: [passage_expansion_tilde.py](https://github.com/beir-cellar/beir/blob/main/examples/generation/passage_expansion_tilde.py)
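To make the idea concrete, here is a rough, hedged sketch of TILDE-style expansion, not the beir example script itself; the ``ielab/TILDE`` checkpoint and the top-k cutoff are illustrative assumptions:

```python
# Rough sketch of TILDE-style expansion: score the whole BERT vocabulary against
# the passage with an LM head and append the most likely unseen alphabetic terms.
import torch
from transformers import BertLMHeadModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("ielab/TILDE")  # assumed checkpoint
model = BertLMHeadModel.from_pretrained("ielab/TILDE").eval()

passage = "The Manhattan Project produced the first nuclear weapons during WWII."
inputs = tokenizer(passage, return_tensors="pt")
with torch.no_grad():
    vocab_scores = model(**inputs).logits[0, 0]  # [CLS] position scores each vocab term

top_ids = torch.topk(vocab_scores, k=128).indices.tolist()
candidates = tokenizer.convert_ids_to_tokens(top_ids)
expansion = [t for t in candidates if t.isalpha() and t not in passage.lower()]
expanded_passage = passage + " " + " ".join(expansion)
```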

4. Upcoming new work: easy evaluation of Neural Sparse Retrieval Models
We are currently developing a new repository for easy evaluation of neural sparse models, including an inverted index implementation. This will enable unified evaluation of diverse neural sparse retrieval models such as uniCOIL, SPLADE, SPARTA, and DeepImpact.

An initial repository for this work and more details can be found here: https://github.com/NThakur20/sparse-retrieval.

5. Fixed breaking changes and reproducibility in Elasticsearch
Issue #58 reported problems with the reproducibility of ES lexical search and with downloading the Elasticsearch client. We made the following fixes:

1. Added a ``sleep_for`` parameter in the ES code with a default value of 2 seconds. This forces a pause after deleting an index and after indexing documents.
2. During bulk indexing (https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html), we now set the ``refresh`` parameter to ``wait_for`` instead of the default ``false`` (a minimal sketch follows this list). For more details, refer here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html.
3. Froze the ES version in beir to ``elasticsearch==7.9.1``, which helps us avoid issues arising from the latest ES policy changes.
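Here is a minimal sketch of the refresh behaviour described in point 2, using the official elasticsearch Python client; the index name, document, and local host URL are illustrative:

```python
import time
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk

es = Elasticsearch(["http://localhost:9200"])
actions = [{"_index": "beir-demo", "_id": "d1", "_source": {"txt": "hello world"}}]

# refresh="wait_for" blocks until the indexed documents are visible to search,
# instead of the default refresh=False, which returns immediately.
bulk(es, actions, refresh="wait_for")
time.sleep(2)  # the sleep_for-style pause beir additionally applies after indexing
```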

6. Temporary Packages: Tensorflow
Tensorflow installation was causing issues when running ``pip install beir``. Only USE models were evaluated using TF, and they are currently not a popular model choice. Hence, we decided to make ``['tensorflow>=2.2.0', 'tensorflow-text', 'tensorflow-hub']`` optional packages, which can be installed separately in case a user wishes to evaluate the USE model or use TF for their own use case: ``pip install beir[tf]``.

7. Fixed breaking changes in sparse search in SparseRetrieval
As reported in issue #62, we have fixed a bug in the sparse retrieval code used for evaluating SPARTA on the BEIR benchmark.

0.2.3

This is a small release update!

1. BEIR Benchmark paper accepted at NeurIPS 2021 (Datasets and Benchmark Track)
I'm quite thrilled to share that BEIR has been accepted at the NeurIPS 2021 conference. All reviewers gave positive reviews and found the benchmark useful for the community. More information can be found here: https://openreview.net/forum?id=wCu6T5xFjeJ.

2. New Multilingual datasets added within BEIR
New multilingual datasets have been added to the BEIR benchmark, which now supports more than 10 languages. We included mMARCO, the MSMARCO dataset translated into 8 languages (https://github.com/unicamp-dl/mMARCO), and Mr.TyDi, which contains train, development, and test data across 10 languages (https://github.com/castorini/mr.tydi). We hope to provide good and robust dense multilingual retrievers in the future.

3. Breaking change in Top-k accuracy now fixed
The top-k accuracy metric mistakenly sorted the retrieved document keys instead of the retriever's scores, which led to incorrect results. The mistake was identified in issue #45 and the fix has now been merged.

4. Yannic Kilcher recognized BEIR as a helpful ML library
Yannic Kilcher recently mentioned the BEIR repository as a helpful library for benchmarking and evaluating diverse IR models and architectures. You can find more details in his latest ML News video on YouTube: https://www.youtube.com/watch?v=K3cmxn5znyU&t=1290s&ab_channel=YannicKilcher

0.2.2

This is a small release update! We made the following changes in the release of the beir package:

1. Now train dense retriever (SBERT) models using the Margin-MSE loss
We have added a new loss, Margin-MSE, which learns to match the pairwise score margin between hard negatives and positives. Thanks to kwang2049 for the motivation; the loss is now part of the beir repo. It is most effective in a knowledge-distillation setup with a powerful teacher model (a minimal sketch of the loss follows the links below). For more details, we suggest the paper by Hofstätter et al., https://arxiv.org/abs/2010.02666.

Margin-MSE Loss function: https://github.com/UKPLab/beir/blob/main/beir/losses/margin_mse_loss.py
Train (SOTA) SBERT model using Margin-MSE: https://github.com/UKPLab/beir/blob/main/examples/retrieval/training/train_msmarco_v3_margin_MSE.py
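For intuition, here is a minimal PyTorch sketch of the Margin-MSE idea, assuming precomputed teacher scores; the actual beir implementation lives in ``beir/losses/margin_mse_loss.py``:

```python
import torch
import torch.nn.functional as F

def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    # The student learns to reproduce the teacher's score *margin* between a
    # positive and a hard-negative passage, not the raw scores themselves.
    return F.mse_loss(student_pos - student_neg, teacher_pos - teacher_neg)

# toy usage: random scores standing in for (query, passage) model outputs
s_pos, s_neg, t_pos, t_neg = (torch.randn(8) for _ in range(4))
loss = margin_mse(s_pos, s_neg, t_pos, t_neg)
```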

0.2.1

1. New script to utilize docT5query in parallel with multiple GPUs!
- Thanks to joshdevins, we have a new script that uses multiple GPUs in parallel to generate queries for passages with a question-generation model much faster. Check it out [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/sparse/evaluate_anserini_docT5query_parallel.py)].
- You can now also pass a custom device for question generation if no CUDA-capable devices are present.

2. PQ Hashing with OPQ Rotation and Scalar Quantizer from Faiss!
- You can now apply ``OPQ`` rotation before PQ hashing, and use a Scalar Quantizer for ``fp16`` faiss search instead of the original ``fp32`` (see the sketch below).
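As an illustration, here is a minimal raw-faiss sketch of OPQ rotation followed by PQ hashing; this shows the underlying technique rather than beir's own search classes, and the factory strings and sizes are assumptions:

```python
import faiss
import numpy as np

dim = 768
xb = np.random.rand(5000, dim).astype("float32")  # placeholder corpus embeddings

# "OPQ96,PQ96": learn an OPQ rotation, then product-quantize into 96 sub-vectors.
# A scalar-quantized fp16 index would instead use faiss.index_factory(dim, "SQfp16").
index = faiss.index_factory(dim, "OPQ96,PQ96")
index.train(xb)  # OPQ and PQ both need a training pass over sample vectors
index.add(xb)
scores, ids = index.search(xb[:3], 5)  # top-5 neighbours for three example queries
```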

3. Top-k Accuracy Metric which is commonly used in the DPR repository by facebook!
- The [DPR](https://github.com/facebookresearch/DPR) repository evaluates retrieval models using ``top-k retriever accuracy``. You can now evaluate top-k accuracy using the BEIR repository:
```python
top_k_accuracy = retriever.evaluate_custom(qrels, results, retriever.k_values, metric="top_k_accuracy")
```

4. Sorting of corpus documents by text length before encoding using a dense retriever!
- We now sort corpus documents by length, longest first. This has two advantages (a short sketch follows this list):
1. Why sort? Texts of similar length are encoded within the same batch, which speeds up the corpus encoding process.
2. Why longest first? The maximum GPU memory required is reached at the very beginning, so if an OOM occurs it occurs immediately rather than midway through encoding.
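Here is a minimal sketch of the sort, assuming beir's corpus format of ``{doc_id: {"title": ..., "text": ...}}``; the toy corpus is illustrative:

```python
corpus = {
    "d1": {"title": "short", "text": "tiny"},
    "d2": {"title": "longer title", "text": "a much longer passage body here"},
}

# Sort doc ids by combined title+text length, longest first, then encode in order.
corpus_ids = sorted(
    corpus,
    key=lambda cid: len(corpus[cid].get("title", "") + corpus[cid].get("text", "")),
    reverse=True,
)
sorted_corpus = [corpus[cid] for cid in corpus_ids]
```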

5. FEVER dataset training qrels, problems with doc-ids with special characters now fixed!
- There were issues with the training qrels in the FEVER dataset: doc-ids with special characters, e.g. ``Zlatan_Ibrahimović`` or ``Beyoncé``, were stored with wrongly encoded characters in ``qrels/train.tsv``. These were fixed manually, and no similar issues remain in the corpus.
- New md5hash for the ``fever.zip`` dataset: ``5a818580227bfb4b35bb6fa46d9b6c03``.

0.2.0

FAISS Indexes and Search Integration

- FAISS indexes can be created and used for evaluation using the BEIR repository. We have added support for ``Flat-IP``, ``HNSW``, ``PQ``, ``PCAMatrix``, and ``BinaryFlat`` indexes.
- Faiss indexes use various compression algorithms, useful for reducing index memory size or improving retrieval speed.
- You can also save your corpus embeddings as a faiss index, which wasn't possible with the original exact search.
- Check out how to evaluate dense retrieval using a faiss index [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_faiss_dense.py)] and dimension reduction using PCA [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_dim_reduction.py)]. A minimal raw-faiss sketch follows.
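For orientation, here is a minimal raw-faiss sketch of exact inner-product search over corpus embeddings, including saving the index to disk; the random embeddings stand in for real encoder output:

```python
import faiss
import numpy as np

dim = 768
corpus_emb = np.random.rand(1000, dim).astype("float32")  # placeholder embeddings
query_emb = np.random.rand(5, dim).astype("float32")

index = faiss.IndexFlatIP(dim)                 # exact maximum inner-product search
index.add(corpus_emb)
scores, doc_ids = index.search(query_emb, 10)  # top-10 hits per query
faiss.write_index(index, "corpus.faiss")       # persist the embeddings as an index
```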

Multilingual Datasets and Evaluation
- Thanks to julian-risch, we have added our first multilingual dataset to the BEIR repository - GermanQuAD (German SQuAD dataset).
- We have changed Elasticsearch to allow evaluation in languages other than English, check it out [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_multilingual_bm25.py)].
- We have also added a DPR model class which lets you load DPR models from the Huggingface repo; you can now use this class to evaluate, for example, the GermanDPR model [[link](https://huggingface.co/deepset/gbert-base-germandpr-ctx_encoder)].

DeepCT evaluation
- We have adapted the original DeepCT code to work with tensorflow (tf) >= 2.0 and now host the updated repo [[here](https://github.com/NThakur20/DeepCT)].
- With the hosted code, DeepCT can now be evaluated in BEIR using Anserini retrieval, check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/sparse/evaluate_deepct.py)].

Training Latest MSMARCO v3 Models
- From the [SentenceTransformers](https://github.com/UKPLab/sentence-transformers) repository, we have integrated the latest MSMARCO training code with custom, manually provided hard negatives. This provides state-of-the-art SBERT models trained on MSMARCO, check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/training/train_msmarco_v3.py)].

Using Multiple-GPU for question-generation
- A big challenge was using multiple GPUs to speed up question generation. We now use process pools to generate questions in parallel across multiple GPUs, check [[here](https://github.com/UKPLab/beir/blob/main/examples/generation/query_gen_multi_gpu.py)]. A generic sketch of the pattern is shown below.
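The following is a generic, hedged sketch of the multi-GPU pattern, not the beir script itself: passages are sharded across processes, each notionally pinned to one GPU, and the model loading and generation step is left as a placeholder:

```python
import torch.multiprocessing as mp

def generate_on_device(args):
    device, passages = args
    # Load the question-generation model onto `device` here and run generation;
    # a placeholder output keeps this sketch self-contained and runnable.
    return [f"question for: {p}" for p in passages]

if __name__ == "__main__":
    passages = [f"passage {i}" for i in range(8)]
    devices = ["cuda:0", "cuda:1"]  # assumed available GPUs
    shards = [passages[i::len(devices)] for i in range(len(devices))]
    with mp.get_context("spawn").Pool(len(devices)) as pool:
        per_device = pool.map(generate_on_device, list(zip(devices, shards)))
```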

Integration of Binary Passage Retrievers (BPR)
- BPR (ACL'21, [link](https://arxiv.org/abs/2106.00882)) is now integrated within the BEIR benchmark. You can now easily train a state-of-the-art BPR model on MSMARCO using the loss function described in the original paper, check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/training/train_msmarco_v3_bpr.py)].
- You can also now easily evaluate BPR in a zero-shot fashion, check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_bpr.py)].
- We will soon open-source the public BPR models trained on MSMARCO.
