FAISS Indexes and Search Integration
- FAISS indexes can be created and used for evaluation through the BEIR repository. We have added support for ``Flat-IP``, ``HNSW``, ``PQ``, ``PCAMatrix``, and ``BinaryFlat`` indexes.
- FAISS indexes offer various compression algorithms that reduce index memory size or improve retrieval speed.
- You can also save your corpus embeddings as a FAISS index, which was not possible with the original exact-search implementation.
- Check out how to evaluate dense retrieval using a FAISS index [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_faiss_dense.py)] and dimensionality reduction with PCA [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_dim_reduction.py)].
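To make the Flat-IP option concrete, here is a minimal NumPy sketch of the exact inner-product search that a ``Flat-IP`` index performs; the embeddings are random toy data, and the commented faiss calls show the equivalent index usage:

```python
import numpy as np

# Toy corpus and query embeddings (in practice these come from a dense
# retriever such as an SBERT model).
rng = np.random.default_rng(42)
corpus_emb = rng.standard_normal((1000, 128)).astype("float32")
query_emb = rng.standard_normal((5, 128)).astype("float32")

# A Flat-IP index stores the raw vectors and scores every document by
# inner product with the query -- exact search, no compression.
scores = query_emb @ corpus_emb.T            # shape: (num_queries, num_docs)
top_k = 10
top_ids = np.argsort(-scores, axis=1)[:, :top_k]

# With faiss the equivalent would be roughly:
#   index = faiss.IndexFlatIP(128); index.add(corpus_emb)
#   scores, top_ids = index.search(query_emb, top_k)

print(top_ids.shape)  # (5, 10)
```

Compressed variants such as ``PQ`` or ``PCAMatrix`` trade a little of this exact score for a much smaller index.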
Multilingual Datasets and Evaluation
- Thanks to julian-risch, we have added our first multilingual dataset to the BEIR repository: GermanQuAD (a German SQuAD-style dataset).
- We have updated the Elasticsearch integration to allow evaluation on languages other than English; check it out [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_multilingual_bm25.py)].
- We have also added a DPR model class that lets you load DPR models from the Hugging Face Hub; you can now use this class to evaluate, for example, the GermanDPR model [[link](https://huggingface.co/deepset/gbert-base-germandpr-ctx_encoder)].
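Multilingual BM25 works because Elasticsearch ships built-in language analyzers (e.g. ``german``) that handle stemming and stopwords. A sketch of what an index body for German retrieval could look like (field names here are illustrative, not BEIR's exact mapping):

```python
# Hypothetical Elasticsearch index body for German BM25 retrieval:
# both text fields use the built-in "german" analyzer instead of the
# default "standard" one.
german_index_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "german"},
            "txt":   {"type": "text", "analyzer": "german"},
        }
    }
}

# With the official client this would be created via something like:
#   es.indices.create(index="germanquad", body=german_index_body)
print(german_index_body["mappings"]["properties"]["title"]["analyzer"])
```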
DeepCT evaluation
- We have ported the original DeepCT code to TensorFlow (tf) >2.0 and now host the updated repository [[here](https://github.com/NThakur20/DeepCT)].
- Using this hosted code, DeepCT can now be evaluated in BEIR with Anserini retrieval; check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/sparse/evaluate_deepct.py)].
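DeepCT predicts an importance weight for each term in a passage. To index such weights with a bag-of-words engine like Anserini, a common trick is to quantize each weight into an integer term frequency and repeat the term that many times; the sketch below uses made-up weights and a hypothetical scale:

```python
# Turn per-term importance weights into a pseudo-document whose term
# frequencies encode the weights, so a standard BM25 index can use them.
def weights_to_pseudo_doc(term_weights, scale=100):
    tokens = []
    for term, weight in term_weights.items():
        tf = max(round(weight * scale), 0)  # quantize weight -> integer tf
        tokens.extend([term] * tf)
    return " ".join(tokens)

# Illustrative weights; the real ones come from the DeepCT model.
term_weights = {"neural": 0.12, "retrieval": 0.07, "the": 0.0}
pseudo_doc = weights_to_pseudo_doc(term_weights)
```

Unimportant terms (weight near zero) simply vanish from the pseudo-document, which is how DeepCT sharpens lexical retrieval.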
Training Latest MSMARCO v3 Models
- From the [SentenceTransformers](https://github.com/UKPLab/sentence-transformers) repository, we have integrated the latest MSMARCO training code, which uses custom, manually provided hard negatives. This produces state-of-the-art SBERT models trained on MSMARCO; check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/training/train_msmarco_v3.py)].
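The core of hard-negative training is turning each query's relevant passage and its mined negatives into training triplets. A minimal sketch with made-up data (the dict layout is illustrative, not the script's exact format):

```python
# Hypothetical input: for each query, one relevant passage and a list of
# hard negatives (e.g. top BM25 hits that are not actually relevant).
train_data = {
    "what is faiss": {
        "positive": "FAISS is a library for efficient similarity search.",
        "hard_negatives": [
            "BM25 is a lexical ranking function.",
            "SQuAD is a reading-comprehension dataset.",
        ],
    },
}

# Build (query, positive, hard_negative) triplets; with
# sentence-transformers these would typically become InputExample
# objects fed to a loss such as MultipleNegativesRankingLoss.
triplets = [
    (query, entry["positive"], neg)
    for query, entry in train_data.items()
    for neg in entry["hard_negatives"]
]
print(len(triplets))  # 2
```

Hard negatives matter because random negatives are too easy: the model learns far more from passages that look relevant but are not.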
Using Multiple-GPU for question-generation
- A big challenge was speeding up question generation by using multiple GPUs. We now use process pools to generate questions in parallel across multiple GPUs; check [[here](https://github.com/UKPLab/beir/blob/main/examples/generation/query_gen_multi_gpu.py)].
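The idea can be sketched as splitting the corpus into one shard per GPU and handing each shard to a worker process; the sharding below is real, while the pool wiring and ``generate_questions`` worker are only indicated in comments as assumptions:

```python
# Round-robin split of a corpus into one shard per GPU, so each worker
# process can generate questions for its shard on its own device.
def shard_corpus(doc_ids, num_gpus):
    return [doc_ids[i::num_gpus] for i in range(num_gpus)]

doc_ids = [f"doc{i}" for i in range(10)]
shards = shard_corpus(doc_ids, num_gpus=4)

# Each shard would then be processed in parallel, e.g.:
#   with multiprocessing.Pool(num_gpus) as pool:
#       pool.starmap(generate_questions, zip(shards, range(num_gpus)))
print([len(s) for s in shards])  # [3, 3, 2, 2]
```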
Integration of Binary Passage Retrievers (BPR)
- BPR (ACL'21, [link](https://arxiv.org/abs/2106.00882)) is now integrated into the BEIR benchmark. You can easily train a state-of-the-art BPR model on MSMARCO using the loss function described in the original paper; check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/training/train_msmarco_v3_bpr.py)].
- You can also easily evaluate BPR in a zero-shot fashion; check [[here](https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/dense/evaluate_bpr.py)].
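At inference time, BPR runs a two-stage search over binary passage codes: Hamming-style candidate generation with a binarized query, then reranking the candidates with the dense query embedding. A NumPy sketch on toy embeddings (sizes and candidate counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
corpus_emb = rng.standard_normal((500, 64)).astype("float32")
query_emb = rng.standard_normal(64).astype("float32")

# Offline: binarize corpus embeddings with sign(), as BPR's hash layer
# does; only these +-1 codes are stored (32x smaller than float32).
corpus_codes = np.where(corpus_emb >= 0, 1.0, -1.0).astype("float32")

# Stage 1: candidate generation -- inner product between the binarized
# query and the binary codes (equivalent to ranking by Hamming distance).
query_code = np.where(query_emb >= 0, 1.0, -1.0).astype("float32")
candidates = np.argsort(-(corpus_codes @ query_code))[:50]

# Stage 2: rerank the candidates with the *dense* query embedding
# against their binary codes, recovering most of the dense accuracy.
rerank_scores = corpus_codes[candidates] @ query_emb
top10 = candidates[np.argsort(-rerank_scores)[:10]]
print(top10.shape)  # (10,)
```

This is why BPR's index is so much smaller than a standard dense index: the float corpus embeddings are never needed at search time.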
- We will soon open-source the public BPR models trained on MSMARCO.