fastRAG 2.0 includes new, highly anticipated efficiency-oriented components, an updated chat-like demo experience with multi-modality support, and improvements to existing components.
The library now leverages efficient **Intel optimizations** via [Intel Extension for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch), [🤗 Optimum Intel](https://github.com/huggingface/optimum-intel) and [🤗 Optimum-Habana](https://github.com/huggingface/optimum-habana) for _running as efficiently as possible_ on Intel® Xeon® Processors and Intel® Gaudi® AI accelerators.
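For a flavor of what the IPEX path does, here is a minimal sketch of applying `ipex.optimize` to a Hugging Face model on a Xeon CPU. It is illustrative only (fastRAG wires these optimizations into its components for you), and the model id is a placeholder:

```python
# Minimal IPEX sketch (illustrative; fastRAG applies this inside its components).
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2").eval()  # placeholder model
# Re-layout weights and fuse ops for Xeon; bfloat16 benefits from AMX/AVX512-BF16.
model = ipex.optimize(model, dtype=torch.bfloat16)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("fastRAG runs efficiently on Xeon.", return_tensors="pt")
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```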
:rocket: Intel Habana Gaudi 1 and Gaudi 2 Support
fastRAG is the first RAG framework to support Habana Gaudi accelerators for running LLMs efficiently; more details [here](https://github.com/IntelLabs/fastRAG/blob/main/components.md#fastrag-running-llms-with-habana-gaudi-dl1-and-gaudi-2).
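For reference, a minimal Optimum-Habana sketch of moving a generative model onto a Gaudi device might look like the following. This assumes the Habana software stack is installed, uses a placeholder model, and is not fastRAG's own Gaudi invocation layer (see the link above for that):

```python
# Hedged sketch: generation on Gaudi via Optimum-Habana (assumes the Habana
# software stack is installed; not fastRAG's actual Gaudi backend).
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device
from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # patch transformers with Gaudi-optimized code paths

model = AutoModelForCausalLM.from_pretrained("gpt2").eval().to("hpu")  # placeholder
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer("Gaudi accelerators run LLMs", return_tensors="pt").to("hpu")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```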
:cyclone: Running LLMs with the ONNX Runtime and LlamaCPP Backends
Added support for running quantized LLMs with the [ONNX Runtime](https://github.com/IntelLabs/fastRAG/blob/main/components.md#fastrag-running-llms-with-onnx-runtime) and LlamaCPP backends, for higher efficiency and speed across all your RAG pipelines.
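As an illustration of the ONNX Runtime path, a causal LM can be exported and run through 🤗 Optimum's standard API. The snippet below is a generic Optimum sketch rather than fastRAG's wrapper, and the model id is a placeholder:

```python
# Generic Optimum / ONNX Runtime sketch (not fastRAG's wrapper).
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer, pipeline

# export=True converts the PyTorch checkpoint to ONNX on the fly.
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Retrieval-augmented generation is", max_new_tokens=16)[0]["generated_text"])
```

The LlamaCPP path is analogous, loading a GGUF-quantized checkpoint through `llama-cpp-python`'s `Llama` class.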
:zap: CPU Efficient Embedders
We added support for running bi-encoder [embedders](https://github.com/IntelLabs/fastRAG/blob/main/scripts/optimizations/embedders/README.md) and [cross-encoder rankers](https://github.com/IntelLabs/fastRAG/blob/main/scripts/optimizations/reranker_quantization/quantization.md) as efficiently as possible on Intel CPUs using Intel-optimized software.
We integrated the optimized embedders into the following two components (a usage sketch follows the list):
- [`QuantizedBiEncoderRanker`](https://github.com/IntelLabs/fastRAG/blob/main/fastrag/rankers/quantized_bi_encoder.py) - a bi-encoder ranker; encodes the documents provided as input and re-orders them according to query similarity.
- [`QuantizedBiEncoderRetriever`](https://github.com/IntelLabs/fastRAG/blob/main/fastrag/retrievers/optimized.py) - a bi-encoder retriever; encodes documents into vectors given a vector store engine.
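A hypothetical usage sketch of the two components: the class names come from the linked source files, but the constructor parameters and model ids shown here are assumptions, not documented signatures:

```python
# Hypothetical usage sketch; parameter names and model ids are assumptions.
from haystack.document_stores import InMemoryDocumentStore
from fastrag.rankers import QuantizedBiEncoderRanker
from fastrag.retrievers import QuantizedBiEncoderRetriever

document_store = InMemoryDocumentStore(embedding_dim=384)
retriever = QuantizedBiEncoderRetriever(
    document_store=document_store,
    embedding_model="Intel/bge-small-en-v1.5-rag-int8-static",  # assumed model id
)
ranker = QuantizedBiEncoderRanker(
    model_name_or_path="Intel/bge-large-en-v1.5-rag-int8-static",  # assumed model id
)
```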
:hourglass_flowing_sand: REPLUG
An implementation of [REPLUG](https://arxiv.org/abs/2301.12652), an advanced technique for ensemble prompting of retrieved documents: the documents are processed in parallel and their next-token predictions are combined for better results.
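To make the ensembling step concrete, here is a minimal self-contained sketch of the REPLUG idea in plain `transformers`. It is illustrative rather than fastRAG's actual component; the model, documents and retrieval scores are placeholders:

```python
# REPLUG sketch: prepend each retrieved document to the query, score each branch
# in parallel, and mix the next-token distributions weighted by retrieval scores.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

query = "The capital of France is"
documents = ["Paris is the capital and largest city of France.",
             "France is a country in Western Europe."]
retrieval_scores = torch.tensor([0.9, 0.4])  # placeholder retriever similarities

inputs = tokenizer([f"{d}\n{query}" for d in documents],
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits  # (n_docs, seq_len, vocab)

# Take the distribution at each branch's last non-padding token.
last = inputs["attention_mask"].sum(dim=1) - 1
next_token_probs = logits[torch.arange(len(documents)), last].softmax(dim=-1)

# Mix per-document distributions with softmax-normalized retrieval scores.
weights = retrieval_scores.softmax(dim=-1).unsqueeze(-1)
ensembled = (weights * next_token_probs).sum(dim=0)
print(tokenizer.decode(ensembled.argmax().item()))
```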
:trophy: New Demos
We updated our demos (and [demo page](https://github.com/IntelLabs/fastRAG/blob/main/demo/README.md)) with two new demos: a chat-like experience and **multi-modal** RAG.
:tropical_fish: Enhancements
* Added documentation for most **models and components**, with examples and ready-to-run **notebooks**!
* Support for the **Fusion-in-Decoder** (FiD) model using a dedicated invocation layer (a conceptual sketch follows this list).
* Various bug fixes and compatibility updates for the Haystack framework.
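The conceptual sketch below illustrates the FiD fusion trick referenced above: each (question, passage) pair is encoded independently, the encoder states are concatenated along the sequence axis, and a single decoder attends over all passages at once. It uses a vanilla T5 checkpoint as a stand-in and is not fastRAG's invocation layer:

```python
# FiD concept sketch with a plain T5 checkpoint (not fastRAG's invocation layer).
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

model = T5ForConditionalGeneration.from_pretrained("t5-small")  # placeholder model
tokenizer = AutoTokenizer.from_pretrained("t5-small")

question = "question: who develops fastRAG?"
passages = ["context: fastRAG is a research framework by Intel Labs.",
            "context: fastRAG builds on the Haystack framework."]

# Encode each (question, passage) pair independently.
enc = tokenizer([f"{question} {p}" for p in passages],
                return_tensors="pt", padding=True)
encoder_states = model.encoder(**enc).last_hidden_state  # (n_passages, seq, dim)

# Fuse: concatenate encoder states so the decoder attends over all passages.
fused = encoder_states.reshape(1, -1, encoder_states.size(-1))
fused_mask = enc["attention_mask"].reshape(1, -1)

out = model.generate(encoder_outputs=BaseModelOutput(last_hidden_state=fused),
                     attention_mask=fused_mask, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```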
**Full Changelog**: https://github.com/IntelLabs/fastRAG/compare/v1.3.0...v2.0