fastRAG

Latest version: v3.1.2


2.0

fastRAG 2.0 includes new, highly anticipated efficiency-oriented components, an updated chat-like demo experience with multi-modality, and improvements to existing components.

The library now utilizes efficient **Intel optimizations** using [Intel extensions for PyTorch (IPEX)](https://github.com/intel/intel-extension-for-pytorch), [šŸ¤— Optimum Intel](https://github.com/huggingface/optimum-intel) and [šŸ¤— Optimum-Habana](https://github.com/huggingface/optimum-habana) for _running as optimally as possible_ on IntelĀ® XeonĀ® Processors and IntelĀ® GaudiĀ® AI accelerators.

:rocket: Intel Habana Gaudi 1 and Gaudi 2 Support
fastRAG is the first RAG framework to support Habana Gaudi accelerators for running LLMs efficiently; more details [here](https://github.com/IntelLabs/fastRAG/blob/main/components.md#fastrag-running-llms-with-habana-gaudi-dl1-and-gaudi-2).

:cyclone: Running LLMs with the ONNX Runtime and LlamaCPP Backends
Added support for running quantized LLMs on [ONNX Runtime](https://github.com/IntelLabs/fastRAG/blob/main/components.md#fastrag-running-llms-with-onnx-runtime) and LlamaCPP, for higher efficiency and speed in all your RAG pipelines.
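To illustrate why quantized LLMs run faster on backends like these, here is a minimal sketch of symmetric int8 weight quantization, the core idea behind such backends. This is a toy illustration only, not the API of ONNX Runtime, LlamaCPP, or fastRAG; the function names are hypothetical.

```python
# Toy symmetric int8 quantization: store weights as 8-bit integers plus a
# single per-tensor scale, trading a little precision for 4x smaller weights
# and faster integer arithmetic. (Illustrative only; real backends use
# per-channel scales, calibration, and fused int8 kernels.)

def quantize_int8(weights):
    """Map float weights to int8 values in [-127, 127] plus a scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.4, -0.8, 0.1]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
# Each restored weight is within scale/2 of the original.
```

The round trip is lossy but bounded: round-to-nearest introduces at most half a quantization step of error per weight, which is why int8 inference typically costs little accuracy.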

:zap: CPU Efficient Embedders
We added support for running bi-encoder [embedders](https://github.com/IntelLabs/fastRAG/blob/main/scripts/optimizations/embedders/README.md) and [cross-encoder rankers](https://github.com/IntelLabs/fastRAG/blob/main/scripts/optimizations/reranker_quantization/quantization.md) as efficiently as possible on Intel CPUs using Intel-optimized software.

We integrated the optimized embedders into the following two components:

- [`QuantizedBiEncoderRanker`](https://github.com/IntelLabs/fastRAG/blob/main/fastrag/rankers/quantized_bi_encoder.py) - a bi-encoder ranker; encodes the documents provided in the input and re-orders them by query similarity.
- [`QuantizedBiEncoderRetriever`](https://github.com/IntelLabs/fastRAG/blob/main/fastrag/retrievers/optimized.py) - a bi-encoder retriever; encodes documents into vectors, given a vector store engine.
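Conceptually, both components embed the query and each document independently and then order documents by vector similarity. The sketch below shows that idea in plain Python; the `embed` function is a deliberately crude stand-in for a quantized transformer encoder, and none of these names are fastRAG's API.

```python
# Minimal bi-encoder ranking sketch: embed query and documents separately,
# then sort documents by cosine similarity to the query.
import math

def embed(text):
    # Toy "embedding": a character-frequency vector over a-z.
    # A real bi-encoder would run a (quantized) transformer here.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank(query, docs):
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = ["intel xeon processors", "banana bread recipe", "xeon cpu tuning"]
ranked = rank("xeon processor", docs)
# → most relevant documents first, unrelated text last
```

Because documents are embedded independently of the query, their vectors can be precomputed and stored in a vector store, which is what makes bi-encoder retrieval fast at query time.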

:hourglass_flowing_sand: REPLUG
An implementation of [REPLUG](https://arxiv.org/abs/2301.12652), an advanced technique for ensemble prompting over retrieved documents: the documents are processed in parallel and their next-token predictions are combined for better results.

:trophy: New Demos

We updated our demos (and [demo page](https://github.com/IntelLabs/fastRAG/blob/main/demo/README.md)) to include two new demos that showcase a chat-like experience and **multi-modal** RAG.

:tropical_fish: Enhancements
* Added documentation for most **models and components**, containing examples and **notebooks** ready to run!
* Support for the **Fusion-in-Decoder** (FiD) model using a dedicated invocation layer.
* Various bug fixes and compatibility updates supporting the Haystack framework.

**Full Changelog**: https://github.com/IntelLabs/fastRAG/compare/v1.3.0...v2.0

1.3.0

What's Changed
* ColBERT Upstream Updates by danielfleischer in https://github.com/IntelLabs/fastRAG/pull/19


**Full Changelog**: https://github.com/IntelLabs/fastRAG/compare/v1.2.1...v1.3.0

1.2.1

What's Changed
* Update plaid_colbert_pipeline.ipynb by mosheber in https://github.com/IntelLabs/fastRAG/pull/17
* Update colbert.py by mosheber in https://github.com/IntelLabs/fastRAG/pull/18


**Full Changelog**: https://github.com/IntelLabs/fastRAG/compare/v1.2.0...v1.2.1

1.2.0
