Efficient Federated Search for Retrieval-Augmented Generation

Rachid Guerraoui, Anne-Marie Kermarrec, Diana Petrescu, Rafael Pires, Mathis Randl, Martijn de Vos

TL;DR

RAGRoute tackles the inefficiency of federated retrieval in RAG by introducing a lightweight shallow neural router that selectively queries among multiple data sources. By routing queries to a subset of sources, it reduces both the number of queries and data transfer without sacrificing end-to-end RAG accuracy, as demonstrated on MIRAGE and MMLU. The work details the router design, training and inference procedures, and a thorough evaluation showing up to 77.5% query reductions and 76.2% data transfer savings. This approach enables scalable, privacy-conscious federated RAG deployments across distributed knowledge bases with minimal impact on answer quality.

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities across various domains but remain susceptible to hallucinations and inconsistencies, limiting their reliability. Retrieval-augmented generation (RAG) mitigates these issues by grounding model responses in external knowledge sources. Existing RAG workflows often leverage a single vector database, which is impractical in the common setting where information is distributed across multiple repositories. We introduce RAGRoute, a novel mechanism for federated RAG search. RAGRoute dynamically selects relevant data sources at query time using a lightweight neural network classifier. By not querying every data source, this approach significantly reduces query overhead, improves retrieval efficiency, and minimizes the retrieval of irrelevant information. We evaluate RAGRoute using the MIRAGE and MMLU benchmarks and demonstrate its effectiveness in retrieving relevant documents while reducing the number of queries. RAGRoute reduces the total number of queries by up to 77.5% and communication volume by up to 76.2%.
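The routing idea in the abstract can be illustrated with a minimal sketch. It is not the RAGRoute implementation: the trained neural classifier is replaced here by a hypothetical stand-in scorer (a sigmoid over the dot product between the query embedding and a per-source centroid), and the names `route`, `federated_search`, `centroid_scorer`, the example sources, and the 0.5 threshold are all assumptions made for illustration. The sketch only shows the control flow: score every source, then query only those the router considers relevant.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def centroid_scorer(query: Sequence[float], centroid: Sequence[float]) -> float:
    """Hypothetical stand-in for a trained router: relevance score in (0, 1)."""
    return sigmoid(sum(q * c for q, c in zip(query, centroid)))


def route(
    query: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    threshold: float = 0.5,
    scorer: Callable[[Sequence[float], Sequence[float]], float] = centroid_scorer,
) -> List[str]:
    """Return the names of the sources whose score reaches the threshold."""
    return [name for name, c in centroids.items() if scorer(query, c) >= threshold]


def federated_search(
    query: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    search_fns: Dict[str, Callable[[Sequence[float]], List[str]]],
    threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Query only the selected sources and concatenate their results."""
    selected = route(query, centroids, threshold)
    results: List[str] = []
    for name in selected:
        results.extend(search_fns[name](query))
    return selected, results


# Toy setup: three sources with made-up 2-dimensional centroids.
centroids = {"medical": [1.0, 0.0], "legal": [-1.0, 0.0], "news": [0.0, 1.0]}
calls: List[str] = []


def make_search(name: str) -> Callable[[Sequence[float]], List[str]]:
    def search(query: Sequence[float]) -> List[str]:
        calls.append(name)  # record which sources were actually contacted
        return [f"{name}-doc"]
    return search


search_fns = {name: make_search(name) for name in centroids}
selected, docs = federated_search([2.0, -1.0], centroids, search_fns)
print(selected, docs, calls)
```

In this toy run only one of the three sources is contacted. The other two are skipped before any network call would be made, which is where the savings in queries and transferred data would come from in a real deployment.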

Paper Structure

This paper contains 19 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The RAG workflow.
  • Figure 2: The relevance of different corpora in RAG when answering questions, using question sets from MIRAGE.
  • Figure 3: The workflow of RAGRoute. The components specific to RAGRoute are indicated in the box with the dashed border. In contrast to existing RAG workflows that rely on a single data store, RAGRoute enables efficient federated search by using a lightweight router to determine relevant data sources during an inference request.
  • Figure 4: The mean recall for both benchmarks and for different data sources. We also show the mean recall for RAGRoute.
  • Figure 5: The number of queries for both benchmarks and for different routing strategies.