C-FedRAG: A Confidential Federated Retrieval-Augmented Generation System

Parker Addison, Minh-Tuan H. Nguyen, Tomislav Medan, Jinali Shah, Mohammad T. Manzari, Brendan McElrone, Laksh Lalwani, Aboli More, Smita Sharma, Holger R. Roth, Isaac Yang, Chester Chen, Daguang Xu, Yan Cheng, Andrew Feng, Ziyue Xu

TL;DR

This work tackles the challenge of secure, scalable knowledge extraction with LLMs across decentralized data silos by introducing Confidential FedRAG (C-FedRAG). The approach combines a federated Retrieval-Augmented Generation pipeline with Confidential Computing to protect context data during embedding, retrieval, and inference, implemented on NVIDIA FLARE and evaluated using the MedRAG toolkit and MIRAGE benchmark. Empirical results on medical QA show that cross-site retrieval and re-ranking improve performance beyond centralized baselines, highlighting the practical viability of decentralized, privacy-preserving LLM workflows in enterprise settings. Overall, C-FedRAG enables scalable, privacy-conscious collaboration across organizations, unlocking broader data utility for LLM-based knowledge tasks.

Abstract

Organizations seeking to utilize Large Language Models (LLMs) for knowledge querying and analysis often encounter challenges in maintaining an LLM fine-tuned on targeted, up-to-date information that keeps answers relevant and grounded. Retrieval-Augmented Generation (RAG) has quickly become a feasible solution for organizations looking to overcome the challenges of maintaining proprietary models and to help reduce LLM hallucinations in their query responses. However, RAG comes with its own issues regarding scaling data pipelines across tiered-access and disparate data sources. In many scenarios, it is necessary to query beyond a single data silo to provide richer and more relevant context for an LLM. Analyzing data sources within and across organizational trust boundaries is often limited by complex data-sharing policies that prohibit centralized data storage and therefore inhibit the fast and effective setup and scaling of RAG solutions. In this paper, we introduce Confidential Computing (CC) techniques as a solution for secure Federated Retrieval-Augmented Generation (FedRAG). Our proposed Confidential FedRAG system (C-FedRAG) enables secure connection and scaling of RAG workflows across a decentralized network of data providers by ensuring context confidentiality. We also demonstrate how to implement a C-FedRAG system using the NVIDIA FLARE SDK and assess its performance using the MedRAG toolkit and MIRAGE benchmarking dataset.
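
As a rough illustration of the federated retrieval flow summarized here (each data provider retrieves locally, and an orchestrator pools and re-ranks the candidates before building the LLM context), the following Python sketch may help. It is not the authors' implementation: it does not use the NVIDIA FLARE API or any Confidential Computing enclave, and the token-overlap scoring stands in for a real embedding model and re-ranker. All names (`Hit`, `overlap_score`, `local_retrieve`, `federated_retrieve`, `build_prompt`) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    site: str    # which data provider returned the passage
    text: str    # the retrieved passage
    score: float # relevance score used for ranking


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance (assumption): fraction of query tokens present in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


def local_retrieve(site: str, corpus: list[str], query: str, k: int) -> list[Hit]:
    """Runs at one data provider; only its top-k passages leave the site."""
    hits = [Hit(site, text, overlap_score(query, text)) for text in corpus]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:k]


def federated_retrieve(sites: dict[str, list[str]], query: str,
                       k_local: int = 2, k_global: int = 3) -> list[Hit]:
    """Orchestrator: send the query to every site, pool candidates, re-rank globally."""
    pooled: list[Hit] = []
    for name, corpus in sites.items():
        pooled.extend(local_retrieve(name, corpus, query, k_local))
    # Re-ranking step: this sketch reuses the toy score; a dedicated
    # re-ranking model could be substituted here.
    return sorted(pooled, key=lambda h: h.score, reverse=True)[:k_global]


def build_prompt(query: str, hits: list[Hit]) -> str:
    """Assemble the cross-site context that would be passed to the LLM."""
    context = "\n".join(f"[{h.site}] {h.text}" for h in hits)
    return f"Context:\n{context}\n\nQuestion: {query}"
```

In this sketch the raw corpora never leave `local_retrieve`; only scored top-k passages reach the orchestrator. In a confidential deployment, that pooling and prompting step is where protected execution would matter, since it sees context from every provider.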

Paper Structure

This paper contains 27 sections, 3 figures, 2 tables, 1 algorithm.

Figures (3)

  • Figure 1: RAG pipeline illustration
  • Figure 2: Confidential Federated RAG pipeline
  • Figure 3: C-FedRAG implementation