FRAG: Toward Federated Vector Database Management for Collaborative and Secure Retrieval-Augmented Generation

Dongfang Zhao

TL;DR

Federated Retrieval-Augmented Generation (FRAG) employs a single-key homomorphic encryption protocol that simplifies key management across mutually distrusting parties, and introduces a multiplicative caching technique to encrypt floating-point numbers efficiently, improving computational performance in large-scale federated environments.

Abstract

This paper introduces Federated Retrieval-Augmented Generation (FRAG), a novel database management paradigm tailored for the growing needs of retrieval-augmented generation (RAG) systems, which are increasingly powered by large language models (LLMs). FRAG enables mutually distrusting parties to collaboratively perform Approximate $k$-Nearest Neighbor (ANN) searches on encrypted query vectors and encrypted data stored in distributed vector databases, while ensuring that no party can gain any knowledge about the queries or data of others. Achieving this paradigm presents two key challenges: (i) ensuring strong security guarantees, such as Indistinguishability under Chosen-Plaintext Attack (IND-CPA), under practical assumptions (e.g., we avoid overly optimistic assumptions like non-collusion among parties); and (ii) keeping performance overheads comparable to traditional, non-federated RAG systems. To address these challenges, FRAG employs a single-key homomorphic encryption protocol that simplifies key management across mutually distrusting parties. Additionally, FRAG introduces a multiplicative caching technique to encrypt floating-point numbers efficiently, significantly improving computational performance in large-scale federated environments. We provide a rigorous security proof using standard cryptographic reductions and demonstrate the practical scalability and efficiency of FRAG through extensive experiments on both benchmark and real-world datasets.
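To make the retrieval task concrete, the following is a minimal plaintext sketch of a $k$-nearest-neighbor query answered jointly by several parties, each holding its own vector store. It is an illustration only and not the FRAG protocol: it uses exact rather than approximate search, involves no encryption, and lets a coordinator see the query and distances in the clear, which is precisely what FRAG is designed to prevent. The names `local_top_k`, `federated_top_k`, and the party labels are hypothetical.

```python
import heapq
import math


def local_top_k(query, vectors, k):
    """Each party ranks only its own vectors by Euclidean distance to the query."""
    scored = [(math.dist(query, vec), vec_id) for vec_id, vec in vectors.items()]
    return heapq.nsmallest(k, scored)


def federated_top_k(query, parties, k):
    """Merge every party's local top-k into a global top-k.

    Assumption: a coordinator may see distances in plaintext. A secure
    protocol would perform this comparison over ciphertexts instead.
    """
    candidates = []
    for party_name, vectors in parties.items():
        for dist, vec_id in local_top_k(query, vectors, k):
            candidates.append((dist, party_name, vec_id))
    return heapq.nsmallest(k, candidates)


# Hypothetical toy data: two parties, two 2-D vectors each.
parties = {
    "party_a": {"a1": (0.0, 0.0), "a2": (5.0, 5.0)},
    "party_b": {"b1": (1.0, 0.0), "b2": (0.0, 3.0)},
}
result = federated_top_k((0.0, 0.0), parties, k=2)
```

Because the global top-$k$ is always contained in the union of the local top-$k$ lists, each party only needs to contribute $k$ candidates, regardless of how many vectors it stores.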

Paper Structure

This paper contains 67 sections, 2 theorems, 5 equations, 8 figures, 2 algorithms.

Key Result

Theorem 1

The SK-MHE protocol is IND-CPA secure, assuming that the underlying homomorphic encryption scheme is IND-CPA secure.
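For reference, the standard textbook notion of IND-CPA security can be stated as follows; the notation here is an assumption for illustration, and the paper's own formalization may differ in detail. Let $\Pi = (\mathsf{KeyGen}, \mathsf{Enc}, \mathsf{Dec})$ be an encryption scheme with security parameter $\lambda$. A challenger generates a key and samples a bit $b \in \{0, 1\}$ uniformly at random. An efficient adversary $\mathcal{A}$ with access to an encryption oracle chooses two messages $m_0, m_1$ of equal length, receives $c^* = \mathsf{Enc}(m_b)$, and outputs a guess $b'$. The scheme is IND-CPA secure if, for every such adversary,

$$\mathsf{Adv}^{\mathrm{IND\text{-}CPA}}_{\Pi, \mathcal{A}}(\lambda) = \left| \Pr[b' = b] - \frac{1}{2} \right| \le \mathsf{negl}(\lambda),$$

where $\mathsf{negl}$ denotes a negligible function. A statement of the form above is typically proved by reduction: any adversary that breaks the protocol with non-negligible advantage is turned into an adversary against the underlying scheme.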

Figures (8)

  • Figure 1: FRAG Architecture
  • Figure 2: Performance breakdown of cryptographic primitives on MNIST, FMNIST, CIFAR-10, and SVHN.
  • Figure 3: Overhead of SK-MHE compared to FedAvg on MNIST, FMNIST, CIFAR-10, and SVHN (time in seconds).
  • Figure 4: Computational cost of multiplicative caching algorithms in MySQL loadable functions.
  • Figure 5: Cost of different thread counts for caching ciphertexts.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2