Privacy-Aware RAG: Secure and Isolated Knowledge Retrieval

Pengcheng Zhou; Yinglun Feng; Zhongliang Yang

TL;DR

The paper tackles privacy risks in Retrieval-Augmented Generation (RAG) by introducing an encryption-first framework that protects both textual content and embeddings. It presents two schemes under a user-isolated access model: Method A, AES-CBC-based encryption with per-user keys $K_i$, and Method B, a chained dynamic key derivation approach with root keys $K_{A,i}$ and hash-based integrity. Security proofs connect Method A to IND-CPA confidentiality and, via HMAC, to INT-CTXT integrity, while Method B relies on HKDF-based forward security, chain integrity, and trapdoor secrecy, with security guarantees that scale when the security parameter satisfies $\lambda \ge 128$. The framework preserves RAG performance, supports cross-domain deployment, and advocates for stricter data-protection standards in AI-driven services.
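To make the idea of chained key derivation in Method B more concrete, the following is a minimal sketch of an HKDF-based key ratchet built only on Python's standard library. It is not the paper's construction: the function names (`hkdf`, `derive_chain`), the `info` label format, and the choice of SHA-256 are assumptions made for illustration. The HKDF steps follow the standard extract-then-expand structure of RFC 5869; each step of the chain outputs one per-record key plus the next chain state, so an earlier state cannot be recomputed from a later one.

```python
import hashlib
import hmac
from typing import List

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869): condense input key material into a PRK."""
    if not salt:
        salt = b"\x00" * HASH_LEN  # RFC 5869 default when no salt is given
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869): stretch the PRK to `length` output bytes."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm = b""
    block = b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def hkdf(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """Full HKDF: extract, then expand."""
    return hkdf_expand(hkdf_extract(salt, ikm), info, length)


def derive_chain(root_key: bytes, user_id: str, n: int) -> List[bytes]:
    """Derive n per-record keys for one user from a root key (hypothetical scheme).

    Each step yields 64 bytes: the first 32 are the record key, the last 32
    replace the chain state. The old state is overwritten, so a later state
    does not reveal earlier record keys.
    """
    keys = []
    state = root_key
    for j in range(n):
        # The label format is an assumption; it binds keys to a user and position.
        info = f"rag-chain|{user_id}|{j}".encode()
        out = hkdf(state, salt=b"", info=info, length=64)
        record_key, state = out[:32], out[32:]
        keys.append(record_key)
    return keys
```

For example, `derive_chain(root, "user-1", 3)` returns three distinct 32-byte keys, and calling it with a different `user_id` or a different root key yields an unrelated chain, which is the property a user-isolated access model relies on. In a real deployment the root key would come from a secure key store, and each record key would feed an authenticated encryption step rather than being used directly.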

Abstract

The widespread adoption of Retrieval-Augmented Generation (RAG) systems in real-world applications has heightened concerns about the confidentiality and integrity of their proprietary knowledge bases. These knowledge bases, which play a critical role in enhancing the generative capabilities of Large Language Models (LLMs), are increasingly vulnerable to breaches that could compromise sensitive information. To address these challenges, this paper proposes an advanced encryption methodology designed to protect RAG systems from unauthorized access and data leakage. Our approach encrypts both textual content and its corresponding embeddings prior to storage, ensuring that all data remains securely encrypted. This mechanism restricts access to authorized entities with the appropriate decryption keys, thereby significantly reducing the risk of unintended data exposure. Furthermore, we demonstrate that our encryption strategy preserves the performance and functionality of RAG pipelines, ensuring compatibility across diverse domains and applications. To validate the robustness of our method, we provide comprehensive security proofs that highlight its resilience against potential threats and vulnerabilities. These proofs also reveal limitations in existing approaches, which often lack robustness or adaptability, or rely on open-source models. Our findings suggest that integrating advanced encryption techniques into the design and deployment of RAG systems can effectively enhance privacy safeguards. This research contributes to the ongoing discourse on improving security measures for AI-driven services and advocates for stricter data protection standards within RAG architectures.

Paper Structure

Table of Contents

Figures (4)