Fine-Grained Privacy Extraction from Retrieval-Augmented Generation Systems via Knowledge Asymmetry Exploitation

Yufei Chen, Yao Wang, Haibin Zhang, Tao Gu

TL;DR

The paper addresses privacy leakage in retrieval-augmented generation by exploiting knowledge asymmetry between RAG systems and standard LLMs. It introduces a black-box, three-phase attack framework that decomposes adversarial queries, uses NLI-powered similarity scoring, and applies a neural classifier to localize privacy-bearing sentences at sentence granularity, achieving strong cross-domain performance. Key findings show ESRs of ~90% in single-domain and ~80% in multi-domain settings, with F1 and AUC metrics indicating robust discrimination, and demonstrate the approach's potential to inform defense strategies such as privacy-preserving response generation. The work highlights critical privacy risks in knowledge-augmented models and provides a foundation for adaptive mitigations that balance utility and privacy across domains.

Abstract

Retrieval-augmented generation (RAG) systems enhance large language models (LLMs) by integrating external knowledge bases, but this advancement introduces significant privacy risks. Existing privacy attacks on RAG systems can trigger data leakage but often fail to accurately isolate knowledge-base-derived sentences within mixed responses. They also lack robustness when applied across multiple domains. This paper addresses these challenges by presenting a novel black-box attack framework that exploits knowledge asymmetry between RAG and standard LLMs to achieve fine-grained privacy extraction across heterogeneous knowledge landscapes. We propose a chain-of-thought reasoning strategy that creates adaptive prompts to steer RAG systems away from sensitive content. Specifically, we first decompose adversarial queries to maximize information disparity and then apply semantic relationship scoring to resolve lexical and syntactic ambiguities. Finally, we train a neural network on these feature scores to precisely identify sentences containing private information. Unlike prior work, our framework generalizes to unseen domains through iterative refinement without pre-defined knowledge. Experimental results show that we achieve a privacy extraction rate of over 91% in single-domain and 83% in multi-domain scenarios, reducing sensitive sentence exposure by over 65% in case studies. This work bridges the gap between attack and defense in RAG systems, enabling precise extraction of private information while providing a foundation for adaptive mitigation.
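To make the knowledge-asymmetry idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes an attacker holds two responses to the same query, one from the RAG system and one from a standard LLM, and flags RAG sentences that have no close counterpart in the LLM response. Token-overlap (Jaccard) similarity stands in for the paper's NLI-based semantic scoring, and a fixed threshold stands in for the trained neural classifier. The function names, the threshold value, and the example texts are all hypothetical.

```python
import re


def split_sentences(text):
    # Naive sentence splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokens(sentence):
    # Lowercased alphanumeric tokens as a set.
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    # Token-overlap similarity; a crude stand-in for NLI-based scoring (assumption).
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def flag_kb_sentences(rag_response, llm_response, threshold=0.3):
    # Flag RAG sentences whose best match in the plain-LLM response is weak.
    # The fixed threshold replaces the trained classifier (assumption).
    llm_sents = split_sentences(llm_response)
    flagged = []
    for sent in split_sentences(rag_response):
        best = max((jaccard(sent, ref) for ref in llm_sents), default=0.0)
        if best < threshold:
            flagged.append(sent)
    return flagged


# Made-up example texts, for illustration only.
rag = ("Diabetes is a chronic condition. "
       "Patient John Doe was prescribed 50 units of insulin on March 3.")
llm = ("Diabetes is a chronic condition. "
       "It affects how the body processes glucose.")
print(flag_kb_sentences(rag, llm))
```

In this toy case the first sentence appears in both responses and is treated as general knowledge, while the second has no counterpart in the plain-LLM response and is flagged as likely knowledge-base-derived. A realistic pipeline would need semantic rather than lexical scoring, since paraphrases defeat token overlap.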

Paper Structure

This paper contains 27 sections, 2 equations, 29 figures, 7 tables, 1 algorithm.

Figures (29)

  • Figure 1: Comparison of our attack with existing work.
  • Figure 2: Proportion of private data in baseline attack outputs across datasets.
  • Figure 3: Workflow of our attack on RAG system.
  • Figure 4: Performance under different model sizes.
  • Figure 5: Performance under different temperatures.
  • ...and 24 more figures