External Data Extraction Attacks against Retrieval-Augmented Large Language Models

Yu He, Yifei Chen, Yiming Li, Shuo Shao, Leyi Qi, Boheng Li, Dacheng Tao, Zhan Qin

TL;DR

This work formalizes external data extraction attacks (EDEAs) against retrieval-augmented LLMs (RA-LLMs) and introduces SECRET, a scalable attack combining LLM-driven jailbreak optimization and cluster-focused triggering to extract documents from private knowledge bases. The authors define a unified three-component framework (extraction instruction $p_e$, jailbreak operator $\mathcal{J}(\cdot)$, and retrieval trigger $t_i$) and show how prior attacks fit within it, enabling systematic evaluation. SECRET uses an adaptive jailbreak prompt optimizer and a cluster-focused trigger strategy (global exploration plus local exploitation) to achieve high extraction rates across 16 RA-LLMs and two sensitive datasets, including a notable 35% extraction from Claude 3.7 Sonnet-powered RA-LLMs in their tests. Even with naive defenses, SECRET remains robust, highlighting an urgent need for stronger defenses against data leakage in RAG deployments. The work provides a rigorous framework, extensive empirical results, and a candid discussion of defenses, security implications, and future directions for protecting private data in RA-LLMs.
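
To make the "global exploration plus local exploitation" idea concrete, the following is a toy sketch only, not the authors' SECRET procedure. It assumes documents are 2D points, the retriever returns the `k` nearest points, and the attacker either probes a random region (exploration) or perturbs a query around a previously retrieved document (exploitation). The names `top_k`, `simulate`, `patience`, and the Gaussian perturbation are all hypothetical choices made for illustration.

```python
import random

def top_k(docs, query, k):
    """Toy retriever: indices of the k documents closest to the query point."""
    def dist2(i):
        return (docs[i][0] - query[0]) ** 2 + (docs[i][1] - query[1]) ** 2
    return sorted(range(len(docs)), key=dist2)[:k]

def simulate(docs, k=3, budget=40, patience=2, seed=0):
    """Alternate between random probes and probes near a known document.

    Assumption: after `patience` consecutive queries that surface nothing
    new, the toy attacker gives up on the current region and explores again.
    """
    rng = random.Random(seed)
    seen = set()      # indices of documents retrieved so far
    anchor = None     # document we are currently exploiting around
    stale = 0         # consecutive queries that returned nothing new
    for _ in range(budget):
        if anchor is None:
            # Global exploration: probe a random region of the space.
            query = (rng.uniform(0, 10), rng.uniform(0, 10))
        else:
            # Local exploitation: perturb around a known document.
            query = (anchor[0] + rng.gauss(0, 0.5), anchor[1] + rng.gauss(0, 0.5))
        hits = top_k(docs, query, k)
        new = [i for i in hits if i not in seen]
        seen.update(hits)
        if new:
            anchor, stale = docs[new[0]], 0
        else:
            stale += 1
            if stale >= patience:
                anchor, stale = None, 0
    return seen

# Two small synthetic clusters of five "documents" each.
docs = [(1 + 0.1 * i, 1 + 0.1 * i) for i in range(5)] + \
       [(8 + 0.1 * i, 8 - 0.1 * i) for i in range(5)]
found = simulate(docs)
print(f"retrieved {len(found)} of {len(docs)} toy documents")
```

The sketch is mainly useful for defenders reasoning about how retrieval coverage grows with query budget; real retrievers operate on text embeddings rather than 2D points.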

Abstract

In recent years, retrieval-augmented generation (RAG) has emerged as a key paradigm for enhancing large language models (LLMs). By integrating externally retrieved information, RAG alleviates issues like outdated knowledge and, crucially, insufficient domain expertise. While effective, RAG introduces new risks of external data extraction attacks (EDEAs), where sensitive or copyrighted data in its knowledge base may be extracted verbatim. These risks are particularly acute when RAG is used to customize specialized LLM applications with private knowledge bases. Initial studies have explored these risks, but they often lack a formalized framework, robust attack performance, and comprehensive evaluation, leaving critical questions about real-world EDEA feasibility unanswered. In this paper, we present the first comprehensive study to formalize EDEAs against retrieval-augmented LLMs. We first formally define EDEAs and propose a unified framework decomposing their design into three components: extraction instruction, jailbreak operator, and retrieval trigger. Prior attacks can be viewed as instances of this framework. Guided by it, we develop SECRET: a Scalable and EffeCtive exteRnal data Extraction aTtack. Specifically, SECRET incorporates (1) an adaptive optimization process using LLMs as optimizers to generate specialized jailbreak prompts for EDEAs, and (2) cluster-focused triggering, an adaptive strategy that alternates between global exploration and local exploitation to efficiently generate effective retrieval triggers. Extensive evaluations across 4 models reveal that SECRET significantly outperforms previous attacks and is highly effective against all 16 tested RAG instances. Notably, SECRET successfully extracts 35% of the data from RAG powered by Claude 3.7 Sonnet for the first time, whereas other attacks yield 0% extraction. Our findings call for attention to this emerging threat.
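
As a rough illustration of what a figure such as "35% of the data" measures, the sketch below computes a simple extraction rate: the fraction of knowledge-base documents that appear verbatim in at least one model response. This is an assumption for illustration; the paper's formal success criterion (Definition 3.1) is not reproduced here and may differ, and the function name `extraction_rate` is hypothetical.

```python
def extraction_rate(knowledge_base, responses):
    """Fraction of documents reproduced verbatim in any response.

    Assumption: "extracted" means exact substring match, which is a
    simplification of whatever criterion the paper formally defines.
    """
    if not knowledge_base:
        return 0.0
    leaked = sum(
        1 for doc in knowledge_base
        if any(doc in response for response in responses)
    )
    return leaked / len(knowledge_base)

# Toy audit: two of four private records show up in the model outputs.
kb = ["record alpha", "record beta", "record gamma", "record delta"]
outputs = ["Sure, here it is: record alpha", "record gamma and more text"]
rate = extraction_rate(kb, outputs)
print(rate)  # prints 0.5
```

A deployment owner could run a check of this kind over logged responses to estimate how much of a private knowledge base has already been exposed.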

Paper Structure

This paper contains 39 sections, 2 theorems, 16 equations, 12 figures, 11 tables, 4 algorithms.

Key Result

Lemma I.1

Under the stated assumptions, any query generated by the LE operator $\mathcal{L}$ from a source within a cluster $C_m$ will only retrieve documents from $C_m$. The number of queries required to extract all $V$ documents from the cluster, denoted $Q_{\text{LE}}$, satisfies

Figures (12)

  • Figure 1: The pipeline of Retrieval-Augmented Generation.
  • Figure 2: Overall attack pipeline of SECRET.
  • Figure 3: Evolution of extraction rates of SECRET under different $k$. The ER-TMQ results are annotated in the figure.
  • Figure 4: The effectiveness of SECRET after varying the retrieval settings. The dataset is HealthcareMagic-101 (Li et al., 2023).
  • Figure 5: The effectiveness of SECRET under different $\tau$. The dataset is HealthcareMagic-101.
  • ...and 7 more figures

Theorems & Definitions (6)

  • Definition 3.1: Successful Extraction
  • Definition 3.2: EDEAs
  • Lemma I.1: Single-Cluster Exploitation Complexity
  • proof
  • Theorem I.1: Optimal Strategy Transition for Document Extraction
  • proof