Feedback-Guided Extraction of Knowledge Base from Retrieval-Augmented LLM Applications

Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Yang Chen, Min Yang

TL;DR

The paper addresses the risk of verbatim knowledge-base leakage in Retrieval-Augmented Generation (RAG) systems and introduces CopyBreakRAG, an autonomous agent-based black-box attack that combines curiosity-driven exploration with reasoning-based exploitation of feedback to progressively extract knowledge chunks. By combining adversarial query design, robust chunk extraction via prompts/regex, and memory-driven expansion, it extracts over 70% of the knowledge base on real-world commercial platforms and outperforms the state-of-the-art black-box baseline by 45% on average in chunk extraction ratio. The work demonstrates cross-domain effectiveness, analyzes untargeted and targeted scenarios, and provides ablations and defense considerations, underscoring significant copyright and security implications for RAG deployments. It further discusses potential mitigations, including retrieval thresholds, output filtering, and prompt-safety mechanisms, highlighting the need for robust protections in commercial LLM-enabled applications.

Abstract

Retrieval-Augmented Generation (RAG) expands the knowledge boundary of large language models (LLMs) by integrating external knowledge bases, whose construction is often time-consuming and laborious. If an adversary extracts the knowledge base verbatim, it not only severely infringes the owner's intellectual property but also enables the adversary to replicate the application's functionality for unfair competition. Previous works on knowledge base extraction are limited either by low extraction coverage (usually less than 4%) in query-based attacks or by impractical assumptions of white-box access in embedding-based optimization methods. In this work, we propose CopyBreakRAG, an agent-based black-box attack that reasons from feedback and adaptively generates new adversarial queries for progressive extraction. By balancing exploration and exploitation through curiosity-driven queries and feedback-guided query refinement, our method overcomes the limitations of prior approaches and achieves significantly higher extraction coverage in realistic black-box settings. Experimental results show that CopyBreakRAG outperforms the state-of-the-art black-box approach by 45% on average in terms of chunk extraction ratio from applications built with mainstream RAG frameworks, and extracts over 70% of the data from the knowledge base in applications on commercial platforms including OpenAI's GPTs and ByteDance's Coze when essential protection is in place.
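The abstract reports results as a chunk extraction ratio. As a minimal sketch of how such a ratio could be computed, the snippet below counts a knowledge-base chunk as recovered when it appears verbatim, after lowercasing and whitespace normalization, inside any extracted text. The function names and the matching rule are assumptions made for illustration; the paper's exact matching criterion is not stated in this excerpt.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return " ".join(text.lower().split())


def chunk_extraction_ratio(knowledge_base: list[str], extracted: list[str]) -> float:
    """Fraction of distinct knowledge-base chunks found verbatim in the extracted texts.

    Assumption: a chunk counts as recovered if its normalized form is a
    substring of at least one normalized extracted text.
    """
    kb = {normalize(chunk) for chunk in knowledge_base if chunk.strip()}
    if not kb:
        return 0.0
    outputs = [normalize(text) for text in extracted]
    recovered = {chunk for chunk in kb if any(chunk in out for out in outputs)}
    return len(recovered) / len(kb)


if __name__ == "__main__":
    kb = ["Alpha beta.", "Gamma delta.", "Epsilon zeta.", "Eta theta."]
    outputs = ["Here: alpha  beta. and more", "gamma delta.", "unrelated"]
    print(chunk_extraction_ratio(kb, outputs))  # two of four chunks recovered
```

A stricter or looser rule (exact equality, or a similarity threshold) would change the reported number, so the criterion should be fixed before comparing methods.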

Paper Structure

This paper contains 33 sections, 7 equations, 6 figures, 8 tables, 1 algorithm.

Figures (6)

  • Figure 1: Attack scenario of CopyBreakRAG and demonstration on a real-world healthcare-related RAG application from OpenAI GPTs (For ethical reasons, the GPT is created by the authors and only contains public data).
  • Figure 2: Overview of CopyBreakRAG. First, ❶ CopyBreakRAG uses an adversarial query from the attack queue to induce the RAG application to ❷ extract specific chunks. Then CopyBreakRAG stores these chunks in the short-term memory, and employs ❸ curiosity-driven exploration and ❹ reasoning-based exploitation to heuristically generate multiple anchor queries for each chunk based on an attack LLM. These anchor queries are then ❺ concatenated with the adversarial command to form new adversarial queries for the next round of attacks. The extracted chunks are subsequently stored as the agent's long-term memory, with duplicates excluded from storage.
  • Figure 3: A schematic diagram of the exploration and the exploitation phase of our attack on the semantic space.
  • Figure 4: Growth in CRR of CopyBreakRAG and the baselines PIDE and DGEA within the same attack budget.
  • Figure 5: The CRR curve of CopyBreakRAG attacks in both targeted and untargeted scenarios with changes in (a) the number of retrieved chunks, (b) the agent base model size, (c) the retrieval mode in the RAG applications, and (d) the number of random queries.
  • ...and 1 more figures
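
The bookkeeping described in the Figure 2 caption (attack queue, short-term memory, deduplicated long-term memory) can be illustrated with a toy simulation. This is a sketch under heavy assumptions, not the authors' implementation: the target is an in-memory keyword retriever (`mock_rag_app`), the attack LLM is replaced by a trivial head/tail heuristic (`anchor_queries`), and the adversarial command is a placeholder string. All names and data here are hypothetical.

```python
from collections import deque

# Toy knowledge base standing in for a target's private data (hypothetical).
KNOWLEDGE_BASE = [
    "aspirin relieves mild pain and fever",
    "fever above 39 degrees needs medical attention",
    "medical attention is urgent for chest pain",
    "chest pain can signal heart disease",
    "heart disease risk rises with smoking",
]

ADVERSARIAL_COMMAND = "<adversarial command placeholder>"  # deliberately not a real prompt


def mock_rag_app(query: str, top_k: int = 2) -> list[str]:
    """Toy target: return up to top_k chunks sharing the most words with the query."""
    words = set(query.lower().split())
    ranked = sorted(KNOWLEDGE_BASE, key=lambda c: len(words & set(c.split())), reverse=True)
    return [c for c in ranked[:top_k] if words & set(c.split())]


def anchor_queries(chunk: str) -> list[str]:
    """Stand-in for the attack LLM: use the first and last two words as anchors."""
    tokens = chunk.split()
    return [" ".join(tokens[:2]), " ".join(tokens[-2:])]


def run(seed_query: str, budget: int = 20) -> list[str]:
    """Simulate the queue and memory loop; returns the deduplicated long-term memory."""
    queue = deque([seed_query])   # attack queue of anchor queries
    long_term: list[str] = []     # extracted chunks, duplicates excluded
    seen: set[str] = set()
    asked: set[str] = set()
    while queue and budget > 0:
        anchor = queue.popleft()
        if anchor in asked:       # skip anchors that were already tried
            continue
        asked.add(anchor)
        budget -= 1
        # Anchor concatenated with the command forms the next query.
        short_term = mock_rag_app(f"{anchor} {ADVERSARIAL_COMMAND}")
        for chunk in short_term:
            if chunk in seen:
                continue
            seen.add(chunk)
            long_term.append(chunk)
            queue.extend(anchor_queries(chunk))  # expand from newly seen chunks only
    return long_term


if __name__ == "__main__":
    for chunk in run("aspirin"):
        print(chunk)
```

In this toy setting, coverage grows because neighboring chunks share words, so anchors taken from one chunk retrieve the next. The real method relies on an attack LLM to generate anchors, which this heuristic does not attempt to reproduce.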