Feedback-Guided Extraction of Knowledge Base from Retrieval-Augmented LLM Applications
Changyue Jiang, Xudong Pan, Geng Hong, Chenfu Bao, Yang Chen, Min Yang
TL;DR
The paper addresses the risk of verbatim knowledge-base leakage in Retrieval-Augmented Generation (RAG) systems and introduces CopyBreakRAG, an autonomous agent-based black-box attack that uses feedback-driven curiosity and reasoning-guided exploitation to progressively extract knowledge chunks. By combining adversarial query design, robust chunk extraction via prompts and regular expressions, and memory-driven expansion, it extracts over 70% of the knowledge base on commercial platforms and outperforms the state-of-the-art black-box baseline by about 45% on average in chunk extraction ratio. The work demonstrates strong cross-domain effectiveness, analyzes untargeted and targeted scenarios, and provides ablations and defense considerations, underscoring significant copyright and security implications for RAG deployments. It further discusses potential mitigations, including retrieval thresholds, output filtering, and prompt-safety mechanisms, highlighting the need for robust protections in commercial LLM-enabled applications.
Abstract
Retrieval-Augmented Generation (RAG) expands the knowledge boundary of large language models (LLMs) by integrating external knowledge bases, whose construction is often time-consuming and laborious. If an adversary extracts the knowledge base verbatim, it not only severely infringes the owner's intellectual property but also enables the adversary to replicate the application's functionality for unfair competition. Previous work on knowledge base extraction is limited either by low extraction coverage (usually less than 4%) in query-based attacks or by the impractical assumption of white-box access in embedding-based optimization methods. In this work, we propose CopyBreakRAG, an agent-based black-box attack that reasons from feedback and adaptively generates new adversarial queries for progressive extraction. By balancing exploration and exploitation through curiosity-driven queries and feedback-guided query refinement, our method overcomes the limitations of prior approaches and achieves significantly higher extraction coverage in realistic black-box settings. Experimental results show that CopyBreakRAG outperforms the state-of-the-art black-box approach by 45% on average in terms of chunk extraction ratio on applications built with mainstream RAG frameworks, and extracts over 70% of the data from the knowledge base in applications on commercial platforms, including OpenAI's GPTs and ByteDance's Coze, even when essential protection is in place.
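The headline metric above is the chunk extraction ratio. The abstract does not give its exact definition, so the following is only a minimal sketch under an assumed one: a chunk counts as extracted if its whitespace- and case-normalized text appears verbatim in at least one model response. The function names `normalize` and `chunk_extraction_ratio` are hypothetical and are not taken from the paper.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def chunk_extraction_ratio(chunks: list[str], responses: list[str]) -> float:
    """Fraction of knowledge-base chunks found verbatim in at least one response.

    Assumed definition (not from the paper): exact substring match after
    normalization. A fuzzy or n-gram overlap criterion would give other values.
    """
    if not chunks:
        return 0.0
    normalized_responses = [normalize(r) for r in responses]
    extracted = sum(
        1
        for chunk in chunks
        if any(normalize(chunk) in r for r in normalized_responses)
    )
    return extracted / len(chunks)


# Usage: two of the three chunks appear in the collected responses.
kb_chunks = ["Alpha beta.", "Gamma  delta", "Epsilon"]
model_responses = ["Here: alpha beta. and more", "gamma delta"]
ratio = chunk_extraction_ratio(kb_chunks, model_responses)
```

A knowledge-base owner could run the same measurement over logged responses to estimate how much of the base has already been exposed, which is one way to evaluate output-filtering defenses.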
