Who Stole Your Data? A Method for Detecting Unauthorized RAG Theft

Peiyang Liu, Ziqiang Cui, Di Liang, Wei Ye

TL;DR

This work tackles unauthorized data use in retrieval-augmented generation by introducing the RAG Plagiarism Detection Dataset (RPD) and a dual-layer watermarking framework. The semantic (knowledge-based) and lexical (red-green token distribution) watermarks, combined with an Interrogator-Detective framework and statistical hypothesis testing, enable robust detection even under fact redundancy and adversarial evasion. Across extensive experiments, the dual-layer approach achieves near-perfect detection under diverse conditions, with an ablation study showing complementary strengths and a synergy that raises the barrier for evaders. The results support a practical, privacy-preserving paradigm for guarding IP in RAG-enabled AI systems while maintaining text quality and utility.

Abstract

Retrieval-augmented generation (RAG) enhances Large Language Models (LLMs) by mitigating hallucinations and outdated information issues, yet simultaneously facilitates unauthorized data appropriation at scale. This paper addresses this challenge through two key contributions. First, we introduce RPD, a novel dataset specifically designed for RAG plagiarism detection that encompasses diverse professional domains and writing styles, overcoming limitations in existing resources. Second, we develop a dual-layered watermarking system that embeds protection at both semantic and lexical levels, complemented by an interrogator-detective framework that employs statistical hypothesis testing on accumulated evidence. Extensive experimentation demonstrates our approach's effectiveness across varying query volumes, defense prompts, and retrieval parameters, while maintaining resilience against adversarial evasion techniques. This work establishes a foundational framework for intellectual property protection in retrieval-augmented AI systems.
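As a rough illustration of what "statistical hypothesis testing on accumulated evidence" can look like for the lexical (red-green) layer, the sketch below pools green-token counts over several responses and applies a one-sided z-test. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the example counts, and the green list proportion `gamma = 0.5` are hypothetical, and the null hypothesis assumes each token is green independently with probability `gamma`. The significance level 0.005 is the one quoted in the Figure 3 caption.

```python
import math


def green_z_score(green_count: int, total_tokens: int, gamma: float) -> float:
    """z-score of the observed green count against the no-watermark null.

    Assumption: without a watermark, each token lands in the green list
    independently with probability gamma.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


def one_sided_p_value(z: float) -> float:
    """Upper-tail probability of a standard normal variable at z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def detect(responses, gamma: float = 0.5, alpha: float = 0.005):
    """Pool (green_count, token_count) pairs, one pair per query, and test.

    Returns (z, p_value, flagged), where flagged is True when the pooled
    evidence rejects the no-watermark null at level alpha.
    """
    green = sum(g for g, _ in responses)
    total = sum(t for _, t in responses)
    z = green_z_score(green, total, gamma)
    p = one_sided_p_value(z)
    return z, p, p < alpha
```

With the hypothetical counts `[(58, 100), (61, 100), (55, 100), (60, 100)]`, the pooled z-score is 3.4 and the null is rejected at the 0.005 level, whereas the single response `(58, 100)` gives a z-score of 1.6 and is not flagged. This mirrors the idea that a signal too weak to detect in one response can become detectable once several responses are aggregated.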

Paper Structure

This paper contains 60 sections, 2 theorems, 22 equations, 18 figures, 4 tables.

Key Result

Theorem 1

For a watermarked text sequence of $T$ tokens generated with bias strength $\delta$ and green list proportion $\gamma$, if the average spike entropy of the sequence is at least $S^*$, then: Furthermore, the variance of the green token count is bounded by: where the spike entropy $S(p,z)$ of a probability distribution $p$ with modulus $z$ is defined as:
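The three displayed expressions are not shown in the statement above. Assuming the theorem follows the standard spike-entropy analysis of red-green watermarks (Kirchenbauer et al., "A Watermark for Large Language Models"), which this statement closely resembles, they would read as follows. The symbols $\alpha = e^{\delta}$ and $|s|_G$ (the number of green tokens in the sequence $s$) are notation borrowed from that analysis, not taken from this page, so the paper's own form may differ.

$$
\mathbb{E}\,|s|_G \;\ge\; \frac{\gamma \alpha T}{1 + (\alpha - 1)\gamma}\, S^*
$$

$$
\operatorname{Var}\,|s|_G \;\le\; T \, \frac{\gamma \alpha S^*}{1 + (\alpha - 1)\gamma} \left( 1 - \frac{\gamma \alpha S^*}{1 + (\alpha - 1)\gamma} \right)
$$

$$
S(p, z) \;=\; \sum_{k} \frac{p_k}{1 + z\, p_k}
$$

Under this reading, a larger bias strength $\delta$ or a higher average spike entropy $S^*$ raises the guaranteed expected green token count, which is what makes a z-test on that count informative.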

Figures (18)

  • Figure 1: Overview of our proposed pipeline for preventing unauthorized RAG misuse. First, we apply our Dual-Layered Watermarking method to embed watermarks into the documents of the protected dataset. When a Thief RAG System illicitly accesses our dataset, the watermark propagates through the RAG pipeline into the LLM's generated outputs. Although the watermark inevitably weakens during propagation, in the detection phase, the Interrogator strategically crafts queries that increase the likelihood of the RAG System retrieving watermarked documents. By submitting multiple queries to the Thief RAG System, the aggregated watermark signals accumulate to a detectable level, enabling statistical hypothesis testing to determine whether unauthorized misuse has occurred.
  • Figure 2: The construction flowchart of the RPD dataset. The source data is derived from the repliqa dataset, ensuring that the original data does not appear in the training samples of LLMs to prevent data leakage. For each piece of data in the repliqa dataset, we extract Facts and Relations. Based on these facts and relations, we select a unique author role from an "Author Pool" composed of writers with diverse styles. Different LLMs then assume these distinct author roles to select Facts and Relations for writing articles. This approach effectively simulates the data redundancy issues caused by authors with varying styles in real-world scenarios. Articles written based on true facts and relations will not conflict with the knowledge already learned by LLMs, and the newly generated data can avoid data leakage.
  • Figure 3: Experimental results on the detection robustness of the red-green watermark and the fact-based watermark, using a z-test $(\alpha = 0.005)$.
  • Figure 4: Text quality assessment across different watermarking approaches.
  • Figure 5: The impact of delta variation on text quality and detection accuracy.
  • ...and 13 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2