Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs

Yepeng Liu, Xuandong Zhao, Dawn Song, Yuheng Bu

TL;DR

This work addresses IP protection in Retrieval-Augmented Generation by introducing Dataset Membership Inference for RAG (DMI-RAG), which embeds a small set of watermarked canary documents into an IP dataset without altering the originals. The canaries are synthesized to match the attributes of the IP documents, and a watermarked LLM implants an invisible watermark in them, enabling black-box detection via statistical analysis of responses to specially crafted queries. The method achieves high detectability (e.g., 100% ROC-AUC with modest query budgets) and near-zero dataset distortion, while preserving downstream task performance. Empirically, it demonstrates robust detection under hard prompts and on low-entropy datasets, highlighting practical viability for data provenance and IP protection in RA-LLMs.

Abstract

Retrieval-Augmented Generation (RAG) has become an effective method for enhancing large language models (LLMs) with up-to-date knowledge. However, it poses a significant risk of IP infringement, as IP datasets may be incorporated into the knowledge database by malicious Retrieval-Augmented LLMs (RA-LLMs) without authorization. To protect the rights of the dataset owner, an effective dataset membership inference algorithm for RA-LLMs is needed. In this work, we introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by the RA-LLMs. Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset. These canary documents are created with synthetic content and embedded watermarks to ensure uniqueness, stealthiness, and statistical provability. During the detection process, unauthorized usage is identified by querying the canary documents and analyzing the responses of RA-LLMs for statistical evidence of the embedded watermark. Our experimental results demonstrate high query efficiency, detectability, and stealthiness, along with minimal perturbation to the original dataset, all without compromising the performance of the RAG system.
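The detection step described above, testing black-box responses for statistical evidence of the watermark, can be illustrated with a minimal sketch. The abstract does not name the watermarking scheme, so this example assumes a keyed green-list watermark and a one-proportion z-test. The whitespace tokenizer, the `is_green` partition, the `key` value, and `GAMMA` are all hypothetical choices made for illustration, not the authors' implementation.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of tokens that fall on the "green" list


def is_green(prev_token: str, token: str, key: str = "owner-secret") -> bool:
    """Hypothetical keyed partition: hash (key, prev_token, token) into [0, 1)."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GAMMA


def watermark_z_score(responses: list[str], key: str = "owner-secret") -> float:
    """One-proportion z-test on the green-token count pooled over all responses."""
    green = total = 0
    for text in responses:
        tokens = text.split()  # simplification: whitespace tokens, no logits needed
        for prev, tok in zip(tokens, tokens[1:]):
            green += is_green(prev, tok, key)
            total += 1
    if total == 0:
        return 0.0
    return (green - GAMMA * total) / math.sqrt(total * GAMMA * (1 - GAMMA))


def p_value(z: float) -> float:
    """One-sided p-value under the null hypothesis that no watermark is present."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Usage: pool the responses to the canary queries, then test them.
responses = ["text returned by the suspected RA-LLM for one canary query"]
z = watermark_z_score(responses)
print(f"z = {z:.2f}, p = {p_value(z):.3g}")
```

Under these assumptions, pooling responses across several canary queries increases the token count and therefore the power of the test, which is one way a modest query budget could still yield strong evidence.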

Paper Structure

This paper contains 20 sections, 6 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Overview of our DMI-RAG method. In the dataset protection stage, we generate watermarked and synthetic canary documents based on the attributes of the documents from the IP dataset to form a protected dataset. A malicious RAG system may integrate an IP dataset into its base dataset without obtaining permission from the data owner. During the detection stage, the data owner can conduct black-box queries (without logits information) targeting these canary documents using specifically crafted questions and analyze the model responses to detect the presence of the watermark.
  • Figure 2: Workflow of our canary dataset synthesis algorithm. The process begins by randomly sampling a document from the IP dataset to serve as a reference. Next, key attributes are extracted from the reference document. Using these attributes, the descriptions and relationships between synthetic entities are created. Finally, the algorithm outputs the synthetic text and a corresponding query question.
  • Figure 3: Retrieval Accuracy for different methods. Our method achieves $100\%$ Retrieval Accuracy.
  • Figure 4: Distribution of quality ratings using QuRating across four different aspects for various methods.