Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning
Junfeng Guo, Yiming Li, Ruibo Chen, Yihan Wu, Chenxi Liu, Yanshuo Chen, Heng Huang
TL;DR
This work tackles copyright protection for the knowledge bases of retrieval-augmented LLMs (RA-LLMs) by moving the watermark from the model's final output into its chain-of-thought (CoT) reasoning, which yields a harmless verification mechanism. The proposed RAG© framework generates two innocent CoTs per verification question, jointly optimizes watermark phrases and target CoTs so that the retriever returns a target CoT only for its watermarked query, and then verifies ownership with a pairwise Wilcoxon test in a black-box, text-only setting. Experiments across multiple benchmarks and real-world knowledge bases show high verification success rates, low harm to benign outputs, and resilience to adaptive attacks. Overall, the method offers a practical way to detect unauthorized use of copyrighted knowledge bases in RA-LLMs, supported by both theoretical analysis and empirical results.
Abstract
Large language models (LLMs) are increasingly integrated into real-world personalized applications through retrieval-augmented generation (RAG) mechanisms that supplement their responses with domain-specific knowledge. However, the valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically rely on poisoning or backdoor attacks. These methods must alter the LLM's outputs on verification samples, which inevitably makes the watermarks susceptible to anomaly detection and can even introduce new security risks. To address these challenges, we propose RAG© for 'harmless' copyright protection of knowledge bases. Instead of manipulating the LLM's final output, RAG© implants distinct yet benign verification behaviors in the space of chain-of-thought (CoT) reasoning, maintaining the correctness of the final answer. Our method has three main stages: (1) Generating CoTs: For each verification question, we generate two 'innocent' CoTs, including a target CoT for building watermark behaviors; (2) Optimizing Watermark Phrases and Target CoTs: Inspired by our theoretical analysis, we optimize them to minimize retrieval errors under the black-box and text-only setting of the suspicious LLM, ensuring that only watermarked verification queries can retrieve their corresponding target CoTs contained in the knowledge base; (3) Ownership Verification: We exploit a pairwise Wilcoxon test to verify whether a suspicious LLM is augmented with the protected knowledge base by comparing its responses to watermarked and benign verification queries. Our experiments on diverse benchmarks demonstrate that RAG© effectively protects knowledge bases and resists adaptive attacks.
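To make stage (3) concrete, the following is a minimal sketch of a one-sided paired Wilcoxon signed-rank test, not the authors' implementation. It assumes that each verification question has already been reduced to two numbers: a score for the suspicious LLM's response to the watermarked query and a score for its response to the benign query, where a higher score means the response is closer to the target CoT. How those scores are computed is not specified in the abstract, so the scoring step, the function names, the normal approximation, and the significance level `alpha=0.01` are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank_greater(watermarked, benign):
    """One-sided paired Wilcoxon signed-rank test (normal approximation).

    H0: scores for watermarked queries are not larger than for benign ones.
    Returns (w_plus, p_value). Hypothetical helper, for illustration only.
    """
    if len(watermarked) != len(benign):
        raise ValueError("paired samples must have equal length")
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [w - b for w, b in zip(watermarked, benign) if w != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0  # no evidence either way

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1

    # Test statistic: sum of ranks of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48  # tie-corrected
    z = (w_plus - mean) / math.sqrt(var)
    return w_plus, 1.0 - NormalDist().cdf(z)


def verify_ownership(watermarked, benign, alpha=0.01):
    """Flag the suspicious LLM if H0 is rejected at level alpha (assumed)."""
    _, p_value = wilcoxon_signed_rank_greater(watermarked, benign)
    return p_value < alpha


# Made-up scores for ten verification questions (illustrative data only).
wm_scores = [0.91, 0.84, 0.88, 0.95, 0.73, 0.89, 0.92, 0.81, 0.78, 0.90]
bn_scores = [0.22, 0.31, 0.18, 0.40, 0.25, 0.35, 0.28, 0.19, 0.33, 0.27]
print(verify_ownership(wm_scores, bn_scores))
```

The normal approximation keeps the sketch dependency-free but is only rough for small samples; with few verification questions an exact test, such as the one offered by `scipy.stats.wilcoxon`, would be the safer choice.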
