Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
Yu Fu, Deyi Xiong, Yue Dong
TL;DR
Problem: Watermarking for AI detection can degrade conditional text generation (CTG) performance.
Approach: Introduces a semantic-aware watermark (SW) that builds the green list from the input context using embedding-based token similarity, and applies a hash-based partition to the remaining vocabulary; it uses CTG-specific settings of the hyperparameters $\gamma$, $\delta$, and $k$.
Results: SW yields substantial gains in CTG quality over the original watermark on summarization (CNN/DailyMail, XSUM) and data-to-text generation (DART, WebNLG) with BART and Flan-T5, while maintaining detection ability; human judges prefer SW outputs in controlled evaluations.
Significance: Demonstrates a practical, task-aware watermarking approach that enables reliable AI detection in CTG without sacrificing output quality, with broad applicability to real-world datasets.
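The approach above is stated only at a high level. The sketch below shows one way such a green list could be assembled, under loudly stated assumptions: (1) the semantic part is the top-$k$ vocabulary tokens by cosine similarity to each input token's embedding, (2) $\gamma$ is the green fraction applied to the vocabulary left over after those tokens are removed, and (3) the hash partition is seeded by the previous token through Python's `random.Random`. The helper names and the `key` constant are hypothetical, and the paper's exact construction may differ.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_green_list(input_ids, embeddings, k):
    """Hypothetical helper: union of the k most similar vocabulary tokens
    (by embedding cosine similarity) for every token in the input."""
    green = set()
    for tok in input_ids:
        ranked = sorted(
            range(len(embeddings)),
            key=lambda j: cosine(embeddings[tok], embeddings[j]),
            reverse=True,
        )
        green.update(ranked[:k])
    return green

def sw_green_list(prev_token, input_ids, embeddings, gamma, k, key=15485863):
    """Hypothetical SW-style green list: the semantic tokens plus a seeded
    gamma-fraction of the remaining vocabulary (assumption, not the paper's code)."""
    semantic = semantic_green_list(input_ids, embeddings, k)
    remaining = [t for t in range(len(embeddings)) if t not in semantic]
    # Seed the partition with the previous token so a detector holding the
    # same key and the same input can rebuild the identical list.
    rng = random.Random(key * prev_token)
    rng.shuffle(remaining)
    return semantic | set(remaining[: int(gamma * len(remaining))])

def bias_logits(logits, green, delta):
    """Add delta to every green-list logit before sampling."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]

# Toy usage: an 8-token vocabulary with 2-D embeddings.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9],
       [-1.0, 0.0], [0.0, -1.0], [0.7, 0.7], [-0.7, 0.7]]
green = sw_green_list(prev_token=3, input_ids=[0], embeddings=emb, gamma=0.5, k=2)
# Tokens 0 and 1 are always green here; 3 of the other 6 come from the seeded shuffle.
```

The design point this illustrates is that tokens semantically tied to the input stay favored regardless of the hash, so a summary or data-to-text output that must reuse input words is not pushed away from them, while the hashed remainder still carries a detectable signal.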
Abstract
To mitigate potential risks associated with language models, recent AI detection research proposes incorporating watermarks into machine-generated text through random vocabulary restrictions and using this information for detection. While these watermarks induce only a slight deterioration in perplexity, our empirical investigation reveals that they significantly degrade the performance of conditional text generation. To address this issue, we introduce a simple yet effective semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context. Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models, including BART and Flan-T5, in tasks such as summarization and data-to-text generation, while maintaining detection ability.
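For the random-vocabulary-restriction watermark the abstract refers to, detection is commonly done with a one-proportion z-test on the number of generated tokens that land in their green lists: $z = (|s|_G - \gamma T) / \sqrt{T\gamma(1-\gamma)}$, where $T$ is the number of scored tokens and $|s|_G$ the green count. The sketch below assumes the detector has already recomputed green-list membership for each token; the function names are hypothetical and the 4.0 threshold is an illustrative choice, not a value taken from this paper.

```python
import math

def watermark_z_score(green_flags, gamma):
    """One-proportion z-score for the count of green tokens.

    green_flags: one bool per scored token (True if it fell in its green list).
    gamma: expected green fraction for text produced without the watermark.
    """
    t = len(green_flags)
    if t == 0:
        raise ValueError("need at least one scored token")
    g = sum(green_flags)  # True counts as 1
    return (g - gamma * t) / math.sqrt(t * gamma * (1 - gamma))

def is_watermarked(green_flags, gamma, threshold=4.0):
    """Flag text as machine-generated when z exceeds an illustrative threshold."""
    return watermark_z_score(green_flags, gamma) > threshold

# 100 scored tokens with gamma = 0.25: 80 green is far above the expected 25.
print(is_watermarked([True] * 80 + [False] * 20, gamma=0.25))
```

If a green list also contains tokens derived from the input, human-written text on the same input may hit it more often than $\gamma$, so the null fraction passed as `gamma` would need separate calibration; this sketch does not address that.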
