Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy

Yu Fu, Deyi Xiong, Yue Dong

TL;DR

Problem: Watermarking for AI detection can degrade conditional text generation (CTG) performance. Approach: Introduces a semantic-aware watermark (SW) that builds the green list from the input context and embedding-based token similarity, then applies a hash-based partition to the remaining vocabulary; it uses CTG-specific settings of the hyperparameters $\gamma$, $\delta$, and $k$. Results: SW yields substantial gains in CTG quality over the original watermark on summarization (CNN/DailyMail, XSUM) and data-to-text generation (DART, WebNLG) with BART and Flan-T5, while maintaining detection signals; human judges prefer SW in controlled evaluations. Significance: Demonstrates a practical, task-aware watermarking approach that enables reliable AI detection in CTG without sacrificing output quality, with broad applicability to real-world datasets.

Abstract

To mitigate potential risks associated with language models, recent AI detection research proposes incorporating watermarks into machine-generated text through random vocabulary restrictions and utilizing this information for detection. While these watermarks only induce a slight deterioration in perplexity, our empirical investigation reveals a significant detriment to the performance of conditional text generation. To address this issue, we introduce a simple yet effective semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context. Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models, including BART and Flan-T5, in tasks such as summarization and data-to-text generation while maintaining detection ability.
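
The green-list construction summarized in the TL;DR can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy embedding table, the `key` string, and the choice to seed the hash with the previous token are all assumptions made for illustration. It also assumes that the semantic part consists of the top-$k$ most similar vocabulary tokens for each input token (which includes the token itself), and that a fraction $\gamma$ of the remaining vocabulary is added by a hash-seeded shuffle.

```python
import hashlib
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def semantic_green_list(input_ids, embeddings, k, gamma, prev_token, key="demo-key"):
    """Sketch of a semantic-aware green list (hypothetical helper, not the paper's code).

    input_ids  : token ids of the source text (the conditioning input)
    embeddings : one embedding vector per vocabulary id
    k          : number of most-similar tokens kept per input token
    gamma      : fraction of the *remaining* vocabulary added at random (assumption)
    prev_token : previous output token id, used to seed the hash-based partition
    """
    vocab = range(len(embeddings))

    # Semantic part: for each input token, keep its k nearest tokens by cosine
    # similarity. The token itself has similarity 1.0, so it is always kept.
    semantic = set()
    for t in input_ids:
        ranked = sorted(vocab, key=lambda v: cosine(embeddings[t], embeddings[v]),
                        reverse=True)
        semantic.update(ranked[:k])

    # Hash-based part: a deterministic pseudo-random split of the rest.
    remaining = [v for v in vocab if v not in semantic]
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    rng.shuffle(remaining)
    n_random = int(gamma * len(remaining))

    return semantic | set(remaining[:n_random])


def apply_watermark_bias(logits, green_ids, delta):
    """Add delta to the logits of green-list tokens (soft watermark)."""
    return [x + delta if i in green_ids else x for i, x in enumerate(logits)]


# Toy usage: a 6-token vocabulary with 2-dimensional embeddings.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0], [0.0, -1.0]]
green = semantic_green_list([0], toy_embeddings, k=2, gamma=0.5, prev_token=3)
biased = apply_watermark_bias([0.0] * 6, green, delta=5.0)
```

Because tokens that appear in, or lie close to, the input are always in the green list under these assumptions, a source entity cannot be pushed into the red list by the random partition, which is the failure mode the paper attributes to the original watermark.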

Paper Structure (18 sections, 2 equations, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: Outputs with the original watermark (OW; Kirchenbauer et al., 2023) and our proposed semantic-aware watermark (SW) on a test example from DART, a data-to-text generation benchmark, with parameters $\gamma=0.1$ and $\delta=5$. With $\gamma=0.1$, we expect $\sim$90% of the tokens in human-written text to come from the red list, whereas AI-generated text primarily uses the green list. Both watermarks yield high $z$-scores ($z>4$), indicating strong watermark strength for detection. However, OW randomly assigns key source entities (Mandy Patinkin) to the red list, which forces the algorithm to generate from the red list. As $\delta$ increases (towards a hard watermark), excluding these red tokens risks more hallucinations (underlined words).
  • Figure 2: Watermark detection: average $z$-score under different $\delta$ settings (x-axis). Higher $z$-scores indicate stronger watermark detection confidence. We can see that hard watermarks (greater $\delta$) are easier to detect but lead to a more significant decline in CTG performance.
  • Figure 3: Watermark detection: AUC scores under different $\delta$ settings. Higher AUC scores indicate better detection performance.
  • Figure 4: The coverage of target tokens by semantically related tokens varies with different datasets and values of the hyperparameter $k$ on BART-base. Increasing the value of $k$ improves the coverage of semantic tokens, aligning with our objective and motivation.
  • Figure 5: The impact of $\gamma$ on DART results with settings of $\delta=2/5/10$. $\gamma$ controls the size of the green list. From $\delta=2$ to $\delta=5$, the watermarking method tends to change from a soft watermark to a hard watermark, and the probability of generating tokens from the green list gradually increases.
  • ...and 5 more figures
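
The $z$-scores reported in the captions above can be illustrated with a short sketch. It assumes the standard one-proportion $z$-test used for green-list watermarks, $z = (|s|_G - \gamma T) / \sqrt{T\gamma(1-\gamma)}$, where $|s|_G$ is the number of green tokens among $T$ scored tokens. The function names are hypothetical, and the default threshold of 4 is taken from the $z>4$ remark in the Figure 1 caption rather than from a stated decision rule.

```python
import math


def watermark_z_score(green_count, total_tokens, gamma):
    """z-score for observing `green_count` green tokens out of `total_tokens`.

    Under the null hypothesis (text written without the watermark), each token
    is assumed to land in the green list with probability gamma.
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must be strictly between 0 and 1")
    expected = gamma * total_tokens
    std = math.sqrt(total_tokens * gamma * (1.0 - gamma))
    return (green_count - expected) / std


def is_watermarked(z, threshold=4.0):
    """Flag text as watermarked when z exceeds the threshold (assumed value)."""
    return z > threshold


# With gamma = 0.1, unwatermarked text is expected to have about 10% green tokens,
# so 10 green tokens out of 100 gives a z-score near zero.
z_human_like = watermark_z_score(green_count=10, total_tokens=100, gamma=0.1)
z_watermarked = watermark_z_score(green_count=40, total_tokens=100, gamma=0.1)
```

A larger $\delta$ pushes more generated tokens into the green list, which raises `green_count` and therefore $z$; this matches the trend the Figure 2 caption describes.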