Adaptive Testing for Segmenting Watermarked Texts From Language Models

Xingchi Li, Xiaochi Liu, Guanxun Li

TL;DR

The paper addresses two problems: distinguishing watermarked from non-watermarked text generated by large language models (LLMs) and, more ambitiously, segmenting mixed-content text into watermarked and non-watermarked substrings. It generalizes likelihood-based watermark detection to adaptive test statistics for two watermarking schemes, exponential minimum sampling (EMS) and inverse transform sampling (ITS), and develops a randomization-based framework that does not require exact prompt estimation. Segmentation is handled as a change-point detection problem: randomization tests over moving windows yield a sequence of $p$-values from which change points are located. The approach is validated on multiple LLMs and watermark schemes. The adaptive statistics outperform the baselines in identifying watermarked segments and remain robust to edits, which matters for real-world texts that mix watermarked and non-watermarked content.

Abstract

The rapid adoption of large language models (LLMs), such as GPT-4 and Claude 3.5, underscores the need to distinguish LLM-generated text from human-written content to mitigate the spread of misinformation and misuse in education. One promising approach to address this issue is the watermark technique, which embeds subtle statistical signals into LLM-generated text to enable reliable identification. In this paper, we first generalize the likelihood-based LLM detection method of a previous study by introducing a flexible weighted formulation, and further adapt this approach to the inverse transform sampling method. Moving beyond watermark detection, we extend this adaptive detection strategy to tackle the more challenging problem of segmenting a given text into watermarked and non-watermarked substrings. In contrast to the approach in a previous study, which relies on accurate estimation of next-token probabilities that are highly sensitive to prompt estimation, our proposed framework removes the need for precise prompt estimation. Extensive numerical experiments demonstrate that the proposed methodology is both effective and robust in accurately segmenting texts containing a mixture of watermarked and non-watermarked content.
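
The moving-window randomization test mentioned above can be illustrated with a minimal sketch. It is not the paper's adaptive statistic: it assumes the common baseline EMS score, the sum of $-\log(1 - u_t)$, where $u_t$ is the watermark key entry at the observed token, and it assumes that under the null (text independent of the key) each $u_t$ is Uniform(0, 1), so the null distribution can be simulated by drawing fresh uniforms. The function names, window length, and number of resamples are all hypothetical choices made for illustration.

```python
import numpy as np


def window_statistic(u):
    """Baseline EMS-style score (assumed here): sum of -log(1 - u_t)."""
    return float(np.sum(-np.log1p(-u)))


def randomization_pvalue(u_window, n_resamples, rng):
    """Randomization p-value for one window.

    Assumption: under the null, each u_t is Uniform(0, 1), so regenerating
    keys amounts to drawing fresh uniforms of the same length.
    """
    observed = window_statistic(u_window)
    null_u = rng.uniform(size=(n_resamples, u_window.size))
    null_stats = np.sum(-np.log1p(-null_u), axis=1)
    # The +1 terms keep the p-value strictly positive and valid.
    return (1.0 + np.sum(null_stats >= observed)) / (n_resamples + 1.0)


def moving_window_pvalues(u, window, n_resamples=199, seed=0):
    """Slide a window over the sequence and return one p-value per start index."""
    rng = np.random.default_rng(seed)
    starts = range(0, u.size - window + 1)
    return np.array(
        [randomization_pvalue(u[s:s + window], n_resamples, rng) for s in starts]
    )


if __name__ == "__main__":
    # Synthetic stand-in for real key values: 100 unwatermarked positions
    # (uniform) followed by 100 "watermarked" positions skewed toward 1.
    demo_rng = np.random.default_rng(1)
    u_demo = np.concatenate(
        [demo_rng.uniform(size=100), demo_rng.uniform(size=100) ** 0.2]
    )
    p_demo = moving_window_pvalues(u_demo, window=20)
    print("median p-value, unwatermarked windows:", np.median(p_demo[:81]))
    print("median p-value, watermarked windows:  ", np.median(p_demo[100:]))
```

In this toy setting the $p$-values drop sharply once the window lies inside the watermarked stretch; a change-point procedure would then look for the location of that drop in the $p$-value sequence.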

Paper Structure

This paper contains 27 sections, 6 theorems, 60 equations, 18 figures, 1 table, 1 algorithm.

Key Result

Proposition 1

(Theorem 1 of li2025likelihood) For the randomization test, the following holds:

Figures (18)

  • Figure 1: Illustration of the roles of the LLM provider, user and detector.
  • Figure 2: Boxplots of the Rand index comparing clusters identified through detected change points with the true clusters defined by the true change points, for different thresholds in the EMS framework using the Llama LLM.
  • Figure 3: Comparison of $p$-value sequences generated by different methods under a fixed prompt. The two adaptive methods produce higher-quality $p$-value sequences than the baseline, explaining their superior performance in change-point detection. The empty method performs comparably to the optim method, indicating that prompt estimation is unnecessary in our framework.
  • Figure B.1: $p$-value sequence calculated using watermarked texts generated from the Llama LLM with the EMS method.
  • Figure B.2: Boxplots of the Rand index comparing clusters identified through detected change points with the true clusters defined by the true change points, for different thresholds in the EMS framework using the Mistral LLM.
  • ...and 13 more figures
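
Several of the figures above report the Rand index between the clustering induced by detected change points and the one induced by the true change points. As a rough illustration of that metric, the sketch below assigns each token position a segment label and counts the fraction of position pairs on which the two labelings agree. The helper names, and the convention that a change point `c` means a new segment starts at index `c`, are assumptions made for this example rather than details taken from the paper.

```python
from itertools import combinations


def labels_from_change_points(n, change_points):
    """Label each of n positions with the index of the segment it falls in.

    Assumed convention: a change point c means a new segment starts at index c.
    """
    cps = sorted(change_points)
    labels, seg = [], 0
    for i in range(n):
        while seg < len(cps) and i >= cps[seg]:
            seg += 1
        labels.append(seg)
    return labels


def rand_index(a, b):
    """Fraction of position pairs that both labelings treat the same way
    (together in both, or apart in both)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    truth = labels_from_change_points(10, [5])
    detected = labels_from_change_points(10, [4])
    print(rand_index(truth, truth))     # identical segmentations
    print(rand_index(truth, detected))  # change point off by one position
```

A Rand index of 1 means the detected segmentation matches the true one exactly; misplacing a change point lowers the index in proportion to the number of position pairs whose grouping changes.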

Theorems & Definitions (15)

  • Example 1: EMS
  • Example 2: ITS
  • Proposition 1
  • Corollary 1
  • Proposition 2
  • Corollary 2
  • Theorem 1
  • Lemma 1
  • Remark
  • Proof: Proof of Proposition \ref{thm:error-control}
  • ...and 5 more