Online LLM watermark detection via e-processes

Weijie Su, Ruodu Wang, Zinan Zhao

TL;DR

This work addresses the challenge of detecting LLM-generated text in streaming settings by framing watermark detection as sequential hypothesis testing with anytime-valid guarantees. It develops a unified framework based on e-values and e-processes, enabling continuous monitoring and Type I error control under arbitrary stopping times, exemplified by the Gumbel-max watermark. The authors introduce empirically adaptive e-processes, including weight-adaptive calibrators, online Grenander e-processes, and average e-processes, and establish asymptotic power guarantees under suitable assumptions. Empirical results on open-source LLMs show that e-process-based methods achieve robust sequential validity and competitive power, particularly in scenarios with degenerate next-token distributions where traditional sum-based methods falter. Overall, the approach provides a principled, theoretically grounded, and practically effective toolkit for real-time watermark detection in AI-generated text.
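The sequential test described above can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes each token yields a pivotal statistic `y` that is Uniform(0, 1) and independent across tokens under the null (human-written text), and that watermarked text pushes `y` toward 1. It uses a fixed power calibrator rather than the adaptive e-processes the paper proposes. The names `calibrator_e_value`, `sequential_detect`, and the parameter `kappa` are hypothetical.

```python
import math


def calibrator_e_value(y, kappa=0.5):
    """Turn one pivotal statistic y in (0, 1) into an e-value.

    Assumption: y is Uniform(0, 1) under the null, so p = 1 - y is a p-value.
    The map p -> kappa * p**(kappa - 1) integrates to 1 over (0, 1) for
    0 < kappa < 1, so its expectation under the null is 1.
    """
    p = max(1.0 - y, 1e-12)  # clamping p upward can only shrink the e-value
    return kappa * p ** (kappa - 1.0)


def sequential_detect(pivots, alpha=0.01, kappa=0.5):
    """Return the first 1-based index at which the running product of
    e-values reaches 1 / alpha, or None if it never does.

    By Ville's inequality, a nonnegative martingale started at 1 exceeds
    1 / alpha at any time with probability at most alpha, so stopping at
    the first crossing keeps the Type I error at or below alpha.
    """
    log_wealth = 0.0  # log of the product, to avoid overflow
    threshold = math.log(1.0 / alpha)
    for t, y in enumerate(pivots, start=1):
        log_wealth += math.log(calibrator_e_value(y, kappa))
        if log_wealth >= threshold:
            return t
    return None
```

Because `sequential_detect` consumes its input one value at a time, it can be fed a generator that yields statistics as tokens arrive, which is the streaming setting the summary refers to.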

Abstract

Watermarking for large language models (LLMs) has emerged as an effective tool for distinguishing AI-generated text from human-written content. Statistically, watermark schemes induce dependence between generated tokens and a pseudo-random sequence, reducing watermark detection to a hypothesis testing problem on independence. We develop a unified framework for LLM watermark detection based on e-processes, providing anytime-valid guarantees for online testing. We propose various methods to construct empirically adaptive e-processes that can enhance the detection power. In addition, we establish theoretical results that characterize the power properties of the proposed procedures. Experiments demonstrate that the proposed framework achieves competitive performance compared to existing watermark detection methods.
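The dependence between tokens and the pseudo-random sequence can be seen in a toy simulation of the Gumbel-max rule, sketched here under stated assumptions. The selection rule (pick the token maximizing $U_k^{1/P_k}$) is the standard form of the Gumbel-max watermark. The helper names, the three-token distribution, and the use of Python's `random` module in place of a keyed hash of the preceding tokens are illustrative assumptions only.

```python
import random


def gumbel_max_token(probs, uniforms):
    """Select argmax_k U_k ** (1 / P_k) and return (token, U_token).

    The selected token follows the distribution `probs`, but its uniform
    U_token is no longer Uniform(0, 1): it is skewed toward 1.
    """
    best, best_score = None, -1.0
    for k, (p, u) in enumerate(zip(probs, uniforms)):
        if p <= 0.0:
            continue  # tokens with zero probability can never be selected
        score = u ** (1.0 / p)
        if score > best_score:
            best, best_score = k, score
    return best, uniforms[best]


def mean_pivot(probs, n, watermarked, seed=0):
    """Average the pivotal statistic U_token over n simulated tokens.

    watermarked=True : the token is chosen by the Gumbel-max rule.
    watermarked=False: the token is drawn independently of the uniforms,
                       which mimics human-written text under the null.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        uniforms = [rng.random() for _ in probs]
        if watermarked:
            _, y = gumbel_max_token(probs, uniforms)
        else:
            token = rng.choices(range(len(probs)), weights=probs)[0]
            y = uniforms[token]
        total += y
    return total / n
```

Calling `mean_pivot([0.5, 0.3, 0.2], 20000, watermarked=True)` and then the same call with `watermarked=False` should give an average clearly above 0.5 in the first case and close to 0.5 in the second. A detector exploits this gap. It also shrinks as the next-token distribution becomes degenerate, since a token with probability near 1 has a nearly uniform statistic.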

Paper Structure

This paper contains 17 sections, 6 theorems, 63 equations, 2 figures, and 1 algorithm.

Key Result

Proposition 1

For any watermarking scheme $S$ and NTP vector $\mathbf{P}\in \Delta_K$, always holds true. In particular, $\mathbb{Q}^\mathbf{P}\ll \mathbb{U}^\mathbf{P}$.

Figures (2)

  • Figure 1: Average Type II error (on a log scale), Type I error and Sequential Type I error rates versus text length on the simulated data with $\delta=0.2$ (top), and $\delta=0.5$ (bottom).
  • Figure 2: Average Type II error (on a log scale), Type I error and Sequential Type I error rates versus text length on the OPT-1.3B model with temperature parameters at 0.5 (top), and 1 (bottom).

Theorems & Definitions (13)

  • Proposition 1
  • proof
  • Lemma 1
  • proof
  • Theorem 1
  • proof
  • Remark 1
  • Theorem 2
  • proof
  • Lemma 2
  • ...and 3 more