Towards Anytime-Valid Statistical Watermarking

Baihe Huang, Eric Xu, Kannan Ramchandran, Jiantao Jiao, Michael I. Jordan

TL;DR

The paper proposes Anchored E-Watermarking, an e-value–based framework that enables anytime-valid sequential detection of LLM-generated text by coupling with an anchor distribution $p_0$ and a robustness radius $\delta$. It derives the optimal one-step e-value $e^*$ and the worst-case log-growth rate $J^*$, yielding a stopping-time scaling of $\mathcal{O}(\log(1/\alpha)/J^*)$ and improving sample efficiency by $13$–$15\%$ over strong baselines. The authors prove that the fundamental limit on detection efficiency is achieved by the closed-form $e^*$ and demonstrate the approach on synthetic tests and real-world data (MarkMyWords) using Llama2-7B-chat with a Phi-3 anchor, achieving substantial token-budget reductions while preserving text quality. These results establish a principled, sequentially valid watermarking paradigm that is robust to adaptive challenges and offers practical gains for provenance auditing of language models.

Abstract

The proliferation of Large Language Models (LLMs) necessitates efficient mechanisms to distinguish machine-generated content from human text. While statistical watermarking has emerged as a promising solution, existing methods suffer from two critical limitations: the lack of a principled approach for selecting sampling distributions and the reliance on fixed-horizon hypothesis testing, which precludes valid early stopping. In this paper, we bridge this gap by developing the first e-value-based watermarking framework, Anchored E-Watermarking, that unifies optimal sampling with anytime-valid inference. Unlike traditional approaches where optional stopping invalidates Type-I error guarantees, our framework enables valid anytime inference by constructing a test supermartingale for the detection process. By leveraging an anchor distribution to approximate the target model, we characterize the optimal e-value with respect to the worst-case log-growth rate and derive the optimal expected stopping time. Our theoretical claims are substantiated by simulations and evaluations on established benchmarks, showing that our framework can significantly enhance sample efficiency, reducing the average token budget required for detection by 13–15% relative to state-of-the-art baselines.

Paper Structure

This paper contains 35 sections, 14 theorems, 238 equations, 2 figures, and 3 tables.

Key Result

Theorem 1.1

The optimal worst-case log-growth rate $J^*$, that is, the optimal worst-case value of $\mathbb{E}_{\mathbf{H}_1}[\log E]$ under the alternative $\mathbf{H}_1$, is an explicit function of $h$ and $\delta$, where $h = H(p_0)$ is the Shannon entropy of the anchor distribution $p_0$ and $\delta>0$ is a robustness tolerance parameter. Furthermore, the optimal expected stopping time to achieve a Type-I error $\alpha$ scales as $\frac{\log(1/\alpha)}{J^*}$.
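
The scaling $\log(1/\alpha)/J^*$ can be motivated by a standard Wald-type heuristic. The sketch below is not the paper's proof. It assumes i.i.d. one-step e-values $E_1, E_2, \dots$ under $\mathbf{H}_1$ with $\mathbb{E}|\log E_t| < \infty$ and growth rate $J = \mathbb{E}_{\mathbf{H}_1}[\log E_t] > 0$; the wealth process $M_n$ is notation introduced here for illustration.

$$
M_n = \prod_{t=1}^{n} E_t, \qquad \tau_\alpha = \inf\{\, n \ge 1 : M_n \ge 1/\alpha \,\}.
$$

Since $M_{\tau_\alpha} \ge 1/\alpha$ at the stopping time, Wald's identity (valid when $\mathbb{E}[\tau_\alpha] < \infty$) gives

$$
\log(1/\alpha) \;\le\; \mathbb{E}_{\mathbf{H}_1}\!\left[\log M_{\tau_\alpha}\right] \;=\; J \, \mathbb{E}_{\mathbf{H}_1}[\tau_\alpha]
\quad\Longrightarrow\quad
\mathbb{E}_{\mathbf{H}_1}[\tau_\alpha] \;\ge\; \frac{\log(1/\alpha)}{J}.
$$

When the overshoot $\log M_{\tau_\alpha} - \log(1/\alpha)$ stays bounded, the gap is a lower-order term, so $\mathbb{E}[\tau_\alpha] / \log(1/\alpha) \to 1/J$ as $\alpha \to 0$. Under these assumptions, a larger log-growth rate translates directly into fewer tokens needed for detection.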

Figures (2)

  • Figure 1: Simulation of the two-token case for the log-growth problem in Eq. \ref{eq:log-growth}. Three separate anchor distributions are used, each with parameter $\delta = 0.01$. We solve the simplified max-min problem using the CLARABEL interior-point solver, run for $30$ steps. The theoretical optimum is computed as in Eq. \ref{eq:optimal-log-growth}.
  • Figure 2: Simulation of the two-token case for the stopping time problem $\text{SC}(e^*)$. We simulate three different anchor distributions $p_0 = (p, 1-p)$ with $\delta = 0.1$ and estimate average stopping times $\mathbb{E}[\tau_{\alpha}]$ for $\alpha$ values ranging from $10^{-2}$ to $10^{-120}$. Each $\mathbb{E}[\tau_{\alpha}]$ is estimated by simulating $10000$ stopping times $\tau_\alpha$. The plots show $\frac{\mathbb{E}[\tau_\alpha]}{\log(1/\alpha)}$, with red dashed lines at $1/J^*$, where $J^*$ is as in Eq. \ref{eq:optimal-log-growth}. We observe that convergence to the theoretical optimum is obtained for sufficiently small $\alpha$.
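
The kind of experiment described for Figure 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's setup: it replaces the robust optimal e-value $e^*$ with a plain likelihood-ratio e-value against a uniform null on two tokens, so the growth rate is a KL divergence rather than $J^*$. The function names, the uniform null, the trial count, and the seed are all hypothetical choices made here.

```python
import math
import random


def log_e_values(p):
    """Per-token log e-values for a two-token alphabet.

    Stand-in e-value (an assumption, not the paper's e*): the likelihood
    ratio of the alternative (p, 1 - p) against a uniform null (1/2, 1/2).
    Its expectation under the null is exactly 1, so it is a valid e-value.
    Requires 0 < p < 1 and p != 0.5.
    """
    return math.log(p / 0.5), math.log((1.0 - p) / 0.5)


def growth_rate(p):
    """Expected log e-value under the alternative: KL((p, 1-p) || uniform)."""
    log_e0, log_e1 = log_e_values(p)
    return p * log_e0 + (1.0 - p) * log_e1


def stopping_time(p, alpha, rng):
    """First n at which the running product of e-values reaches 1/alpha."""
    log_e0, log_e1 = log_e_values(p)
    threshold = math.log(1.0 / alpha)
    log_wealth, n = 0.0, 0
    while log_wealth < threshold:
        # Draw one token from the alternative (p, 1 - p) and update the wealth.
        log_wealth += log_e0 if rng.random() < p else log_e1
        n += 1
    return n


def normalized_mean_stopping_time(p, alpha, n_trials=2000, seed=0):
    """Monte Carlo estimate of E[tau_alpha] / log(1/alpha) under the alternative."""
    rng = random.Random(seed)
    total = sum(stopping_time(p, alpha, rng) for _ in range(n_trials))
    return (total / n_trials) / math.log(1.0 / alpha)


if __name__ == "__main__":
    p = 0.8
    for alpha in (1e-2, 1e-5, 1e-20):
        ratio = normalized_mean_stopping_time(p, alpha)
        print(f"alpha={alpha:g}  E[tau]/log(1/alpha)={ratio:.3f}  "
              f"1/J={1.0 / growth_rate(p):.3f}")
```

As $\alpha$ shrinks, the printed ratio should approach the reciprocal of the growth rate, which is the qualitative behaviour the figure reports for $1/J^*$. Working in log-wealth rather than multiplying e-values avoids floating-point overflow for very small $\alpha$.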

Theorems & Definitions (32)

  • Theorem 1.1: Informal version of Theorem 4.1 and Theorem 4.3
  • Definition 2.1: Distortion-free
  • Definition 2.2: Model-agnosticity
  • Definition 2.3: E-value
  • Theorem 2.4: Ville's inequality
  • Remark 3.2: Relationship to growth rate optimality in the worst case (GROW) \cite{grunwald2020safe}
  • Theorem 4.1: Log-growth rate
  • Remark 4.2: Relationship to SEAL \cite{huang2025watermarking}
  • Theorem 4.3: Expected stopping time
  • Remark 4.4: Relationship with the rates in \cite{huang2023towards}
  • ...and 22 more
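
For orientation, the two background notions listed above (Definition 2.3 and Theorem 2.4) are usually stated as follows. These are the standard textbook forms, given here as an assumption about what the paper uses; its exact statements and notation may differ. The symbols $\mathbf{H}_0$ for the null hypothesis and $(M_t)$ for the test supermartingale are introduced here for illustration.

$$
E \ge 0 \ \text{ is an e-value for } \mathbf{H}_0 \quad\text{if}\quad \mathbb{E}_{P}[E] \le 1 \ \text{ for every } P \in \mathbf{H}_0 .
$$

$$
\text{(Ville)}\qquad (M_t)_{t \ge 0} \ \text{a nonnegative supermartingale under } P \text{ with } M_0 \le 1
\;\Longrightarrow\;
P\!\left(\exists\, t \ge 0 : M_t \ge 1/\alpha\right) \le \alpha \quad \text{for all } \alpha \in (0, 1).
$$

Because the bound holds simultaneously over all $t$, rejecting the null the first time $M_t \ge 1/\alpha$ keeps the Type-I error at most $\alpha$ no matter when the detector decides to stop. This is the sense in which detection is anytime-valid.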