Towards Optimal Statistical Watermarking

Baihe Huang; Hanlin Zhu; Banghua Zhu; Kannan Ramchandran; Michael I. Jordan; Jason D. Lee; Jiantao Jiao

TL;DR

The paper reframes statistical watermarking for text generation as a hypothesis-testing problem with a random rejection region tied to a secret watermark key, allowing precise control of Type I and Type II errors. It proves that the Uniformly Most Powerful watermark can be achieved via pseudo-random approximations of the output distribution and clipping, and it derives minimax, model-agnostic guarantees with explicit rates. In the i.i.d. token setting, it establishes the scaling $n_{\mathrm{ump}}(h,\alpha,\beta)=\Theta\left(\frac{\ln(1/h)(\ln(1/\alpha)\wedge\ln(1/\beta))}{h}\right)$ and $n_{\mathrm{minmax}}(h,\alpha,\beta)=\Theta\left(\frac{\ln(1/h)}{h}(\ln(1/\alpha)+\ln(1/\beta))\right)$, marking a significant improvement over prior $h^{-2}$ rates. The authors extend the framework to robust watermarking via a perturbation graph and LP-based optimization, and validate the theory with experiments on benchmark data, showing practical detectability with fewer tokens. Overall, the work provides a unified, information-theoretic foundation for evaluating and designing watermarking schemes with near-optimal guarantees and robustness considerations.

Abstract

We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights the potential for improvement over the rate of $h^{-2}$ in previous works. Moreover, we formulate the robust watermarking problem, where the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment of the watermarking problem with near-optimal rates in the i.i.d. setting, which might be of interest for future work.
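To make the stated rates concrete, the following minimal Python sketch evaluates the expressions inside the $\Theta(\cdot)$ bounds quoted in the TL;DR. The function names are hypothetical, and the constants hidden by $\Theta$ are dropped, so the outputs indicate scaling only and are not actual token counts. The prior $h^{-2}$ rate is compared in its $h$-dependence alone.

```python
import math


def ump_rate(h: float, alpha: float, beta: float) -> float:
    """Expression inside Theta for n_ump: ln(1/h) * min(ln(1/alpha), ln(1/beta)) / h.

    Assumes 0 < h < 1 and 0 < alpha, beta < 1; constants are dropped.
    """
    return math.log(1 / h) * min(math.log(1 / alpha), math.log(1 / beta)) / h


def minimax_rate(h: float, alpha: float, beta: float) -> float:
    """Expression inside Theta for n_minmax: (ln(1/h) / h) * (ln(1/alpha) + ln(1/beta))."""
    return math.log(1 / h) / h * (math.log(1 / alpha) + math.log(1 / beta))


def h_dependence_ratio(h: float) -> float:
    """Ratio of the new h-dependence, ln(1/h)/h, to the prior h^{-2}; equals h * ln(1/h)."""
    return (math.log(1 / h) / h) / (h ** -2)


if __name__ == "__main__":
    for h in (0.1, 0.01, 0.001):
        print(h, ump_rate(h, 0.01, 0.01), minimax_rate(h, 0.01, 0.01), h_dependence_ratio(h))
```

The ratio $h \ln(1/h)$ tends to zero as $h \to 0$, which is the sense in which $h^{-1}\log(1/h)$ improves on $h^{-2}$ for low-entropy outputs.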


Paper Structure (18 sections, 9 theorems, 67 equations, 1 figure, 1 table)

Introduction
Related works
Notation
Watermarking as a Hypothesis Testing Problem
Examples
Statistical Limit in Watermarking
Rates under the general setting of Problem \ref{prob:crypto_stat_watermark}
Rates of model-agnostic watermarking
Rates in the i.i.d. setting
Robust Watermarking
Experiments
Conclusions
Proof of Theorem \ref{thm:crypto_stat_rates}
Proof of Theorem \ref{thm:iid_rates}
Supporting lemmata
...and 3 more sections

Key Result

Theorem 3.2

For probability measure $\rho$, the Uniformly Most Powerful $\epsilon$-distorted watermark of level $\alpha$, denoted by $\mathcal{P}^*$, is given by where $\rho^* = \arg \min_{\texttt{TV}(\rho'\|\rho) \leq \epsilon} \sum_{x \in \Omega: \rho'(x) > \alpha} \left(\rho'(x) - \alpha\right).$ Its Type II error is given by and when $|\Omega| \geq \frac{1}{\alpha}$ it simplifies to
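The objective that defines $\rho^*$ can be evaluated directly for a finite $\Omega$. The sketch below is illustrative only: it assumes $\texttt{TV}$ is the standard total variation distance (half the $\ell_1$ distance), represents distributions as equal-length lists over a fixed ordering of $\Omega$, and uses hypothetical function names. It evaluates the objective and checks the constraint for a given candidate $\rho'$; it does not solve the minimization.

```python
def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite support.

    Assumption: TV is the standard half-L1 distance.
    """
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def excess_mass(rho, alpha):
    """Objective from the theorem statement: sum over x with rho(x) > alpha of (rho(x) - alpha)."""
    return sum(r - alpha for r in rho if r > alpha)


def is_feasible(rho_prime, rho, eps, tol=1e-9):
    """Check the constraint TV(rho', rho) <= eps, with a small floating-point tolerance."""
    return tv_distance(rho_prime, rho) <= eps + tol


if __name__ == "__main__":
    alpha, eps = 0.25, 0.05
    rho = [0.5, 0.3, 0.2]
    # Candidate that moves 0.05 of mass from the heaviest outcome to the lightest one.
    rho_prime = [0.45, 0.3, 0.25]
    print(excess_mass(rho, alpha), excess_mass(rho_prime, alpha), is_feasible(rho_prime, rho, eps))
```

In this toy example the objective is about 0.30 for $\rho$ and about 0.25 for the candidate $\rho'$, which lies within total variation distance $\epsilon = 0.05$ of $\rho$. Shifting mass away from outcomes above the level $\alpha$ lowers the objective, which is the kind of move the minimization over $\rho'$ permits.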

Figures (1)

Figure 1: Illustration of watermarking in practice.

Theorems & Definitions (37)

Remark 2.2: Difference between classical hypothesis testing
Remark 2.3: Implementation
Remark 2.5: Information of the model
Example 2.6: Text Generation with Soft Red List, kirchenbauer2023watermark
Example 2.7: Complete watermarking algorithm $\mathrm{Wak}_{\mathrm{sk}}$, christ2023undetectable
Example 2.8: Inverse transform sampling $\mathrm{Wak}_{\mathrm{ITS}}$, kuditipudi2023robust
Definition 3.1: Uniformly Most Powerful Watermark
Theorem 3.2
Remark 3.3: Dependence on distortion parameter $\epsilon$
Remark 3.4: Intuition behind $\mathcal{P}^*$
...and 27 more
