Towards Optimal Statistical Watermarking

Baihe Huang, Hanlin Zhu, Banghua Zhu, Kannan Ramchandran, Michael I. Jordan, Jason D. Lee, Jiantao Jiao

TL;DR

The paper reframes statistical watermarking for text generation as a hypothesis-testing problem with a random rejection region tied to a secret watermark key, allowing precise control of Type I and Type II errors. It proves that the Uniformly Most Powerful watermark can be achieved via pseudo-random approximations of the output distribution and clipping, and it derives minimax, model-agnostic guarantees with explicit rates. In the i.i.d. token setting, it establishes the scaling $n_{\mathrm{ump}}(h,\alpha,\beta)=\Theta\left(\frac{\ln(1/h)(\ln(1/\alpha)\wedge\ln(1/\beta))}{h}\right)$ and $n_{\mathrm{minmax}}(h,\alpha,\beta)=\Theta\left(\frac{\ln(1/h)}{h}(\ln(1/\alpha)+\ln(1/\beta))\right)$, marking a significant improvement over prior $h^{-2}$ rates. The authors extend the framework to robust watermarking via a perturbation graph and LP-based optimization, and validate the theory with experiments on benchmark data, showing practical detectability with fewer tokens. Overall, the work provides a unified, information-theoretic foundation for evaluating and designing watermarking schemes with near-optimal guarantees and robustness considerations.
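To get a feel for the scalings quoted above, the following minimal sketch evaluates them numerically. It assumes the constants hidden by $\Theta(\cdot)$ are all equal to 1, which the source does not state, so only the growth in $h$, $\alpha$, and $\beta$ is meaningful, not the absolute token counts. The function names are illustrative, and $0 < h < 1$ is assumed so that $\ln(1/h) > 0$.

```python
import math


def n_ump(h: float, alpha: float, beta: float) -> float:
    """Scaling of the UMP rate: ln(1/h) * min(ln(1/alpha), ln(1/beta)) / h.

    The Theta-constant is set to 1 (an assumption for illustration only).
    """
    return math.log(1 / h) * min(math.log(1 / alpha), math.log(1 / beta)) / h


def n_minmax(h: float, alpha: float, beta: float) -> float:
    """Scaling of the minimax rate: (ln(1/h) / h) * (ln(1/alpha) + ln(1/beta)).

    The Theta-constant is set to 1 (an assumption for illustration only).
    """
    return math.log(1 / h) / h * (math.log(1 / alpha) + math.log(1 / beta))


def entropy_factor(h: float) -> float:
    """Dependence on h shared by both rates: ln(1/h) / h."""
    return math.log(1 / h) / h


def prior_entropy_factor(h: float) -> float:
    """Dependence on h of the prior rate: h^{-2}."""
    return h ** -2


if __name__ == "__main__":
    alpha, beta = 0.01, 0.05
    for h in (0.5, 0.1, 0.01):
        print(
            f"h={h}: ump~{n_ump(h, alpha, beta):.1f}, "
            f"minmax~{n_minmax(h, alpha, beta):.1f}, "
            f"h^-2 / (ln(1/h)/h) = "
            f"{prior_entropy_factor(h) / entropy_factor(h):.2f}"
        )
```

The last printed ratio equals $1/(h\ln(1/h))$, which grows as $h \to 0$; this is the sense in which the new rate improves on $h^{-2}$ in the low-entropy regime.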

Abstract

We study statistical watermarking by formulating it as a hypothesis testing problem, a general framework which subsumes all previous statistical watermarking methods. Key to our formulation is a coupling of the output tokens and the rejection region, realized by pseudo-random generators in practice, that allows non-trivial trade-offs between the Type I error and Type II error. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting and the minimax Type II error in the model-agnostic setting. In the common scenario where the output is a sequence of $n$ tokens, we establish nearly matching upper and lower bounds on the number of i.i.d. tokens required to guarantee small Type I and Type II errors. Our rate of $\Theta(h^{-1} \log (1/h))$ with respect to the average entropy per token $h$ highlights the potential for improvement over the $h^{-2}$ rate of previous works. Moreover, we formulate the robust watermarking problem, in which the user is allowed to perform a class of perturbations on the generated texts, and characterize the optimal Type II error of robust UMP tests via a linear programming problem. To the best of our knowledge, this is the first systematic statistical treatment of the watermarking problem with near-optimal rates in the i.i.d. setting, which might be of interest for future work.

Paper Structure

This paper contains 18 sections, 9 theorems, 67 equations, 1 figure, 1 table.

Key Result

Theorem 3.2

For a probability measure $\rho$, the Uniformly Most Powerful $\epsilon$-distorted watermark of level $\alpha$, denoted by $\mathcal{P}^*$, is given by [...], where $\rho^* = \arg \min_{\texttt{TV}(\rho'\|\rho) \leq \epsilon} \sum_{x \in \Omega: \rho'(x) > \alpha} \left(\rho'(x) - \alpha\right).$ Its Type II error is given by [...], and when $|\Omega| \geq \frac{1}{\alpha}$ it simplifies to [...].
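The objective that defines $\rho^*$ in the theorem is the probability mass lying above the level $\alpha$. The sketch below only evaluates that objective for a given distribution; it does not solve the TV-constrained minimization. The function name and the dictionary representation of a distribution over $\Omega$ are assumptions made for illustration. For $\epsilon = 0$ the constraint forces $\rho' = \rho$, so $\rho^* = \rho$ and the function can be applied to $\rho$ directly.

```python
def excess_mass(rho: dict, alpha: float) -> float:
    """Evaluate sum over {x : rho(x) > alpha} of (rho(x) - alpha).

    `rho` maps each outcome in Omega to its probability. This is the
    objective minimized over rho' in the definition of rho*; the
    minimization itself is not performed here.
    """
    return sum(p - alpha for p in rho.values() if p > alpha)


if __name__ == "__main__":
    # A peaked distribution: two outcomes exceed alpha = 0.25.
    peaked = {"a": 0.5, "b": 0.3, "c": 0.2}
    print(round(excess_mass(peaked, 0.25), 6))

    # A uniform distribution on 4 outcomes: no outcome exceeds alpha = 0.25.
    uniform = {x: 0.25 for x in "abcd"}
    print(excess_mass(uniform, 0.25))
```

A distribution with no outcome above $\alpha$ has zero excess mass, while mass concentrated on a few outcomes produces a positive value.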

Figures (1)

  • Figure 1: Illustration of watermarking in practice.

Theorems & Definitions (37)

  • Remark 2.2: Difference between classical hypothesis testing
  • Remark 2.3: Implementation
  • Remark 2.5: Information of the model
  • Example 2.6: Text Generation with Soft Red List (Kirchenbauer et al., 2023)
  • Example 2.7: Complete watermarking algorithm $\mathrm{Wak}_{\mathrm{sk}}$ (Christ et al., 2023)
  • Example 2.8: Inverse transform sampling $\mathrm{Wak}_{\mathrm{ITS}}$ (Kuditipudi et al., 2023)
  • Definition 3.1: Uniformly Most Powerful Watermark
  • Theorem 3.2
  • Remark 3.3: Dependence on distortion parameter $\epsilon$
  • Remark 3.4: Intuition behind $\mathcal{P}^*$
  • ...and 27 more