Watermarking Discrete Diffusion Language Models

Avi Bagchi, Akhil Bhimaraju, Moulik Choraria, Daniel Alabi, Lav R. Varshney

TL;DR

This work addresses the need to watermark discrete diffusion language models to ensure authenticity and traceability. It introduces a distribution-preserving Gumbel-max embedding at every diffusion step, with the randomness seeded by the sequence index for reliable detection, and proves that the scheme is distortion-free and that the false-detection probability decays exponentially in the token sequence length. Empirically, the method is reliably detectable on state-of-the-art discrete diffusion models such as LLaDA while preserving perplexity and benchmark performance, unlike prior green-list approaches. The results establish a practical, theoretically grounded approach to watermarking discrete diffusion language models and point to future work on broader model support and robustness enhancements.

Abstract

Watermarking has emerged as a promising technique to track AI-generated content and differentiate it from authentic human creations. While prior work extensively studies watermarking for autoregressive large language models (LLMs) and image diffusion models, none address discrete diffusion language models, which are becoming popular due to their high inference throughput. In this paper, we introduce the first watermarking method for discrete diffusion models by applying the distribution-preserving Gumbel-max trick at every diffusion step and seeding the randomness with the sequence index to enable reliable detection. We experimentally demonstrate that our scheme is reliably detectable on state-of-the-art diffusion language models and analytically prove that it is distortion-free with an exponentially decaying probability of false detection in the token sequence length.
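
The mechanism described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy per-position distributions stand in for a diffusion model's predictions, the SHA-256 seeding, the key string, and all function names are hypothetical, and the per-token score $-\log(1 - u)$ is a commonly used Gumbel-max detection statistic assumed here rather than taken from the paper. The point it shows is that when the randomness depends only on the secret key and the sequence index, tokens can be filled in any order and the detector can still recompute the same randomness.

```python
import hashlib
import numpy as np

VOCAB_SIZE = 50  # toy vocabulary size (assumption for this sketch)


def position_uniforms(key, position, vocab_size=VOCAB_SIZE):
    """Pseudorandom uniforms in [0, 1) that depend only on the key and the sequence index."""
    digest = hashlib.sha256(f"{key}:{position}".encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    return rng.random(vocab_size)


def gumbel_max_sample(probs, u):
    """Pick argmax_v log(u_v) / p_v, which is equivalent to the Gumbel-max trick."""
    scores = np.full(probs.shape, -np.inf)
    support = probs > 0  # tokens with zero probability are never selected
    with np.errstate(divide="ignore"):
        scores[support] = np.log(u[support]) / probs[support]
    return int(np.argmax(scores))


def generate_watermarked(key, position_probs, order_seed=0):
    """Fill positions in a random order, mimicking non-left-to-right unmasking."""
    n = len(position_probs)
    tokens = [None] * n
    order = np.random.default_rng(order_seed).permutation(n)
    for pos in order:
        pos = int(pos)
        u = position_uniforms(key, pos)
        tokens[pos] = gumbel_max_sample(position_probs[pos], u)
    return tokens


def detection_score(key, tokens):
    """Average of -log(1 - u) over the observed tokens (assumed statistic).

    For text generated without the key, each term behaves like an Exp(1)
    draw, so the average is close to 1; watermarked text scores higher.
    """
    total = 0.0
    for pos, tok in enumerate(tokens):
        u = position_uniforms(key, pos)
        total += -np.log(1.0 - u[tok])
    return total / len(tokens)


def demo(n_tokens=200, key="secret-key"):
    """Return (watermarked score, unwatermarked score) on toy distributions."""
    rng = np.random.default_rng(1)
    # Toy stand-in for the model's predicted distribution at each position.
    position_probs = rng.dirichlet(np.ones(VOCAB_SIZE), size=n_tokens)
    watermarked = generate_watermarked(key, position_probs)
    plain = [int(rng.choice(VOCAB_SIZE, p=p)) for p in position_probs]
    return detection_score(key, watermarked), detection_score(key, plain)
```

Calling `demo()` returns the average score for watermarked and for ordinarily sampled tokens; in this toy setting the former should sit well above 1 and the latter near 1. A real detector would compare the score against a threshold chosen for a target false-detection rate.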

Paper Structure

This paper contains 20 sections, 2 theorems, 15 equations, 11 figures, 6 tables, 5 algorithms.

Key Result

Theorem 1

Given a diffusion language model $p_\theta$, the output text of the watermarking algorithm has the same distribution as the output of the unwatermarked language model $p_\theta$, provided the effects of the pseudorandom seed are negligible.
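
The intuition behind this kind of distortion-freeness can be sketched with the standard Gumbel-max identity for a single sampling step. The notation below ($p_v$ for the model's probability of token $v$, $u_v$ for the per-token uniform variables) is assumed for illustration and is not taken from the paper, and the sketch treats the $u_v$ as truly independent and uniform, which corresponds to the theorem's condition that the effects of the pseudorandom seed are negligible.

```latex
% Single-step sketch: u_v are i.i.d. Uniform(0,1), p_v > 0, \sum_v p_v = 1.
\begin{align*}
  x^\star &= \arg\max_{v} \frac{\log u_v}{p_v}
           = \arg\min_{v} E_v,
  \qquad E_v := \frac{-\log u_v}{p_v} \sim \mathrm{Exp}(p_v), \\
  \Pr[x^\star = v] &= \Pr\Bigl[E_v < \min_{w \neq v} E_w\Bigr]
           = \frac{p_v}{\sum_{w} p_w}
           = p_v .
\end{align*}
```

The second line uses the fact that the minimum of independent exponential variables is attained at index $v$ with probability proportional to its rate, so the selected token follows the model's own distribution.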

Figures (11)

  • Figure 1: Distribution of normalized detection scores for unwatermarked as compared to watermarked text using our Gumbel-max scheme. We use 500 open-ended prompts.
  • Figure 2: Percentage of open-ended prompts that exceed threshold $\tau$, for different values of $\tau$. We show results for unwatermarked and watermarked text, illustrating the tradeoff between soundness and completeness.
  • Figure 3: The forward process converges to an all masked state. LLaDA predicts the entire sequence at each step and then re-masks the $t|V|$ tokens with lowest confidence. Only $M \to v \neq M$ transitions occur in sampling.
  • Figure 4: The forward process in absorbing (left) converges to an all masked state, while uniform (right) converges to a uniform distribution over $\mathcal{V}$. In sampling, the absorbing case only permits $M \to v \neq M$ transitions while the uniform case can transition freely.
  • Figure 5: (SEDD Absorb) z-score vs $t_{end}$ for $\delta=10$
  • ...and 6 more figures

Theorems & Definitions (4)

  • Theorem 1
  • proof
  • Theorem 2
  • proof