DMark: Order-Agnostic Watermarking for Diffusion Large Language Models
Linyu Wu, Linhao Zhong, Wenjie Qu, Yuexin Li, Yue Liu, Shengfang Zhai, Chunhua Shen, Jiaheng Zhang
TL;DR
DMark tackles the challenge of watermarking diffusion large language models (dLLMs), whose non-sequential token finalization invalidates traditional autoregressive (AR) watermarks. It introduces three strategies (predictive, bidirectional, and predictive-bidirectional) that use parallel logit predictions and bidirectional context to embed detectable watermarks under arbitrary generation orders. Experiments show detection rates of 92.0-99.5% at a 1% false-positive rate, compared with 49.6-71.2% for naive adaptations of existing methods, and the watermark remains robust under common text manipulations. The work provides both practical watermarking tools for dLLMs and theoretical groundwork for watermarking non-sequential generative models, with guidance on parameter settings and implementation details for reproducibility.
Abstract
Diffusion large language models (dLLMs) offer faster generation than autoregressive models while maintaining comparable quality, but existing watermarking methods fail on them because of their non-sequential decoding. Unlike autoregressive models, which generate tokens left to right, dLLMs can finalize tokens in arbitrary order, breaking the causal design underlying traditional watermarks. We present DMark, the first watermarking framework designed specifically for dLLMs. DMark introduces three complementary strategies to restore watermark detectability: predictive watermarking uses model-predicted tokens when actual context is unavailable; bidirectional watermarking exploits both forward and backward dependencies unique to diffusion decoding; and predictive-bidirectional watermarking combines both approaches to maximize detection strength. Experiments across multiple dLLMs show that DMark achieves 92.0-99.5% detection rates at a 1% false positive rate while maintaining text quality, compared to only 49.6-71.2% for naive adaptations of existing methods. DMark also demonstrates robustness against text manipulations, establishing that effective watermarking is feasible for non-autoregressive language models.
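To make the predictive idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a green-list style watermark in which a secret key and the preceding token seed a pseudo-random "green" subset of the vocabulary whose logits are boosted; the abstract does not specify this scheme. All names and values (`green_list`, `predictive_bias`, `detect_z`, `VOCAB_SIZE`, `GAMMA`, `DELTA`, `KEY`) are hypothetical. When the preceding position has not been finalized yet, the sketch falls back to the token the model currently predicts for it.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000  # hypothetical toy vocabulary size
GAMMA = 0.5        # assumed fraction of the vocabulary marked "green"
DELTA = 2.0        # assumed logit bias added to green tokens
KEY = "demo-key"   # hypothetical secret watermark key


def green_list(context_token):
    """Return the green token set seeded by the key and one context token."""
    digest = hashlib.sha256(f"{KEY}:{context_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))


def predictive_bias(logits, prev_token, prev_logits):
    """Boost green-token logits at one position.

    If the preceding token is not finalized (prev_token is None), use the
    argmax of the model's current logits for that position as a stand-in.
    """
    if prev_token is None:
        prev_token = max(range(len(prev_logits)), key=prev_logits.__getitem__)
    greens = green_list(prev_token)
    return [x + DELTA if i in greens else x for i, x in enumerate(logits)]


def detect_z(tokens):
    """One-proportion z-score of green hits over (previous, current) pairs."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_list(tokens[i - 1]) for i in range(1, len(tokens)))
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Under the normal approximation, flagging text whose z-score exceeds roughly 2.33 corresponds to a one-sided 1% false-positive rate. Detection here only needs the finished text and the key, so it does not depend on the order in which tokens were finalized during generation.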
