A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
Yihan Wu, Zhengmian Hu, Junfeng Guo, Hongyang Zhang, Heng Huang
TL;DR
This work presents DiPmark, a watermarking framework for large language models that preserves the original token distribution while enabling efficient, API-free detection and provable resilience to moderate text modifications. It preserves the distribution by combining a distribution-preserving reweighting scheme with i.i.d. cipher permutations, and it detects watermarks with a fixed statistic based on green-token bias that comes with guaranteed false-positive control. The approach is empirically validated on machine translation and text summarization tasks, exhibits fast, scalable detection, and demonstrates robustness, including in a GPT-4 case study. These results offer a practical, theory-backed solution for watermarking AI-generated text, suitable for industry deployment and detection workflows.
Abstract
Watermarking techniques offer a promising way to identify machine-generated content by embedding covert information into the output of language models. A central challenge in this domain is preserving the distribution of the originally generated content after watermarking. Our research extends and improves upon existing watermarking frameworks, emphasizing the importance of a Distribution-Preserving (DiP) watermark. In contrast to current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking (distribution-preserving), is detectable without access to the language model API or prompts (accessible), and is provably robust to moderate changes of tokens (resilient). DiPmark operates by selecting a random set of tokens before each token is generated, then modifying the token distribution through a distribution-preserving reweight function that raises the probability of the selected tokens during sampling. Extensive empirical evaluation on various language models and tasks demonstrates our approach's distribution-preserving property, accessibility, and resilience, making it an effective solution for watermarking tasks that demand impeccable quality preservation.
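The reweighting step described above can be illustrated with a small sketch. The abstract does not state the reweight function, so the form used here is an assumption: tokens are ordered by a random permutation, and each token receives the increment of g(x) = max(x - alpha, 0) + max(x - (1 - alpha), 0) over its slice of the cumulative probability. The names `dip_reweight`, `g`, and `alpha`, and the toy distribution, are hypothetical. In a real system the permutation would presumably be derived from a secret key, whereas the sketch enumerates every permutation to check that the average reweighted distribution matches the original.

```python
from itertools import permutations


def dip_reweight(probs, perm, alpha=0.5):
    """Reweight `probs` under the token ordering `perm`.

    Assumed form (not given in the abstract): probability mass is shifted
    toward tokens that appear late in the permutation.
    """
    def g(x):
        return max(x - alpha, 0.0) + max(x - (1.0 - alpha), 0.0)

    out = [0.0] * len(probs)
    cum = 0.0
    for tok in perm:
        prev = cum
        cum += probs[tok]
        # Each token gets the increase of g over its cumulative-mass slice.
        out[tok] = g(cum) - g(prev)
    return out


# Toy next-token distribution over a 4-token vocabulary (illustrative only).
probs = [0.5, 0.3, 0.15, 0.05]

# A single permutation biases sampling toward later tokens.
one_draw = dip_reweight(probs, (0, 1, 2, 3))

# Averaging over all permutations stands in for the random key.
perms = list(permutations(range(len(probs))))
avg = [
    sum(dip_reweight(probs, p)[t] for p in perms) / len(perms)
    for t in range(len(probs))
]

print("one permutation:", one_draw)
print("average over permutations:", avg)
```

Under this assumed form, any single permutation skews the distribution toward the tokens it places last, while the average over permutations reproduces the original probabilities, which is the sense of "distribution-preserving" used in the abstract.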
