Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions
Yihan Wu, Ruibo Chen, Zhengmian Hu, Yanshuo Chen, Junfeng Guo, Hongyang Zhang, Heng Huang
TL;DR
The paper analyzes distortion-free language-model watermarks when the finite supply of watermark keys leads to key collisions, and proves that the strong form of the distortion-free property cannot hold once collisions occur. It formalizes the PDA-rule framework, shows that key collisions are inevitable, and characterizes three levels of distortion-free behavior. To address the distribution bias caused by collisions, it introduces the beta-watermark, a beta-weighted, model-agnostic extension of permutation-based reweighting that reduces bias while preserving detectability. Experiments on MBart, BART-large, and LLaMA-2 across translation, summarization, and generation show that the beta-watermark can lower distribution bias, with a controllable trade-off against watermark strength and detection efficiency. The work highlights the need for new key-space designs to approach true distortion-free guarantees under realistic constraints, and motivates further study of robust, scalable watermarking and detection mechanisms.
Abstract
Language model (LM) watermarking techniques inject a statistical signal into LM-generated content by substituting the random sampling process with pseudo-random sampling, using watermark keys as the random seed. Among these statistical watermarking approaches, distortion-free watermarks are particularly crucial because they embed watermarks into LM-generated content without compromising generation quality. However, one notable limitation of pseudo-random sampling compared to true-random sampling is that, when the same watermark key is reused (i.e., a key collision), the results of pseudo-random sampling are correlated. This limitation could potentially undermine the distortion-free property. Our studies reveal that key collisions are inevitable due to the limited availability of watermark keys, and that existing distortion-free watermarks exhibit a significant distribution bias relative to the original LM distribution in the presence of key collisions. Moreover, achieving a perfect distortion-free watermark is impossible, as no statistical signal can be embedded under key collisions. To reduce the distribution bias caused by key collisions, we introduce a new family of distortion-free watermarks: the beta-watermark. Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
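The effect of a key collision on key-seeded sampling can be illustrated with a small toy sketch. This is not the paper's watermarking scheme or its beta-watermark: the three-token distribution `PROBS`, the helper names, and the choice of SHA-256-seeded inverse-CDF sampling are all assumptions made for illustration. The sketch only shows that draws made with distinct keys approximate the original distribution, while draws made with one repeated key are identical and therefore far from it.

```python
import hashlib
import random
from collections import Counter

# Toy next-token distribution (hypothetical values, for illustration only).
PROBS = {"a": 0.5, "b": 0.3, "c": 0.2}


def sample_with_key(key: str) -> str:
    """Draw one token by inverse-CDF sampling, seeded by a hash of the key."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    u = random.Random(seed).random()  # pseudo-random number in [0, 1)
    cumulative = 0.0
    for token, p in PROBS.items():
        cumulative += p
        if u < cumulative:
            return token
    return token  # guard against floating-point round-off


def empirical(keys):
    """Empirical token frequencies over one draw per key."""
    counts = Counter(sample_with_key(k) for k in keys)
    return {t: counts[t] / len(keys) for t in PROBS}


def total_variation(q):
    """Total variation distance between q and the toy distribution."""
    return 0.5 * sum(abs(q[t] - PROBS[t]) for t in PROBS)


N = 5000
distinct = empirical([f"key-{i}" for i in range(N)])  # no collisions
collided = empirical(["key-0"] * N)                   # same key every time

print("distinct keys:", distinct, "TV =", total_variation(distinct))
print("collided keys:", collided, "TV =", total_variation(collided))
```

With distinct keys the total variation distance to `PROBS` should be small, whereas with a single repeated key every draw returns the same token, so the empirical distribution collapses to a point mass. This is the kind of correlation the abstract refers to, shown here only for a plain sampler rather than for any specific distortion-free watermark.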
