HeavyWater and SimplexWater: Distortion-Free LLM Watermarks for Low-Entropy Next-Token Predictions
Dor Tsur, Carol Xuan Long, Claudio Mayrink Verdun, Hsiang Hsu, Chen-Fu Chen, Haim Permuter, Sajani Vithana, Flavio P. Calmon
TL;DR
This work formulates a principled minimax framework for embedding watermarks in LLM outputs, aiming to maximize detectability while preserving text quality, especially in low-entropy generations. It introduces SimplexWater, a binary-score watermark tied to distance-maximizing codes and proven minimax-optimal, and HeavyWater, which uses heavy-tailed score distributions to boost detection. Both are made distortion-free via optimal transport (OT) and can be combined with any side-information scheme. The theoretical analysis connects watermarking to coding theory and shows that the Gumbel watermark is a special case of the OT framework. Experiments on coding and QA benchmarks across multiple open-weight models show stronger detection with minimal distortion. The work also discusses randomness efficiency and practical considerations such as hashing schemes, robustness, and computational overhead, pointing toward robust, verifiable watermarking of AI-generated text. Overall, HeavyWater and SimplexWater advance watermark design by exploiting tail behavior and coding-theory insights, enabling reliable authentication of machine-generated content with a controlled impact on the text.
Abstract
Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. Current watermarks operate by changing the next-token predictions output by an LLM. The updated (i.e., watermarked) predictions depend on random side information produced, for example, by hashing previously generated tokens. LLM watermarking is particularly challenging in low-entropy generation tasks, such as coding, where next-token predictions are near-deterministic. In this paper, we propose an optimization framework for watermark design. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text. Our analysis informs the design of two new watermarks: HeavyWater and SimplexWater. Both watermarks are tunable, gracefully trading off detection accuracy against text distortion. They can also be applied to any LLM and are agnostic to how the side information is generated. We examine the performance of HeavyWater and SimplexWater through several benchmarks, demonstrating that they can achieve high watermark detection accuracy with minimal loss in text generation quality, particularly in the low-entropy regime. Our theoretical analysis also reveals surprising new connections between LLM watermarking and coding theory. The code implementation can be found at https://github.com/DorTsur/HeavyWater_SimplexWater
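To make the mechanism described above concrete (side information obtained by hashing previous tokens, then used to alter next-token sampling), here is a minimal sketch of the classic Gumbel watermark, which the TL;DR names as a special case of the OT framework. It is not HeavyWater or SimplexWater. The key format, hashing only the single previous token with SHA-256, the toy next-token distribution, and all function names (`side_info`, `gumbel_watermark_sample`, `detection_score`, `toy_next_probs`, `generate`) are hypothetical illustrations. None of them are taken from the paper or its repository.

```python
import hashlib
import math
import random
from typing import List, Optional


def side_info(key: str, prev_token: int, vocab_size: int) -> List[float]:
    # Hypothetical side-information scheme: hash (secret key, previous token)
    # with SHA-256 and seed a PRNG that draws one uniform per vocabulary entry.
    digest = hashlib.sha256(f"{key}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # (k + 0.5) / 2**52 is exact and lies strictly in (0, 1), so log(u) and log(1 - u) are finite.
    return [(rng.getrandbits(52) + 0.5) / 2**52 for _ in range(vocab_size)]


def gumbel_watermark_sample(probs: List[float], u: List[float]) -> int:
    # Gumbel watermark: choose argmax_i u_i ** (1 / p_i), computed as argmax_i log(u_i) / p_i.
    # Over the randomness of u, token i is chosen with probability p_i (distortion-free).
    best_token, best_val = -1, -math.inf
    for i, (p, ui) in enumerate(zip(probs, u)):
        if p > 0:
            val = math.log(ui) / p
            if val > best_val:
                best_token, best_val = i, val
    return best_token


def detection_score(tokens: List[int], key: str, vocab_size: int) -> float:
    # Sum of -log(1 - u_{x_t}). For text generated without the key each term is Exp(1),
    # so the score concentrates near the number of scored tokens.
    score = 0.0
    for prev, tok in zip(tokens, tokens[1:]):
        score += -math.log(1.0 - side_info(key, prev, vocab_size)[tok])
    return score


def toy_next_probs(prev_token: int, vocab_size: int) -> List[float]:
    # Hypothetical stand-in for an LLM's next-token distribution.
    weights = [1 + (i * 7 + prev_token * 3) % 5 for i in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(key: Optional[str], first_token: int, length: int, vocab_size: int,
             rng: Optional[random.Random] = None) -> List[int]:
    # key=None samples ordinary (unwatermarked) text with rng; otherwise the watermark is used.
    tokens = [first_token]
    for _ in range(length):
        probs = toy_next_probs(tokens[-1], vocab_size)
        if key is None:
            tokens.append(rng.choices(range(vocab_size), weights=probs)[0])
        else:
            u = side_info(key, tokens[-1], vocab_size)
            tokens.append(gumbel_watermark_sample(probs, u))
    return tokens


V = 50
watermarked = generate("secret-key", first_token=0, length=200, vocab_size=V)
plain = generate(None, first_token=0, length=200, vocab_size=V, rng=random.Random(0))
print(detection_score(watermarked, "secret-key", V))
print(detection_score(plain, "secret-key", V))
```

The unwatermarked score should land near 200 (a sum of 200 Exp(1) terms), while the watermarked score should be much larger. A one-hot prediction leaves this scheme no freedom, since only the token with nonzero probability can be returned; this illustrates why the low-entropy regime is hard. Because the side information here depends only on the previous token, generation is deterministic given the key and context and can fall into repetitive loops.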
