GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick
Jiayi Fu, Xuandong Zhao, Ruihan Yang, Yuansen Zhang, Jiangjie Chen, Yanghua Xiao
TL;DR
This work addresses the determinism of decoding-based watermarks for large language models and introduces the Logits-Addition watermark to enable diversified generation. Among its three diversification variants, GumbelSoft (a softmax-based variant) achieves the best detectability and diversity. It outperforms existing GumbelMax-trick (GM) watermark variants and other baselines on question-answering and text-completion tasks. The authors provide theoretical results for the per-token detection scores and show robust performance under common attacks while preserving text quality. By balancing detection strength, diversity, and robustness, the approach makes LLM watermarking more practical, and the authors also discuss paraphrase attacks and downstream impact.
Abstract
Large language models (LLMs) generate remarkably human-like text, but they also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated text because of their notable detectability. However, the GM watermark faces a major challenge in generation diversity: it always yields identical outputs for the same prompt, which hurts both diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, together with three variants specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) performs best in high-diversity settings: its AUROC exceeds those of the two alternative variants by 0.1 to 0.3 and surpasses other decoding-based watermarking methods by at least 0.1.
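The determinism problem and the softmax fix can be illustrated with a toy sketch. It relies on the GumbelMax trick: with independent uniform draws u, argmax(logits + Gumbel(u)) is an exact sample from softmax(logits). When u instead comes from a keyed PRNG, the sample becomes a deterministic, detectable function of the key and context. Everything concrete here is an assumption for illustration, not the authors' code: the hypothetical `keyed_uniforms` helper (SHA-256 over a secret key and the last `WINDOW` tokens), the fake 50-token model `toy_logits`, and the Aaronson-style per-token detection score -log(1 - u). The `tau` branch only mirrors the GumbelSoft idea of sampling from softmax((logits + g) / tau) instead of taking the argmax.

```python
import hashlib
import math
import random

VOCAB_SIZE = 50  # toy vocabulary size (assumption)
WINDOW = 4       # previous tokens hashed into the keyed noise (assumption)


def keyed_uniforms(key, context):
    """Hypothetical keyed PRNG: one U(0,1) value per vocabulary entry,
    seeded by the secret key and the last WINDOW tokens of the context."""
    digest = hashlib.sha256(f"{key}|{tuple(context[-WINDOW:])}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    # Squeeze [0, 1) into (0, 1) so that log(u) and log(1 - u) stay finite.
    return [1e-12 + rng.random() * (1 - 2e-12) for _ in range(VOCAB_SIZE)]


def toy_logits(context):
    """Stand-in for an LLM: fixed Gaussian logits that depend only on the last token."""
    rng = random.Random(context[-1])
    return [rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]


def gumbel(u):
    """Inverse-CDF transform of a uniform draw into standard Gumbel noise."""
    return -math.log(-math.log(u))


def next_token(context, key, tau=None, rng=None):
    """tau=None: GumbelMax argmax (deterministic for a given key and context).
    tau > 0: sample from softmax((logits + g) / tau), a GumbelSoft-style sketch."""
    noise = [gumbel(u) for u in keyed_uniforms(key, context)]
    perturbed = [logit + g for logit, g in zip(toy_logits(context), noise)]
    if tau is None:
        return max(range(VOCAB_SIZE), key=perturbed.__getitem__)
    top = max(perturbed)  # subtract the max for numerical stability
    weights = [math.exp((p - top) / tau) for p in perturbed]
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]


def generate(prompt, key, length, tau=None, seed=None):
    rng = random.Random(seed)  # unkeyed randomness, used only when tau is set
    tokens = list(prompt)
    for _ in range(length):
        tokens.append(next_token(tokens, key, tau, rng))
    return tokens[len(prompt):]


def mean_score(prompt, generated, key):
    """Average Aaronson-style per-token score -log(1 - u) of the chosen tokens.
    For text produced independently of the key, each term is roughly Exp(1),
    so the mean stays near 1."""
    tokens = list(prompt)
    total = 0.0
    for tok in generated:
        u = keyed_uniforms(key, tokens)[tok]
        total += -math.log(1.0 - u)
        tokens.append(tok)
    return total / len(generated)


prompt = [3, 7]
a = generate(prompt, "secret", 40)
b = generate(prompt, "secret", 40)
print(a == b)  # True: GumbelMax repeats itself for the same prompt and key
c = generate(prompt, "secret", 40, tau=0.5, seed=1)
d = generate(prompt, "secret", 40, tau=0.5, seed=2)
print(c == d)  # usually False: the softmax branch adds diversity
print(mean_score(prompt, a, "secret"), mean_score(prompt, c, "secret"))
print(mean_score(prompt, a, "wrong-key"))  # no watermark signal without the key
```

Because the keyed Gumbel noise is fixed by the key and context, the argmax branch returns the same continuation on every call. The softmax branch keeps the same keyed noise, so chosen tokens still tend to carry high u values that the detector can score. It also adds unkeyed sampling randomness, which restores diversity at some cost in per-token signal. In this sketch, `tau` is the knob that trades between the two.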
