GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick

Jiayi Fu; Xuandong Zhao; Ruihan Yang; Yuansen Zhang; Jiangjie Chen; Yanghua Xiao

TL;DR

This work addresses the determinism of decoding-based watermarks for large language models and introduces the Logits-Addition watermark to enable diversified generation. Of its three diversification variants, GumbelSoft (the softmax variant) achieves the best detectability and diversity, outperforming the other GumbelMax-trick-based (GM) watermark variants and baselines on QA and completion tasks. The authors provide theoretical results for per-token scores and report robust performance under common attacks while preserving text quality. The approach makes LLM watermarking more practical by balancing detection strength, diversity, and robustness, with consideration of paraphrase attacks and downstream impact.

Abstract

Large language models (LLMs) excel at generating human-like text, but they also raise concerns about misuse in fake news and academic dishonesty. Decoding-based watermarking, particularly the GumbelMax-trick-based watermark (GM watermark), is a standout solution for safeguarding machine-generated text due to its notable detectability. However, the GM watermark faces a major challenge: it always yields identical outputs for the same prompt, which harms generation diversity and user experience. To overcome this limitation, we propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity. Among these, the GumbelSoft watermark (a softmax variant of the Logits-Addition watermark) demonstrates superior performance in high-diversity settings, with its AUROC score outperforming those of the two alternative variants by 0.1 to 0.3 and surpassing other decoding-based watermarking methods by at least 0.1.
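The determinism described in the abstract can be illustrated with a minimal sketch of GumbelMax-trick decoding. Everything below is an assumption made for illustration and is not taken from the paper's code: the helper names `keyed_gumbel_noise` and `gumbel_max_decode`, the use of SHA-256 to seed the pseudo-random function, and the toy logits and context. The only point is that when the Gumbel noise is fully determined by the secret key and the context, the argmax decoder picks the same token every time.

```python
import hashlib
import math
import random


def keyed_gumbel_noise(secret_key, context, vocab_size):
    """Return one Gumbel(0, 1) value per vocabulary entry, fixed by (key, context)."""
    # Hypothetical pseudo-random function: hash the key and context into a seed.
    seed = hashlib.sha256(f"{secret_key}|{context}".encode("utf-8")).digest()
    rng = random.Random(seed)
    noise = []
    for _ in range(vocab_size):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        noise.append(-math.log(-math.log(u)))  # standard Gumbel sample
    return noise


def gumbel_max_decode(logits, secret_key, context):
    """Pick argmax_i (logits[i] + g[i]); the only randomness is the keyed noise."""
    g = keyed_gumbel_noise(secret_key, context, len(logits))
    return max(range(len(logits)), key=lambda i: logits[i] + g[i])


logits = [2.0, 1.0, 0.5, 0.1]  # toy next-token logits (assumed values)
context = (17, 42)             # toy preceding token ids (assumed values)
picks = {gumbel_max_decode(logits, "demo-key", context) for _ in range(10)}
print(len(picks))  # one distinct token, because every input is fixed
```

Repeating the call with the same key, context, and logits always returns the same token, which is the behavior the diversified variants are meant to relax.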

Paper Structure (49 sections, 1 theorem, 20 equations, 7 figures, 5 tables, 2 algorithms)

Introduction
Related Work
Zero-shot Methods.
Training-Based Methods.
Watermarking Techniques.
Method
Preliminaries
Decoding-Based Watermark Framework.
GumbelMax-trick.
Watermark Design
Unbiasedness.
Logits-Addition Watermark.
Limitations of the GM Watermark.
GumbelSoft Watermark
Experiment
...and 34 more sections

Key Result

Theorem 1

Consider a text $w_1, \ldots, w_T$ embedded with a watermark using the Logits-Addition technique. When evaluated by the Logits-Addition watermark detector, the expected value and variance of the score for each token are given by [...]. For a non-watermarked text $w_1, \ldots, w_T$, applying the Logits-Addition watermark detector, the expected value and variance for each per-token score are [...]. Here, $\gamma$ [...]
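The theorem's formulas are not reproduced in this excerpt, so the following is only a hedged numerical sketch of the kind of per-token statistics it concerns, not a restatement of it. It assumes, purely for illustration, that the per-token score is the Gumbel noise value at the observed token, and that watermarked tokens are chosen as the argmax of log-probability plus noise. The one fact relied on is standard: a Gumbel(0, 1) variable has mean equal to the Euler-Mascheroni constant (about 0.5772) and variance $\pi^2/6$. The function names and the toy distribution `probs` are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # mean of a standard Gumbel(0, 1) variable


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via -log(-log(u))."""
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def per_token_scores(probs, n_trials, watermarked, seed=0):
    """Simulate scores g[token] for watermarked or independent token choices."""
    rng = random.Random(seed)
    log_p = [math.log(p) for p in probs]
    idx = range(len(probs))
    scores = []
    for _ in range(n_trials):
        g = [sample_gumbel(rng) for _ in idx]
        if watermarked:
            # Assumed watermarked decoding: argmax of log-prob plus noise.
            token = max(idx, key=lambda i: log_p[i] + g[i])
        else:
            # Non-watermarked: the token is chosen independently of the noise.
            token = rng.choices(idx, weights=probs)[0]
        scores.append(g[token])
    return scores


def mean_and_variance(values):
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / (len(values) - 1)
    return m, v


probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution (assumed)
plain_mean, plain_var = mean_and_variance(per_token_scores(probs, 20000, False))
marked_mean, marked_var = mean_and_variance(per_token_scores(probs, 20000, True))
print(plain_mean, plain_var)
print(marked_mean, marked_var)
```

In this simulation the non-watermarked scores behave like plain Gumbel(0, 1) draws, while the watermarked scores have a visibly larger mean. A gap of that kind is what would let a detector separate the two cases by summing per-token scores.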

Figures (7)

Figure 1: One significant limitation of GM watermarks is that they produce identical responses to the same query. Such determinism can lead to user dissatisfaction, as individuals may become frustrated with an LLM recommending the same outcomes for repeated prompts. This issue primarily stems from the deterministic nature of both the pseudo-random function and the decoder function. To address this concern, we propose three solutions: Solutions I and II introduce variability into the decoder function, whereas Solution III injects uncertainty into the pseudo-random function.
Figure 2: The GumbelMax-trick can be used for text watermarking in two ways: the Exponential watermark and the Logits-Addition watermark. Each watermark has three variants to enhance generation diversity. The red part denotes our contribution, and the softmax variant of the Logits-Addition watermark is our suggested GumbelSoft watermark.
Figure 3: General framework of decoding-based watermark. The Generator uses logits vector $l_t$ and watermark key $\xi_t$ to decode the next token $w_t$. The Detector, employing scorer $\phi$, assesses the correlation between watermark key $\xi_t$ and token $w_t$, then combines these per-token scores to determine watermark presence. Both Generator and Detector share the same pseudo-random function $F_{sk}$. The context for watermark key calculation can be the preceding $h$ tokens.
Figure 4: AUROC versus Self-BLEU on the QA task. Colors represent temperature, and marker shapes distinguish GumbelSoft from the softmax variant of the Exponential watermark. The AUROC is calculated for 100 detection tokens. Since the top-right region indicates better performance than the bottom-left, GumbelSoft is more effective than the softmax variant of the Exponential watermark.
Figure 5: Robustness comparison of decoding-based watermarks on the completion task. Blue histograms indicate unattacked conditions and red histograms show attacked scenarios. The AUROC is calculated for 40 detection tokens, with GumbelSoft set at a temperature of 0.3. Exp, Dip, and GS refer to Exponential, Dipmark, and GumbelSoft, respectively. GumbelSoft and Exponential show higher robustness against the T5-span attack.
...and 2 more figures
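The generator and detector roles described in the captions above can be sketched as follows. This is a hedged illustration under stated assumptions, not the paper's algorithm. The softmax variant is modeled as sampling from softmax((logits + noise) / temperature). The detector is modeled as summing the keyed noise at each observed token. The pseudo-random function is a hypothetical SHA-256-seeded generator over the preceding `h` tokens. All function names, the vocabulary size, and the uniform toy logits are made up for the example.

```python
import hashlib
import math
import random


def keyed_gumbel_noise(secret_key, context, vocab_size):
    """Hypothetical pseudo-random function: Gumbel noise fixed by (key, context)."""
    seed = hashlib.sha256(f"{secret_key}|{context}".encode("utf-8")).digest()
    rng = random.Random(seed)
    noise = []
    for _ in range(vocab_size):
        u = rng.random()
        while u == 0.0:  # avoid log(0)
            u = rng.random()
        noise.append(-math.log(-math.log(u)))
    return noise


def gumbelsoft_sample(logits, secret_key, context, temperature, rng):
    """Assumed softmax variant: sample from softmax((logits + noise) / temperature)."""
    g = keyed_gumbel_noise(secret_key, context, len(logits))
    scaled = [(l + n) / temperature for l, n in zip(logits, g)]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]


def generate(n_tokens, logits, secret_key, temperature, rng, h=1):
    """Toy generator: the same logits at every step, context = last h tokens."""
    tokens = [0] * h
    while len(tokens) < n_tokens:
        context = tuple(tokens[-h:])
        tokens.append(gumbelsoft_sample(logits, secret_key, context, temperature, rng))
    return tokens


def detection_score(tokens, secret_key, vocab_size, h=1):
    """Assumed detector: sum the keyed noise value at each observed token."""
    total = 0.0
    for t in range(h, len(tokens)):
        context = tuple(tokens[t - h:t])
        total += keyed_gumbel_noise(secret_key, context, vocab_size)[tokens[t]]
    return total


vocab_size = 50
logits = [0.0] * vocab_size  # toy uniform logits (assumed)
marked = generate(200, logits, "demo-key", 0.3, random.Random(1))
plain_rng = random.Random(2)
plain = [plain_rng.randrange(vocab_size) for _ in range(200)]
print(detection_score(marked, "demo-key", vocab_size))
print(detection_score(plain, "demo-key", vocab_size))
```

Because the decoder now draws from a distribution instead of taking an argmax, two runs with different sampling seeds can produce different text. The sampled tokens still tend to carry large noise values, so the summed score stays well above that of unrelated text.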

Theorems & Definitions (1)

Theorem 1
