Necessary and Sufficient Watermark for Large Language Models

Yuki Takezawa; Ryoma Sato; Han Bao; Kenta Niwa; Makoto Yamada

TL;DR

The paper addresses the problem of reliably distinguishing LLM-generated text from human-written text without compromising text quality. It introduces NS-Watermark, a watermarking approach that imposes only the minimum constraint on the number of green words needed for detection, with the constraint adapting to the text length. The method is formalized as a constrained optimization problem and solved with both a naive algorithm and a linear-time algorithm. Empirical results on machine translation and natural language generation show that NS-Watermark achieves near-zero false negative rates while degrading text quality far less than prior methods, and that it is more robust to post-editing than the Soft-Watermark. The work demonstrates a favorable trade-off between quality and detectability; its main limitation is computational cost, which suggests directions for efficiency improvements and potential extensions to undetectable watermarking. Key contributions include (i) deriving the minimal constraint required to distinguish LLM-generated text from human-written text, (ii) proposing a linear-time, length-aware algorithm for NS-Watermark, (iii) demonstrating substantial BLEU and perplexity (PPL) improvements over previous watermarking approaches, and (iv) showing robustness to post-editing attacks.

Abstract

In recent years, large language models (LLMs) have achieved remarkable performance in various NLP tasks. They can generate texts that are indistinguishable from those written by humans. Such remarkable performance of LLMs increases their risk of being used for malicious purposes, such as generating fake news articles. Therefore, it is necessary to develop methods for distinguishing texts written by LLMs from those written by humans. Watermarking is one of the most powerful methods for achieving this. Although existing watermarking methods have successfully detected texts generated by LLMs, they significantly degrade the quality of the generated texts. In this study, we propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading the text quality. More specifically, we derive the minimum constraints that must be imposed on the generated texts to distinguish whether LLMs or humans wrote them. Then, we formulate the NS-Watermark as a constrained optimization problem and propose an efficient algorithm to solve it. Through experiments, we demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods and distinguish more accurately between texts written by LLMs and those written by humans. In machine translation tasks in particular, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU points.

Paper Structure (25 sections, 3 theorems, 16 equations, 9 figures, 16 tables, 2 algorithms)

Introduction
Background
Proposed Method
Necessary and Sufficient Watermark
Naive Algorithm for Necessary and Sufficient Watermark
Linear Time Algorithm for Necessary and Sufficient Watermark
Robustness to Post-editing Attack
Experiments
Comparison Methods
Machine Translation
Natural Language Generation
Robustness to Post-editing Attack
Related Work
Conclusion
Limitations
...and 10 more sections

Key Result

Theorem 1

If we select the minimum $\delta^\star \in \mathbb{R}$ such that the z-score of the text generated by the Soft-Watermark exceeds the threshold $Z$, then the Soft-Watermark generates, with non-zero probability, a text that contains more green words than required.
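
To make "the required number of green words" concrete, the following is a minimal sketch that assumes the standard one-proportion z-score used in green-list watermark detection, $z = (g - \gamma T) / \sqrt{\gamma (1 - \gamma) T}$, where $g$ is the number of green words, $T$ is the number of scored tokens, and $\gamma$ is the green-list fraction. The function names `z_score` and `min_green_words` are hypothetical, and the paper's exact definition (for example, how the first token is counted) may differ.

```python
import math


def z_score(num_green: int, num_tokens: int, gamma: float) -> float:
    """z-score of observing num_green green words among num_tokens tokens.

    Assumed form: under the null hypothesis (human-written text), each token
    is green with probability gamma, independently of the others.
    """
    expected = gamma * num_tokens
    std = math.sqrt(gamma * (1.0 - gamma) * num_tokens)
    return (num_green - expected) / std


def min_green_words(num_tokens: int, gamma: float, z_threshold: float) -> int:
    """Smallest green-word count whose z-score reaches z_threshold.

    The result can exceed num_tokens for very short texts, in which case
    the threshold cannot be met at that length.
    """
    bound = gamma * num_tokens + z_threshold * math.sqrt(
        gamma * (1.0 - gamma) * num_tokens
    )
    return math.ceil(bound)


if __name__ == "__main__":
    # Settings taken from the Figure 1 caption: gamma = 0.2, Z = 4, length 200.
    print(min_green_words(200, 0.2, 4.0))
```

With the settings from the Figure 1 caption ($\gamma = 0.2$, $Z = 4$, length $200$), this yields 63, which agrees with $G_{\text{max}} = 63$ stated there. Any green words beyond this count do not help detection at threshold $Z$, which is the excess that Theorem 1 attributes to the Soft-Watermark.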

Figures (9)

Figure 1: Visualization of the table ${\bm{T}}[t][g]$ for $T_{\text{max}}=200$, $\gamma=0.2$, $\widehat{T}=75$, $\alpha=2$, $Z=4$, and $G_{\text{max}}=63$. The areas colored in blue and light blue indicate the entries of ${\bm{T}}[t][g]$ that must be computed, and the areas colored in blue indicate the entries that satisfy the constraint of Eq. (\ref{eq:ns_watermark}). The red line indicates the minimum number of green words required to satisfy the constraint. Note that in the middle and right figures, ${\bm{T}}[t][G_{\text{max}}]$ does not denote the set of texts of length $t$ containing exactly $G_{\text{max}}$ green words, but the set of texts containing at least $G_{\text{max}}$ green words. See Sec. \ref{sec:additional_visual_explation} for figures with various $\gamma$.
Figure 2: Relationships between z-score and the length of generated texts. We used the validation datasets of WMT'16 En$\rightarrow$De. For each $\gamma$, we tuned the hyperparameter $\delta$ of the Soft-Watermark by increasing it through $4, 6, 8, \cdots$ and selecting the smallest value for which the FNR falls below $5\%$. We omit the results of the Soft-Watermark and Adaptive Soft-Watermark for $\gamma=0.0001$ because the z-scores become too large. Full results are deferred to Sec. \ref{sec:visualization}.
Figure 3: Time required to generate a text when varying $\alpha$.
Figure 4: Text quality when varying $\alpha$. We used the validation dataset of WMT'16 En$\rightarrow$De.
Figure 5: Trade-off between text quality and robustness against post-editing. To make the figure more readable, results with an FNR greater than $25\%$ were omitted. Methods whose points lie closer to the lower left corner are better. Surprisingly, the NS-Watermark is generally more robust against post-editing than the Soft-Watermark, even with a small offset $\beta=0.05$.
...and 4 more figures

Theorems & Definitions (5)

Theorem 1: Informal
Lemma 1
proof
Theorem 1: Formal
proof
