WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness

Baizhou Huang, Xiaojun Wan

TL;DR

This paper tackles the persistent trade-offs among imperceptibility, efficacy, and robustness in watermarks for large language models. It introduces WaterPool, a semantics-based key module within a two-module, key-centered watermarking framework, to preserve full key sampling space while enabling precise key restoration via semantic search. By integrating WaterPool with existing marks (KGW, EXP, ITS), the approach achieves near-optimal imperceptibility and substantial gains in efficacy and robustness across open-ended generation and long-form QA on multiple model scales, and demonstrates scalability to large vector databases. The results suggest WaterPool is a practical and versatile plug-in that meaningfully strengthens watermarking performance in real-world deployments, while highlighting avenues for further refinement of mark modules and retrieval efficiency.

Abstract

With the increasing use of large language models (LLMs) in daily life, concerns have emerged regarding their potential misuse and societal impact. Watermarking is proposed to trace the usage of specific models by injecting patterns into their generated texts. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility), while ensuring a high detection rate (efficacy), even when the text is partially altered (robustness). Despite many methods having been proposed, none have simultaneously achieved all three properties, revealing an inherent trade-off. This paper utilizes a key-centered scheme to unify existing watermarking techniques by decomposing a watermark into two distinct modules: a key module and a mark module. Through this decomposition, we demonstrate for the first time that the key module significantly contributes to the trade-off issues observed in prior methods. Specifically, this reflects the conflict between the scale of the key sampling space during generation and the complexity of key restoration during detection. To this end, we introduce WaterPool, a simple yet effective key module that preserves a complete key sampling space required by imperceptibility while utilizing semantics-based search to improve the key restoration process. WaterPool can integrate with most watermarks, acting as a plug-in. Our experiments with three well-known watermarking techniques show that WaterPool significantly enhances their performance, achieving near-optimal imperceptibility and markedly improving efficacy and robustness (+12.73% for KGW, +20.27% for EXP, +7.27% for ITS).
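
As an illustration of the key restoration idea described in the abstract, the following minimal Python sketch stores each sampled key alongside an embedding of the text it produced, then restores the key for a possibly edited text by nearest-neighbour search. The `embed` function, the `KeyPool` class and the 64-bit key format are hypothetical stand-ins and not the authors' implementation; a real system would presumably use a learned sentence encoder and a vector database.

```python
import math
import random
from collections import Counter


def embed(text, dim=64):
    """Toy bag-of-words hashing embedding (stand-in for a semantic encoder)."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        # Stable hand-rolled hash; Python's built-in hash() of str is salted per process.
        idx = sum(ord(c) * (i + 1) for i, c in enumerate(word)) % dim
        vec[idx] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class KeyPool:
    """Hypothetical key module: keeps (embedding, key) pairs for generated texts."""

    def __init__(self):
        self.entries = []

    def add(self, text, key):
        # Called after generation: remember which key produced this text.
        self.entries.append((embed(text), key))

    def restore(self, text):
        # Called at detection: return the key of the most similar stored text.
        query = embed(text)

        def cosine(entry):
            return sum(a * b for a, b in zip(query, entry[0]))

        return max(self.entries, key=cosine)[1]


rng = random.Random(0)
pool = KeyPool()
texts = [
    "the river flows quietly through the old town at dawn",
    "quantum computers use qubits to represent superposed states",
    "the recipe calls for flour butter sugar and two eggs",
]
# Keys are sampled independently of the text, from the full key space.
keys = [rng.getrandbits(64) for _ in texts]
for text, key in zip(texts, keys):
    pool.add(text, key)

# A lightly edited copy of the first text should still map back to its key.
edited = "the river flows slowly through the old town at dawn"
restored = pool.restore(edited)
```

The point of the design is that keys are drawn independently of the text, so the sampling space is not restricted, while detection relies on similarity search rather than on recomputing the key from exact tokens.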

Paper Structure (39 sections, 8 theorems, 28 equations, 5 figures, 11 tables, 6 algorithms)

This paper contains 39 sections, 8 theorems, 28 equations, 5 figures, 11 tables, 6 algorithms.

Key Result

Proposition 3.1

A watermark is imperceptible if (1) Independent condition: the sampled private key vectors for each generated output are mutually independent, i.e. ${\mathbf{k}}^1,\dots,{\mathbf{k}}^N\overset{\text{i.i.d.}}{\sim}\mathcal{U}(\mathbb{R}^L)$, where $L$ is the maximum output length of the LLM; (2) Unbiased condition: the

Figures (5)

  • Figure 1: Previous methods trade off imperceptibility, efficacy and robustness against one another. WaterPool mitigates this problem and significantly improves KGW, ITS and EXP.
  • Figure 2: Overview of the key-centered watermarking scheme. A watermark is decomposed into two modules, a key module and a mark module. (a) During generation, the LLM provides a next-token distribution $P_M$. The key module samples a private key ${\mathbf{k}}_i$ as a random seed for the mark module, which stochastically modifies the distribution to $\hat{P}_M$, from which watermarked texts are sampled. (b) During detection, the key module restores the key $\hat{{\mathbf{k}}}_i$ for each candidate token. The mark module then calculates the per-token statistic $s_i$ based on the restored key and aggregates the statistics into a $p$-value.
  • Figure 3: TPR@FPR=1% of WaterPool with different database sizes on open-ended generation.
  • Figure 4: TPR@FPR=1% of different watermarking techniques as text length grows. Methods drawn in the same color share the same mark module. Solid lines represent the original methods, while dashed lines represent their WaterPool variants.
  • Figure 5: TPR@FPR=1% of WaterPool with different non-watermarked texts. The first column lists watermarking methods, and the second column shows non-watermarked text sources. WaterPool remains stable across different non-watermarked texts.
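
The two-module flow in the Figure 2 caption can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's code: the "LLM" is a uniform distribution over a 50-token vocabulary, the key module derives the key from the previous token (a KGW-style choice), and the mark module boosts a key-seeded "green" subset of the vocabulary. The names `key_module`, `mark_module`, `VOCAB`, `GAMMA`, `DELTA` and the secret value are all hypothetical.

```python
import math
import random

VOCAB = 50    # toy vocabulary size (assumption)
GAMMA = 0.5   # fraction of tokens placed on the green list
DELTA = 4.0   # logit boost given to green tokens


def key_module(prev_token, secret=12345):
    """Key module: derive the per-token key from the previous token."""
    return secret * 1000003 + prev_token


def green_list(key):
    """Use the key as a random seed to pick the green subset of the vocabulary."""
    rng = random.Random(key)
    return set(rng.sample(range(VOCAB), int(GAMMA * VOCAB)))


def mark_module(logits, key):
    """Mark module: turn P_M into the modified distribution via a softmax over boosted logits."""
    green = green_list(key)
    boosted = [l + DELTA if t in green else l for t, l in enumerate(logits)]
    top = max(boosted)
    exps = [math.exp(b - top) for b in boosted]
    total = sum(exps)
    return [e / total for e in exps]


def generate(n_tokens, rng, watermark=True):
    tokens = [0]  # fixed start token
    for _ in range(n_tokens):
        logits = [0.0] * VOCAB  # toy LLM: uniform next-token distribution
        if watermark:
            probs = mark_module(logits, key_module(tokens[-1]))
        else:
            probs = [1.0 / VOCAB] * VOCAB
        tokens.append(rng.choices(range(VOCAB), weights=probs)[0])
    return tokens


def detect(tokens):
    """Restore each key, compute per-token statistics s_i, aggregate into a p-value."""
    n = len(tokens) - 1
    hits = sum(
        tokens[i] in green_list(key_module(tokens[i - 1]))  # s_i is 1 if token i is green
        for i in range(1, len(tokens))
    )
    z = (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
    return 0.5 * math.erfc(z / math.sqrt(2))  # one-sided normal approximation


rng = random.Random(0)
p_marked = detect(generate(200, rng, watermark=True))
p_plain = detect(generate(200, rng, watermark=False))
```

In this toy, key restoration is exact only while the previous token is unchanged, so an edit to one token corrupts the statistic of the token after it. This is one simple way to see why the choice of key module affects robustness.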

Theorems & Definitions (14)

  • Proposition 3.1
  • Proposition 3.2
  • Proposition 3.3
  • proof
  • Lemma B.1
  • proof
  • Lemma B.2
  • proof
  • Lemma B.3
  • proof
  • ...and 4 more