WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness
Baizhou Huang, Xiaojun Wan
TL;DR
This paper tackles the persistent trade-off among imperceptibility, efficacy, and robustness in watermarks for large language models. It introduces WaterPool, a semantics-based key module within a two-module, key-centered watermarking framework, which preserves the full key sampling space while enabling precise key restoration via semantic search. Integrated with existing watermarks (KGW, EXP, ITS), WaterPool achieves near-optimal imperceptibility and substantial gains in efficacy and robustness on open-ended generation and long-form QA across multiple model scales, and it scales to large vector databases. The results suggest that WaterPool is a practical, versatile plug-in that strengthens watermarking performance in real-world deployments, and they point to further refinement of mark modules and retrieval efficiency as future work.
Abstract
With the increasing use of large language models (LLMs) in daily life, concerns have emerged regarding their potential misuse and societal impact. Watermarking has been proposed to trace the usage of specific models by injecting patterns into their generated texts. An ideal watermark should produce outputs that are nearly indistinguishable from those of the original LLM (imperceptibility), while ensuring a high detection rate (efficacy), even when the text is partially altered (robustness). Although many methods have been proposed, none achieves all three properties simultaneously, which reveals an inherent trade-off. This paper utilizes a key-centered scheme to unify existing watermarking techniques by decomposing a watermark into two distinct modules: a key module and a mark module. Through this decomposition, we demonstrate for the first time that the key module significantly contributes to the trade-off issues observed in prior methods. Specifically, the trade-off reflects a conflict between the scale of the key sampling space during generation and the complexity of key restoration during detection. To this end, we introduce WaterPool, a simple yet effective key module that preserves the complete key sampling space required by imperceptibility while utilizing semantics-based search to improve the key restoration process. WaterPool can integrate with most watermarks, acting as a plug-in. Our experiments with three well-known watermarking techniques show that WaterPool significantly enhances their performance, achieving near-optimal imperceptibility and markedly improving efficacy and robustness (+12.73% for KGW, +20.27% for EXP, +7.27% for ITS).
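The key-module idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The class `KeyPool`, the helpers `embed` and `cosine`, the 64-bit key size, and the bag-of-words similarity are all assumptions made for illustration; a real system would presumably use a learned semantic encoder and a vector database, and would pass the restored key to a mark module (such as KGW, EXP, or ITS) for the actual detection test.

```python
import math
import secrets
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for a semantic encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KeyPool:
    """Hypothetical key module: random keys at generation time,
    similarity-based key lookup at detection time."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, int]] = []

    def sample_key(self) -> int:
        # The key is drawn from the full key space (64 bits here, an
        # assumption) and does not depend on the text being generated.
        return secrets.randbits(64)

    def register(self, text: str, key: int) -> None:
        # Store the embedding of the generated text next to its key.
        self.entries.append((embed(text), key))

    def restore_key(self, text: str) -> int:
        # Return the key of the stored text most similar to the candidate.
        if not self.entries:
            raise LookupError("key pool is empty")
        query = embed(text)
        _, key = max(self.entries, key=lambda entry: cosine(query, entry[0]))
        return key


# Usage: two generated texts are registered, then one is lightly edited.
pool = KeyPool()
k1 = pool.sample_key()
pool.register("the river floods the valley every spring after heavy rain", k1)
k2 = pool.sample_key()
pool.register("stock markets fell sharply as interest rates rose again", k2)

edited = "the river floods the valley each spring after heavy rainfall"
restored = pool.restore_key(edited)  # lookup still lands on the first entry
```

In this sketch the key is sampled independently of the text, which is the property the abstract ties to imperceptibility, while restoration depends only on the edited text staying close to the stored one, which is how a similarity search can tolerate partial alteration.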
