Watermark under Fire: A Robustness Evaluation of LLM Watermarking

Jiacheng Liang; Zian Wang; Lauren Hong; Shouling Ji; Ting Wang

TL;DR

This work addresses robustness evaluation in LLM watermarking by introducing WaterPark, an open-source platform that unifies 12 watermarkers, 12 watermark-removal attacks, and 8 evaluation metrics. It conducts observational and controlled analyses across multiple LLMs and task domains, showing how design choices such as context dependency and generation strategy drive the trade-off between robustness and fidelity. The findings show that context-free, distribution-transform watermarks often offer stronger attack resilience at the cost of fidelity, while text-dependent, soft-perturbation methods preserve quality but are more attack-prone. Experiments with combined detectors and surrogate attacks further clarify practical defense and attack dynamics. The work provides actionable deployment guidelines and a shared benchmark to advance robust watermarking research in real-world adversarial settings.

Abstract

Various watermarking methods ("watermarkers") have been proposed to identify LLM-generated texts; yet, due to the lack of unified evaluation platforms, many critical questions remain under-explored: i) What are the strengths/limitations of various watermarkers, especially their attack robustness? ii) How do various design choices impact their robustness? iii) How to optimally operate watermarkers in adversarial environments? To fill this gap, we systematize existing LLM watermarkers and watermark removal attacks, mapping out their design spaces. We then develop WaterPark, a unified platform that integrates 10 state-of-the-art watermarkers and 12 representative attacks. More importantly, by leveraging WaterPark, we conduct a comprehensive assessment of existing watermarkers, unveiling the impact of various design choices on their attack robustness. We further explore the best practices to operate watermarkers in adversarial environments. We believe our study sheds light on current LLM watermarking techniques while WaterPark serves as a valuable testbed to facilitate future research.
