Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation
Zhongjie Ba, Yitao Zhang, Peng Cheng, Bin Gong, Xinyu Zhang, Qinglong Wang, Kui Ren
TL;DR
This paper reveals a robustness–stealthiness paradox in watermarking for AI-generated content: making watermarks more robust against distortions increases the amount of watermark information they leak. It introduces DAPAO, a no-box attack that uses multi-channel features from a pre-trained vision model to extract this leakage from a single watermarked image, enabling both detection evasion and semantic forgery. Empirical evaluation across multiple datasets and watermarking methods shows substantial gains in evasion (up to ~87%) and forgery (up to ~85%) while preserving visual quality, underscoring the security risks in current robust watermark designs. The work advocates defenses such as leakage-aware adversarial training and more secure watermarking strategies to close these leakage pathways and strengthen provenance mechanisms for AI-generated content.
Abstract
Watermarking plays a key role in the provenance and detection of AI-generated content. While existing methods prioritize robustness against real-world distortions (e.g., JPEG compression and noise addition), we reveal a fundamental tradeoff: such robust watermarks inherently increase the redundancy of the detectable patterns encoded into images, creating exploitable information leakage. To exploit this leakage, we propose an attack framework that extracts the leaked watermark patterns through multi-channel feature learning using a pre-trained vision model. Unlike prior work that requires massive data or detector access, our method achieves both forgery and detection evasion with a single watermarked image. Extensive experiments demonstrate that our method achieves a 60% gain in detection-evasion success rate and a 51% improvement in forgery accuracy over state-of-the-art methods while maintaining visual fidelity. Our work exposes the robustness-stealthiness paradox: current "robust" watermarks sacrifice security for distortion resistance, a finding that offers insights for future watermark design.
