Robust Watermarks Leak: Channel-Aware Feature Extraction Enables Adversarial Watermark Manipulation

Zhongjie Ba, Yitao Zhang, Peng Cheng, Bin Gong, Xinyu Zhang, Qinglong Wang, Kui Ren

TL;DR

This paper reveals a robustness–stealthiness paradox in watermarking for AI-generated content: making watermarks more robust against distortions increases watermark information leakage. It introduces DAPAO, a no-box attack that uses multi-channel features from a pre-trained vision model to extract watermark leakage from a single watermarked image, enabling both detection evasion and semantic forgery with high success rates. Empirical evaluation across multiple datasets and watermarking methods shows substantial gains in evasion (up to ~87% in some cases) and forgery (up to ~85%) while preserving visual quality, underscoring security risks in current robust watermark designs. The work advocates defenses such as leakage-aware adversarial training and more secure watermarking strategies to mitigate these leakage pathways and strengthen provenance mechanisms in AI-generated content.

Abstract

Watermarking plays a key role in the provenance and detection of AI-generated content. While existing methods prioritize robustness against real-world distortions (e.g., JPEG compression and noise addition), we reveal a fundamental tradeoff: such robust watermarks inherently increase the redundancy of detectable patterns encoded into images, creating exploitable information leakage. To leverage this, we propose an attack framework that extracts leakage of watermark patterns through multi-channel feature learning using a pre-trained vision model. Unlike prior works requiring massive data or detector access, our method achieves both forgery and detection evasion with a single watermarked image. Extensive experiments demonstrate that our method achieves a 60% success rate gain in detection evasion and a 51% improvement in forgery accuracy compared to state-of-the-art methods while maintaining visual fidelity. Our work exposes the robustness-stealthiness paradox: current "robust" watermarks sacrifice security for distortion resistance, providing insights for future watermark design.

Paper Structure

This paper contains 32 sections, 2 theorems, 19 equations, 41 figures, 5 tables, 2 algorithms.

Key Result

Proposition 4.3

When the robustness requirement exceeds $C(I)$, a decline in visual quality is inevitable.

Figures (41)

  • Figure 1: Demonstration of our attacks. An attacker can perform watermark removal and forgery attacks with only one watermarked image and without knowledge of the underlying watermarking system. With removal, the attacker avoids copyright violation accusations because the extracted watermark is incorrect; with forgery, the attacker can spread fake news by forging the watermark of an authoritative media outlet.
  • Figure 2: Illustration of learning-based watermarking methods.
  • Figure 3: Typical watermarking application and security threats. Organizations and individuals use watermarking services to embed watermarks into images for purposes such as copyright protection or content regulation. When image ownership verification is required, the watermark is extracted and matched through the watermarking service. However, attackers can apply carefully designed post-processing techniques to remove or forge the watermark.
  • Figure 4: Demonstration of our feasibility study.
  • Figure 5: An overview of our attack.
  • ...and 36 more figures

Theorems & Definitions (7)

  • Definition 4.1
  • Definition 4.2
  • Proposition 4.3
  • proof
  • Definition 3.1
  • Proposition 3.2
  • proof