Invisible Image Watermarks Are Provably Removable Using Generative AI

Xuandong Zhao, Kexun Zhang, Zihao Su, Saastha Vasan, Ilya Grishchenko, Christopher Kruegel, Giovanni Vigna, Yu-Xiang Wang, Lei Li

TL;DR

The paper demonstrates that pixel-level invisible watermarks are vulnerable to a regeneration attack, which noises an image and then reconstructs it, and argues that research and industry emphasis should shift from invisible watermarks to semantic-preserving watermarks.

Abstract

Invisible watermarks safeguard images' copyrights by embedding hidden messages only detectable by owners. They also prevent people from misusing images, especially those generated by AI models. We propose a family of regeneration attacks to remove these invisible watermarks. The proposed attack method first adds random noise to an image to destroy the watermark and then reconstructs the image. This approach is flexible and can be instantiated with many existing image-denoising algorithms and pre-trained generative models such as diffusion models. Through formal proofs and extensive empirical evaluations, we demonstrate that pixel-level invisible watermarks are vulnerable to this regeneration attack. Our results reveal that, across four different pixel-level watermarking schemes, the proposed method consistently achieves superior performance compared to existing attack techniques, with lower detection rates and higher image quality. However, watermarks that keep the image semantically similar can be an alternative defense against our attacks. Our finding underscores the need for a shift in research/industry emphasis from invisible watermarks to semantic-preserving watermarks. Code is available at https://github.com/XuandongZhao/WatermarkAttacker
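The attack described in the abstract (noise, then reconstruct) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `identity_embed` and `moving_average_denoiser` are hypothetical stand-ins for the embedding and the regeneration step, and the image is treated as a flat list of pixel intensities. In practice the reconstruction would be done by a pre-trained denoiser, VAE, or diffusion model.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]


def regeneration_attack(
    watermarked: Vector,
    embed: Callable[[Vector], Vector],
    regenerate: Callable[[Vector], Vector],
    sigma: float,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Embed the image, add Gaussian noise to the embedding, then reconstruct."""
    if rng is None:
        rng = random.Random()
    z = embed(watermarked)
    # Isotropic Gaussian noise with standard deviation sigma on every coordinate.
    z_noisy = [v + rng.gauss(0.0, sigma) for v in z]
    return regenerate(z_noisy)


def identity_embed(x: Vector) -> Vector:
    """Hypothetical embedding: the pixel representation itself."""
    return list(x)


def moving_average_denoiser(z: Vector, radius: int = 1) -> Vector:
    """Hypothetical regeneration step: a simple moving-average smoother."""
    n = len(z)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = z[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Usage: attack a toy constant "image" of 16 pixels.
image = [0.5] * 16
attacked = regeneration_attack(
    image, identity_embed, moving_average_denoiser, sigma=0.1, rng=random.Random(0)
)
```

Passing the embedding and the regenerator as callables mirrors the claim that the attack is a family: swapping in a different denoiser or generative model changes only the two function arguments.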

Paper Structure (34 sections, 6 theorems, 20 equations, 15 figures, 2 tables, 1 algorithm)

This paper contains 34 sections, 6 theorems, 20 equations, 15 figures, 2 tables, 1 algorithm.

Key Result

Theorem 4.3

Consider a $\Delta$-invisible watermarking scheme with respect to the $\ell_2$-distance, and assume the embedding function $\phi$ of the diffusion model is $L_{x,w}$-locally Lipschitz. The randomized algorithm $\mathcal{A}(\phi(\cdot) + \mathcal{N}(0,\sigma^2 I_d))$ produces a reconstructed image $\hat{x}$ which satisfies $f$-Certified-Watermark-Free with $f(\alpha) = \Phi\left(\Phi^{-1}(1-\alpha) - \frac{L_{x,w}\Delta}{\sigma}\right)$, where $\Phi$ is the cumulative distribution function of the standard normal distribution.
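To get a feel for the guarantee, the sketch below evaluates a trade-off function numerically. It assumes the bound takes the standard Gaussian trade-off form $f(\alpha) = \Phi(\Phi^{-1}(1-\alpha) - L\Delta/\sigma)$, where $f(\alpha)$ is a lower bound on the Type II error of any detector with Type I error $\alpha$; that form is an assumption of this example. The function name `gaussian_tradeoff` is hypothetical, and the values $L=1$ and $\sigma = 1.16\Delta$ are the ones quoted in the Figure 2 caption.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gaussian_tradeoff(alpha: float, lipschitz: float, delta: float, sigma: float) -> float:
    """Lower bound on Type II error at Type I error `alpha` (0 < alpha < 1).

    Assumed form: Phi(Phi^{-1}(1 - alpha) - L * Delta / sigma).
    """
    mu = lipschitz * delta / sigma
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha) - mu)


# Usage with the Figure 2 settings: L = 1, sigma = 1.16 * Delta.
delta = 1.0  # only the ratio Delta / sigma matters, so the unit is arbitrary
beta_min = gaussian_tradeoff(0.01, lipschitz=1.0, delta=delta, sigma=1.16 * delta)
max_tpr = 1.0 - beta_min  # best true positive rate any detector could reach at FPR = 0.01
```

Under these assumptions, a detector held to a 1% false positive rate must miss most attacked images, and raising $\sigma$ pushes $f(\alpha)$ toward $1-\alpha$, the trade-off of random guessing.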

Figures (15)

  • Figure 1: Removing invisible watermarks: The proposed attack first maps the watermarked image to its embedding, which is another representation of the image. Then the embedding is noised to destroy the watermark. After that, a regeneration algorithm reconstructs the image from the noisy embedding.
  • Figure 1: Performance of different watermarking methods. All methods successfully detect the embedded watermark.
  • Figure 2: Theoretical and empirical trade-off functions for DwtDctSvd watermark detectors after our attack. Trade-off functions indicate how much less Type II error (false negative rate) the detector gets in return by having more Type I error (false positive rate). Theoretically, after the attack, no detection algorithm can fall in the Impossibility Region and have both Type I error and Type II error at a low level. Empirically, the watermark detector performs even worse than the theory, indicating the success of our attack and the validity of the theoretical bound. We use 500 watermarked MS-COCO images with an empirically valid upper bound of $L=1$ and noise level $\sigma = 1.16\Delta$. An additional example for the RivaGAN watermark is provided in Figure \ref{fig:nips_error}.
  • Figure 3: Examples of attacks on DwtDctSvd watermarking, including destructive attacks (e.g., brightness change and JPEG compression), constructive attacks (e.g., Gaussian blur), and regeneration attacks using VAEs and diffusion models. Brightness change, JPEG compression, VAE attack, and diffusion attack successfully remove the watermark. The VAE attack over-smooths the image, resulting in blurriness. The diffusion attack maintains high image quality while removing the watermark. Additional attack examples for other watermarking schemes are in Figures \ref{fig:example_appendix}, \ref{fig:nips_rivagan}, \ref{fig:nips_ssl}, and \ref{fig:nips_stega}.
  • Figure 4: Quality-detectability tradeoff for four watermarking schemes under eight attack methods on the MS-COCO dataset. Regeneration attacks (Diffusion model, VAE-Cheng2020, and VAE-Bmshj2018) are highlighted for their performance. The x-axis shows image quality metrics (SSIM and PSNR, higher values indicate better quality), while the y-axis represents the detection metric True Positive Rate at 1% False Positive Rate (TPR@FPR=0.01, lower values are better for attackers). The strongest attacker should appear in the lower right corner of these plots. Regeneration attacks demonstrate superior performance compared to other attack methods, achieving both lower TPR and higher image quality. Quality-detectability tradeoff results for the SDP dataset are in Figure \ref{fig:all_res_2}.
  • ...and 10 more figures
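The detection metric in the Figure 4 caption, TPR@FPR=0.01, can be computed from raw detector scores as sketched below. This is an illustrative helper, not code from the paper: the function name `tpr_at_fpr`, the convention that a higher score means "more likely watermarked", and the strict `>` decision rule are all assumptions of this example.

```python
import math
from typing import Sequence


def tpr_at_fpr(
    neg_scores: Sequence[float],
    pos_scores: Sequence[float],
    target_fpr: float = 0.01,
) -> float:
    """True positive rate at the loosest threshold whose empirical FPR <= target_fpr.

    neg_scores: detector scores on unwatermarked images.
    pos_scores: detector scores on watermarked (possibly attacked) images.
    An image is flagged as watermarked when its score is strictly above the threshold.
    """
    neg_sorted = sorted(neg_scores, reverse=True)
    n = len(neg_sorted)
    # Number of unwatermarked images allowed above the threshold (epsilon guards float rounding).
    allowed_fp = min(math.floor(target_fpr * n + 1e-9), n - 1)
    # Using the (allowed_fp + 1)-th largest negative score as the threshold leaves
    # at most `allowed_fp` negatives strictly above it.
    threshold = neg_sorted[allowed_fp]
    true_positives = sum(1 for s in pos_scores if s > threshold)
    return true_positives / len(pos_scores)


# Usage with toy scores: 100 unwatermarked images, 4 watermarked images.
negatives = [i / 100 for i in range(100)]
positives = [0.5, 0.985, 0.995, 1.2]
tpr = tpr_at_fpr(negatives, positives, target_fpr=0.01)
```

For an attacker, a successful removal shows up as this value dropping toward the target false positive rate itself, meaning the detector flags attacked images no more often than clean ones.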

Theorems & Definitions (19)

  • Definition 2.1: Invisible watermark
  • Definition 2.2: Watermark detection
  • Definition 4.1: $f$-Certified-Watermark-Free
  • Definition 4.2: Local Watermark-Specific Lipschitz property
  • Theorem 4.3
  • Theorem 4.4
  • Corollary 4.5
  • Definition C.1: $f$-Certified-Watermark-Free
  • Definition C.2: Local Watermark-Specific Lipschitz property
  • Theorem C.3
  • ...and 9 more