Diffusion-Based Image Editing for Breaking Robust Watermarks

Yunyi Ni, Finn Carter, Ze Niu, Emily Davis, Bo Zhang

TL;DR

This work shows that diffusion-based image editing can effectively erase robust invisible watermarks while preserving perceptual image quality, which challenges current watermarking defenses. It introduces unguided and guided diffusion attacks, including a white-box variant that uses the watermark decoder as a guidance signal. It also provides a theoretical framework based on mutual information, showing that watermark information vanishes under sufficiently strong diffusion. Experiments on multiple watermarking schemes (HiDDeN, StegaStamp, TrustMark, VINE, and a classic spread-spectrum scheme) show near-zero watermark recovery rates under diffusion attacks, while PSNR values indicate that visual fidelity is preserved. The findings expose a critical vulnerability of pixel-level invisible watermarks in the era of generative AI and motivate research into semantic or content-tied watermarks, as well as defenses that account for diffusion-based erasure.
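The core mechanism of the unguided attack can be illustrated on a toy spread-spectrum watermark. The sketch below is an illustration under stated assumptions, not the authors' implementation. It embeds $k$ bits as $\pm$`strength` multiples of random $\pm 1$ patterns (hypothetical helpers `embed` and `decode`) and applies only the forward half of regeneration: it samples $x_t \sim q(x_t \mid x_0)$ under the standard linear DDPM schedule (betas from 1e-4 to 0.02 over 1000 steps, an assumed choice). It then measures how many bits a correlation decoder still recovers. The reverse denoising pass needs a pretrained diffusion model and is omitted. For unguided regeneration, the denoiser sees only $x_t$, so it cannot restore payload information that the noising step has already destroyed.

```python
import numpy as np

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for the linear DDPM noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def embed(host, bits, patterns, strength):
    """Hypothetical spread-spectrum embedder: add +/-strength * pattern_i per bit."""
    signs = 2.0 * bits - 1.0  # bit 0 -> -1, bit 1 -> +1
    return host + strength * np.tensordot(signs, patterns, axes=1)

def decode(image, patterns):
    """Hypothetical correlation decoder: bit_i = [<image - mean, pattern_i> > 0]."""
    centered = image - image.mean()
    scores = np.tensordot(patterns, centered, axes=([1, 2], [0, 1]))
    return (scores > 0).astype(int)

def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar) * x_0, (1 - alpha_bar) * I)."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise

def bit_accuracy_after_noising(t, trials=20, k=32, size=64, strength=0.05, seed=0):
    """Fraction of payload bits recovered from x_t, averaged over random hosts and keys."""
    rng = np.random.default_rng(seed)
    alpha_bars = linear_alpha_bars()
    correct = 0
    for _ in range(trials):
        host = rng.uniform(-1.0, 1.0, size=(size, size))  # stand-in image in [-1, 1]
        patterns = rng.choice([-1.0, 1.0], size=(k, size, size))
        bits = rng.integers(0, 2, size=k)
        x_t = forward_diffuse(embed(host, bits, patterns, strength), alpha_bars[t], rng)
        correct += int(np.sum(decode(x_t, patterns) == bits))
    return correct / (trials * k)

for t in (0, 300, 500, 800, 999):
    print(f"t={t:4d}  bit accuracy={bit_accuracy_after_noising(t):.3f}")
```

In this toy setting, one would expect bit accuracy near 1 for small $t$ and a drop toward 0.5 (chance) as $t$ approaches the end of the schedule. Real learned watermarks and real denoisers behave differently in detail, so the numbers illustrate the trend only.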

Abstract

Robust invisible watermarking aims to embed hidden information into images so that the watermark survives various image manipulations. However, the rise of powerful diffusion-based image generation and editing techniques poses a new threat to these watermarking schemes. In this paper, we present a theoretical analysis and an attack method showing that diffusion models can effectively break robust image watermarks that were designed to resist conventional perturbations. We show that a diffusion-driven "image regeneration" process can erase embedded watermarks while preserving perceptual image content. We further introduce a novel guided diffusion attack that explicitly targets the watermark signal during generation, significantly degrading watermark detectability. Theoretically, we prove that as an image undergoes sufficient diffusion-based transformation, the mutual information between the resulting image and the embedded watermark payload vanishes, so decoding fails. Experimentally, we evaluate our approach on multiple state-of-the-art watermarking schemes (including the deep-learning-based methods StegaStamp, TrustMark, and VINE) and demonstrate near-zero watermark recovery rates after the attack, while the regenerated images maintain high visual fidelity. Our findings highlight a fundamental vulnerability of current robust watermarking techniques to generative-model-based attacks, underscoring the need for new watermarking strategies in the era of generative AI.
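The claim that mutual information vanishes can be made concrete on a simplified channel. This channel is an assumption for illustration and is not necessarily the setting of the paper's Theorem 1. Each payload bit is sent as $\pm b$ along its own orthonormal pattern, and the attack adds Gaussian noise of standard deviation $\sigma$. After projection, each bit therefore sees a binary-input Gaussian channel with SNR $b^2/\sigma^2$, and its mutual information is $I = 1 - \mathbb{E}[\log_2(1 + e^{-2XY/\sigma^2})]$. With $k$ independent bits on orthogonal patterns, the total is $k \cdot I$. The function names below are hypothetical, and the estimate is Monte Carlo rather than exact.

```python
import numpy as np

def per_bit_mutual_information(snr, samples=400_000, seed=0):
    """Monte Carlo estimate of I(X; Y) in bits, X uniform on {-b, +b}, Y = X + sigma * Z.

    snr = b**2 / sigma**2. Uses I = 1 - E[log2(1 + exp(-2 X Y / sigma**2))]; by
    symmetry we condition on X = +b, so 2XY/sigma**2 = 2*snr + 2*sqrt(snr)*Z.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(samples)
    llr = 2.0 * snr + 2.0 * np.sqrt(snr) * z  # posterior log-likelihood ratio
    # np.logaddexp(0, -llr) = ln(1 + exp(-llr)), computed without overflow
    return 1.0 - np.mean(np.logaddexp(0.0, -llr)) / np.log(2.0)

def payload_mutual_information(strength, sigma, k):
    """Total I(M; Y) for k independent bits on orthonormal patterns: k * per-bit MI."""
    return k * per_bit_mutual_information((strength / sigma) ** 2)

for sigma in (0.25, 1.0, 4.0, 16.0, 64.0):
    mi = payload_mutual_information(1.0, sigma, 32)
    print(f"sigma/strength={sigma:5.2f}  I(M;Y) ~ {mi:.4f} bits")
```

As $\sigma$ grows relative to the embedding strength, the estimate falls from nearly $k$ bits toward 0. This is the qualitative behavior the theorem describes.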

Paper Structure

This paper contains 23 sections, 2 theorems, 4 equations, 1 table.

Key Result

Proposition 1

For a watermark encoder that embeds $k$ bits via independent spread-spectrum patterns of strength $\beta$, the probability of decoding the entire message correctly after Gaussian noise of variance $\sigma^2$ is at most $\Phi(\beta/\sigma)^k$, where $\Phi$ is the standard normal CDF. In particular, as $\sigma/\beta$ grows, this probability decays toward $2^{-k}$, the success rate of random guessing.
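The bound can be checked numerically under an idealized model, which is our assumption rather than the paper's proof. After projecting onto bit $i$'s orthonormal pattern, the decoder observes $\beta s_i + \sigma z_i$ with $s_i \in \{-1, +1\}$ and $z_i \sim \mathcal{N}(0, 1)$, and it decodes each bit by its sign. For this sign decoder the bound is attained, so the simulated full-message success rate should match $\Phi(\beta/\sigma)^k$ up to Monte Carlo error. As $\sigma/\beta \to \infty$, $\Phi(\beta/\sigma) \to 1/2$ and the bound tends to $2^{-k}$. The helper names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def proposition_bound(beta, sigma, k):
    """Phi(beta / sigma) ** k: the bound on decoding all k bits correctly."""
    return normal_cdf(beta / sigma) ** k

def simulated_success(beta, sigma, k, trials=200_000, seed=0):
    """Fraction of trials in which a per-bit sign decoder recovers all k bits.

    Assumed model: the projected statistic for bit i is beta * s_i + sigma * z_i.
    """
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(trials, k))
    observed = beta * signs + sigma * rng.standard_normal((trials, k))
    return float(np.mean(np.all(np.sign(observed) == signs, axis=1)))

k = 8
for sigma in (0.25, 0.5, 1.0, 4.0, 100.0):
    print(f"sigma/beta={sigma:6.2f}  bound={proposition_bound(1.0, sigma, k):.4f}  "
          f"simulated={simulated_success(1.0, sigma, k):.4f}  chance={2.0 ** -k:.4f}")
```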

Theorems & Definitions (4)

  • Proposition 1
  • Proof
  • Theorem 1
  • Proof (sketch)