Diffusion-Based Image Editing: An Unforeseen Adversary to Robust Invisible Watermarks

Wenkai Fu, Finn Carter, Yue Wang, Emily Davis, Bo Zhang

TL;DR

This work reveals a fundamental vulnerability of robust invisible watermarks to diffusion-based image editing, showing that content-preserving regeneration can erase embedded signals with negligible perceptual change. It provides a theoretical analysis, proving that under ideal diffusion regeneration the mutual information between the watermark and the regenerated image vanishes, making decoding effectively random. It introduces unguided and decoder-guided diffusion attacks, and demonstrates through extensive experiments across multiple watermarking schemes that diffusion edits can reduce decoding accuracy to near zero while preserving image quality. The findings urge the development of generative-model-resilient watermarking and highlight broader implications for copyright protection and AI-assisted content manipulation. The work also discusses potential defenses, including semantic- or multi-scale watermarking and watermark-aware diffusion techniques, to sustain provenance in an era of powerful generative AI.

Abstract

Robust invisible watermarking aims to embed hidden messages into images such that they survive various manipulations while remaining imperceptible. However, powerful diffusion-based image generation and editing models now enable realistic content-preserving transformations that can inadvertently remove or distort embedded watermarks. In this paper, we present a theoretical and empirical analysis demonstrating that diffusion-based image editing can effectively break state-of-the-art robust watermarks designed to withstand conventional distortions. We analyze how the iterative noising and denoising process of diffusion models degrades embedded watermark signals, and provide formal proofs that under certain conditions a diffusion model's regenerated image retains virtually no detectable watermark information. Building on this insight, we propose a diffusion-driven attack that uses generative image regeneration to erase watermarks from a given image. Furthermore, we introduce an enhanced guided diffusion attack that explicitly targets the watermark during generation by integrating the watermark decoder into the sampling loop. We evaluate our approaches on multiple recent deep learning watermarking schemes (e.g., StegaStamp, TrustMark, and VINE) and demonstrate that diffusion-based editing can reduce watermark decoding accuracy to near-zero levels while preserving high visual fidelity of the images. Our findings reveal a fundamental vulnerability in current robust watermarking techniques against generative model-based edits, underscoring the need for new watermarking strategies in the era of generative AI.
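
The noise-then-denoise mechanism described in the abstract can be illustrated with a small toy sketch. This is not the paper's attack or any of the evaluated schemes: it assumes a simple additive spread-spectrum watermark in place of StegaStamp, TrustMark, or VINE, and it replaces the learned diffusion denoiser with a projection onto a low-dimensional "content" subspace. The names `embed`, `decode`, `regenerate`, and `run_demo`, along with all parameter values, are hypothetical choices made for illustration only.

```python
import numpy as np


def embed(cover, bits, patterns, strength):
    """Additive spread-spectrum embedding: bit i sets the sign of pattern i."""
    signs = 2.0 * bits - 1.0                      # map {0, 1} -> {-1, +1}
    return cover + strength * (signs @ patterns)


def decode(signal, patterns):
    """Blind correlation decoder: bit i is the sign of <pattern i, signal>."""
    return (patterns @ signal > 0).astype(int)


def regenerate(x0, basis, alpha_bar, rng):
    """One noise-then-denoise round trip with a toy subspace denoiser."""
    eps = rng.standard_normal(x0.shape)
    # Forward noising step of a diffusion process at cumulative level alpha_bar.
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    # ASSUMPTION: stand-in for a learned denoiser. It keeps only the component
    # that the content prior (the subspace spanned by `basis`) can explain.
    return basis @ (basis.T @ x_t) / np.sqrt(alpha_bar)


def run_demo(seed=0, n=16384, k=16, n_bits=32, strength=0.1, alpha_bar=0.5):
    rng = np.random.default_rng(seed)
    # Orthonormal basis of a k-dimensional content subspace in R^n.
    basis, _ = np.linalg.qr(rng.standard_normal((n, k)))
    cover = basis @ (100.0 * rng.standard_normal(k))   # toy "image"
    patterns = rng.standard_normal((n_bits, n))         # one carrier per bit
    bits = rng.integers(0, 2, size=n_bits)

    marked = embed(cover, bits, patterns, strength)
    attacked = regenerate(marked, basis, alpha_bar, rng)

    return {
        "acc_before": float(np.mean(decode(marked, patterns) == bits)),
        "acc_after": float(np.mean(decode(attacked, patterns) == bits)),
        # Distance of the regenerated signal from the unwatermarked content.
        "content_rel_err": float(
            np.linalg.norm(attacked - cover) / np.linalg.norm(cover)
        ),
    }


if __name__ == "__main__":
    print(run_demo())
```

Under these assumptions, the watermark carriers lie almost entirely outside the content subspace, so the projection discards nearly all of their energy while the content is kept. One would therefore expect bit accuracy to be high before the round trip and close to chance level afterwards, with a small content error. A real diffusion model's learned prior is far richer than a linear subspace, so the sketch conveys only the intuition, not the paper's formal result.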

Paper Structure

This paper contains 27 sections, 1 equation, 2 tables, 2 algorithms.