On the Information-Theoretic Fragility of Robust Watermarking under Diffusion Editing
Yunyi Ni, Ziyu Yang, Ze Niu, Emily Davis, Finn Carter
TL;DR
This paper shows that diffusion-based image editing poses a fundamental information-theoretic threat to robust invisible watermarking. By modeling forward diffusion and introducing a guided diffusion attack that leverages the watermark decoder’s feedback, the authors prove that the mutual information between the embedded payload and the watermarked image vanishes as the diffusion process progresses ($I(M;I')\to 0$), and demonstrate near-zero watermark recovery in experiments on StegaStamp, TrustMark, and VINE. The work compares unguided regeneration with a strong, decoder-guided attack, revealing substantial degradation of watermark integrity while maintaining perceptual image quality. The study provides theoretical and empirical insights into why diffusion models erode watermark signals and offers design guidelines for more resilient watermarking in the generative-AI era, alongside ethical considerations and future research directions.
Abstract
Robust invisible watermarking embeds hidden information in images such that the watermark can survive various manipulations. However, the emergence of powerful diffusion-based image generation and editing techniques poses a new threat to these watermarking schemes. In this paper, we investigate the intersection of diffusion-based image editing and robust image watermarking. We analyze how diffusion-driven image edits can significantly degrade or even fully remove embedded watermarks from state-of-the-art robust watermarking systems. Both theoretical formulations and empirical experiments are provided. We prove that as an image undergoes iterative diffusion transformations, the mutual information between the watermarked image and the embedded payload approaches zero, causing watermark decoding to fail. We further propose a guided diffusion attack algorithm that explicitly targets and erases watermark signals during generation. We evaluate our approach on recent deep learning-based watermarking schemes and demonstrate near-zero watermark recovery rates after attack, while maintaining high visual fidelity of the regenerated images. Finally, we discuss ethical implications of such watermark removal capabilities and provide design guidelines for future watermarking strategies to be more resilient in the era of generative AI.
