Hidden in the Noise: Two-Stage Robust Watermarking for Images

Kasra Arabi, Benjamin Feuer, R. Teal Witter, Chinmay Hegde, Niv Cohen

TL;DR

This work tackles watermark robustness for AI-generated images by removing the distribution distortion that watermark signals usually introduce: it treats the diffusion model's initial noise as a distortion-free watermark and makes detection practical with WIND, a two-stage detection framework that embeds Fourier-based group identifiers to narrow the search. WIND combines a large, cryptographically secured set of initial noises with efficient group-aware detection, achieving state-of-the-art resilience to forgery, removal, and regeneration attacks while preserving image quality. It also extends watermarking to non-synthetic images via inpainting-based methods, maintaining robustness against regeneration and enabling verification across diverse content sources. The approach strengthens accountability and IP protection for diffusion-model content, offering scalable, secure, and broadly applicable watermarking for modern image synthesis systems.

Abstract

As the quality of image generators continues to improve, deepfakes have become a topic of considerable societal debate. Image watermarking allows responsible model owners to detect and label their AI-generated content, which can mitigate the harm. Yet current state-of-the-art methods in image watermarking remain vulnerable to forgery and removal attacks. This vulnerability occurs in part because watermarks distort the distribution of generated images, unintentionally revealing information about the watermarking techniques. In this work, we first demonstrate a distortion-free watermarking method for images, based on a diffusion model's initial noise. However, detecting the watermark requires comparing the initial noise reconstructed for an image to all previously used initial noises. To reduce this search cost, we propose a two-stage watermarking framework for efficient detection. During generation, we augment the initial noise with generated Fourier patterns to embed information about the group of initial noises we used. For detection, we (i) retrieve the relevant group of noises, and (ii) search within the given group for an initial noise that might match our image. This watermarking approach achieves state-of-the-art robustness to forgery and removal against a large battery of attacks.
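
The two-stage search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the group signal here is a plain additive random vector rather than a Fourier pattern, noise reconstruction is simulated by adding Gaussian perturbation instead of inverting a diffusion model, and every name (`make_bank`, `embed`, `detect`, `strength`, `threshold`) is hypothetical. With $N$ noises split evenly into $M$ groups, stage one compares against $M$ patterns and stage two against about $N/M$ noises, rather than all $N$.

```python
import numpy as np

def make_bank(n_noises, n_groups, dim, seed=0):
    """Toy bank: N initial noises, M group patterns, round-robin group assignment."""
    rng = np.random.default_rng(seed)
    noises = rng.standard_normal((n_noises, dim))
    patterns = rng.standard_normal((n_groups, dim))  # stand-in for Fourier patterns
    groups = np.arange(n_noises) % n_groups
    return noises, patterns, groups

def embed(noises, patterns, groups, i, strength=0.5):
    """Add the group pattern of noise i to that noise (assumed additive embedding)."""
    return noises[i] + strength * patterns[groups[i]]

def cosine(a, B):
    """Cosine similarity between vector a and every row of matrix B."""
    return (B @ a) / (np.linalg.norm(B, axis=1) * np.linalg.norm(a) + 1e-12)

def detect(z_rec, noises, patterns, groups, threshold=0.3):
    """Return (noise index, group index) for a match, or None if nothing matches."""
    # Stage 1: pick the group whose pattern is closest (M comparisons).
    g = int(np.argmax(cosine(z_rec, patterns)))
    # Stage 2: search only inside that group (about N/M comparisons).
    members = np.flatnonzero(groups == g)
    sims = cosine(z_rec, noises[members])
    best = int(np.argmax(sims))
    if sims[best] < threshold:
        return None  # no noise in the group is similar enough
    return int(members[best]), g
```

For example, with `make_bank(1000, 20, 4096)` the detector scans 20 patterns and then 50 noises, instead of 1000 noises. The threshold of 0.3 is an arbitrary choice for this toy setting, not a value from the paper.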

Paper Structure

This paper contains 41 sections, 2 theorems, 16 figures, 15 tables, 2 algorithms.

Key Result

Theorem 4.1

[Cryptographic Security] Let $\mathrm{hash}: \{0,1\}^* \rightarrow \{0,1\}^\ell$ be an unbroken cryptographic hash function used in our watermarking algorithm, with inputs $i^* \in [N]$ and a secret salt $s$. Assume $s$ is sufficiently long and randomly generated. Then, even if an adversary obtains: the group nu

Figures (16)

  • Figure 1: Related watermarking methods. Tree-Ring embeds an identifiable pattern into the initial noise \cite{wen2023tree}. Gaussian Shading uses a user-specific key to seed the initial noise \cite{yang2024gaussian}. Our method, WIND, samples a random key from $N$ options to seed the initial noise. To speed up detection, the key's group is then embedded into the initial noise.
  • Figure 2: Illustration of the WIND Method for Robust Image Watermarking. The method uses $N$ possible initial noises partitioned into $M$ groups. Generation: Using a secret salt and an index $i^*$, we securely and reproducibly generate initial noise $\mathbf{z}_{i^*}$. We then embed the group index $g^*$ of that noise using a Fourier pattern, which makes retrieval easier. Finally, we run diffusion with the embedded latent noise to produce a watermarked image. Detection: We reconstruct the initial noise $\tilde{\mathbf{z}}$. Next, we search over the possible group indices $g$ for the Fourier pattern closest to the one embedded in $\tilde{\mathbf{z}}$, which gives the group $\tilde{g}$. We then search the initial noises in group $\tilde{g}$ to find the match.
  • Figure 3: Distribution of cosine similarity between the initial noise and: (i) a noise reconstructed from a watermarked image generated with the same noise (reconstructed noise); (ii) a noise reconstructed from a forged image that uses a public model to imitate our watermarked image (reconstruction attack, described in \ref{sec:distortion-free}); (iii) random noise. These results rely on approximate DDIM inversion without the ground-truth prompt.
  • Figure 4: Detection accuracy for forgery and removal attacks using \cite{yang2024steganalysisdigitalwatermarkingdefense}. A value of $0\%$ represents a watermark failure (the attacker successfully removed the watermark or forged it onto another image), while $100\%$ indicates a perfect defense (no watermark removal or forgery occurred).
  • Figure 5: Qualitative results of watermarked images generated using WIND, Tree-Ring, and RingID. See \ref{app:additional_results} for quantitative results. See \ref{appendix:generated_images} for additional qualitative results.
  • ...and 11 more figures
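
The generation step in Figure 2 (deriving the initial noise reproducibly from a secret salt and an index) and the cosine-similarity comparison in Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: SHA-256, the 8-byte big-endian index encoding, the `(4, 64, 64)` latent shape, and the function names `initial_noise` and `cosine_similarity` are all hypothetical choices made for this example.

```python
import hashlib
import numpy as np

def initial_noise(index: int, salt: bytes, shape=(4, 64, 64)) -> np.ndarray:
    """Derive a reproducible Gaussian noise tensor from (salt, index).

    Assumption: SHA-256 stands in for the hash function, and its digest
    is used as the seed of a NumPy random generator.
    """
    digest = hashlib.sha256(salt + index.to_bytes(8, "big")).digest()
    seed = int.from_bytes(digest, "big")   # 256-bit non-negative integer seed
    rng = np.random.default_rng(seed)      # same (salt, index) gives the same stream
    return rng.standard_normal(shape)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two tensors, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Calling `initial_noise(7, salt)` twice with the same salt returns identical tensors, so the owner can regenerate any candidate noise at detection time without storing it. Noises derived from different indices are independent Gaussian samples, so their cosine similarity concentrates near zero in high dimensions, which is the behaviour the random-noise case in Figure 3 refers to.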

Theorems & Definitions (3)

  • Theorem 4.1
  • Theorem E.1
  • proof: Proof of \ref{thm:crypto_main}