The Stable Signature: Rooting Watermarks in Latent Diffusion Models

Pierre Fernandez, Guillaume Couairon, Hervé Jégou, Matthijs Douze, Teddy Furon

TL;DR

This work addresses the challenge of responsibly deploying image-generation systems by introducing Stable Signature, an active watermarking approach embedded during generation in Latent Diffusion Models. By fine-tuning only the latent decoder and using a pre-trained watermark extractor, all generated images carry an invisible, retrievable binary signature, enabling both detection of AI-generated content and identification of its source. The authors formulate a rigorous statistical framework for detection and identification, train an extractor with whitening to ensure i.i.d.-like bit behavior, and demonstrate robust performance across a range of tasks and transformations with minimal impact on perceptual quality. They also analyze adversarial and collusive attacks, showing that while some attacks can degrade watermark integrity, the embedded watermark remains broadly resilient under practical threat models. Overall, Stable Signature offers a scalable, secure mechanism for tracing and policing generated content, with reproducibility and environmental considerations discussed.

Abstract

Generative image modeling enables a wide range of applications but raises ethical concerns about responsible deployment. This paper introduces an active strategy combining image watermarking and Latent Diffusion Models. The goal is for all generated images to conceal an invisible watermark allowing for future detection and/or identification. The method quickly fine-tunes the latent decoder of the image generator, conditioned on a binary signature. A pre-trained watermark extractor recovers the hidden signature from any generated image and a statistical test then determines whether it comes from the generative model. We evaluate the invisibility and robustness of the watermarks on a variety of generation tasks, showing that Stable Signature works even after the images are modified. For instance, it detects the origin of an image generated from a text prompt, then cropped to keep $10\%$ of the content, with over $90\%$ accuracy at a false positive rate below $10^{-6}$.
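The statistical test mentioned in the abstract can be illustrated with a minimal sketch. It assumes that, for an image without the watermark, each extracted bit matches the signature independently with probability 1/2 (the i.i.d.-like behavior described in the TL;DR), so the number of matching bits follows a binomial distribution. The signature length `k = 48`, the function names, and the thresholding rule are illustrative assumptions, not details taken from this page.

```python
from math import comb


def false_positive_rate(k: int, tau: int) -> float:
    """P(M > tau) for M ~ Binomial(k, 1/2).

    Assumption: M is the number of extracted bits that match the signature
    on an image that does NOT carry the watermark.
    """
    return sum(comb(k, i) for i in range(tau + 1, k + 1)) / 2 ** k


def smallest_threshold(k: int, target_fpr: float) -> int:
    """Smallest tau such that flagging when M > tau keeps the FPR <= target."""
    for tau in range(k + 1):
        if false_positive_rate(k, tau) <= target_fpr:
            return tau
    raise ValueError("no threshold satisfies the target FPR")


def detect(extracted: list[int], signature: list[int], tau: int) -> bool:
    """Flag the image if more than tau extracted bits match the signature."""
    matches = sum(a == b for a, b in zip(extracted, signature))
    return matches > tau


if __name__ == "__main__":
    k = 48  # hypothetical signature length, chosen for illustration
    tau = smallest_threshold(k, 1e-6)
    print(f"tau = {tau}, FPR = {false_positive_rate(k, tau):.3e}")
```

Under these assumptions, lowering the target false positive rate raises the number of bits that must match, which is why robustness to edits such as cropping matters: each flipped bit eats into the margin above the threshold.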

Paper Structure

This paper contains 48 sections, 5 equations, 17 figures, and 6 tables.

Figures (17)

  • Figure 1: Overview. The latent decoder can be fine-tuned to preemptively embed a signature into all generated images.
  • Figure 2: Steps of the method. (a) We pre-train a watermark encoder $\mathcal{W}_E$ and extractor $\mathcal{W}$, to extract binary messages. (b) We fine-tune the decoder $\mathcal{D}$ of the LDM's auto-encoder with a fixed signature $m$ such that all the generated images (c) lead to $m$ through $\mathcal{W}$.
  • Figure 3: Detection results. TPR/FPR curve of the detection under different transformations. Forensics$^\dagger$ indicates passive detection (without watermark) (Corvi et al., 2022).
  • Figure 4: Identification results. Proportion of well-identified users. Detection with FPR=$10^{-6}$ is run beforehand, and we consider it an error if the image is not flagged.
  • Figure 5: Transformations evaluated in Secs. \ref{sec:application} and \ref{sec:experiments}. 'Combined' is made of crop $50\%$, brightness adjustment $1.5$ and JPEG $80$ compression.
  • ...and 12 more figures