A Watermark-Conditioned Diffusion Model for IP Protection

Rui Min, Sen Li, Hongyang Chen, Minhao Cheng

TL;DR

WaDiff is a watermark-conditioned diffusion model that embeds user-specific fingerprints directly into the generation process, enabling both detection of AI-generated content and identification of the generating user in a black-box API setting. The framework relies on two losses, message retrieval and consistency, and on selective end-to-end fine-tuning of a small head to maintain image quality while injecting watermark information. Experiments show strong detection and tracing performance across large user pools, high SSIM, minimal FID impact, and robustness to common augmentations and adaptive attacks. The approach offers scalable, stealthy IP protection for diffusion-based content and supports forensic attribution and governance in real-world deployments.

Abstract

The ethical need to protect AI-generated content has been a significant concern in recent years. While existing watermarking strategies have demonstrated success in detecting synthetic content (detection), there has been limited exploration of identifying the users responsible for generating these outputs from a single model (owner identification). In this paper, we address both practical scenarios and propose a unified watermarking framework for content copyright protection in the context of diffusion models. Specifically, we consider two parties: the model provider, who grants public access to a diffusion model via an API, and the users, who can only query the model API and generate images in a black-box manner. Our task is to embed hidden information into the generated content to facilitate subsequent detection and owner identification. To tackle this challenge, we propose a Watermark-conditioned Diffusion model called WaDiff, which treats the watermark as a conditional input and incorporates fingerprinting into the generation process. All outputs generated by WaDiff carry user-specific information, which can be recovered by an image extractor and used for forensic identification. Extensive experiments are conducted on two popular diffusion models, and we demonstrate that our method is effective and robust in both the detection and owner identification tasks. Meanwhile, our watermarking framework has a negligible impact on the original generation and is more stealthy and efficient than existing watermarking strategies.
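
As a rough illustration of how a recovered fingerprint could serve both tasks, the sketch below compares an extracted bit string against a table of registered user watermarks. The matching rule (highest bit agreement, with a cutoff for detection), the 48-bit length, the 0.8 threshold, and all function names are assumptions made for illustration; none of them are stated in the abstract.

```python
import random


def bit_accuracy(a, b):
    """Fraction of positions at which two equal-length bit lists agree."""
    if len(a) != len(b):
        raise ValueError("bit strings must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def trace_owner(extracted, user_watermarks, detect_threshold=0.8):
    """Return (is_watermarked, best_user, best_score).

    extracted: bits recovered from an image by some extractor (hypothetical).
    user_watermarks: dict mapping user id -> assigned bit list.
    detect_threshold: assumed cutoff; below it the image is treated as unmarked.
    """
    best_user, best_score = None, -1.0
    for user, bits in user_watermarks.items():
        score = bit_accuracy(extracted, bits)
        if score > best_score:
            best_user, best_score = user, score
    is_watermarked = best_score >= detect_threshold
    return is_watermarked, (best_user if is_watermarked else None), best_score


# Register 1000 hypothetical users with random 48-bit watermarks.
rng = random.Random(0)
n_bits = 48
users = {f"user_{i}": [rng.randint(0, 1) for _ in range(n_bits)]
         for i in range(1000)}

# Simulate an imperfect extraction: user_42's watermark with 3 bits flipped.
noisy = list(users["user_42"])
for idx in (1, 17, 30):
    noisy[idx] ^= 1

print(trace_owner(noisy, users))
```

In this toy setup the image is flagged as watermarked because its best match exceeds the cutoff, and the best-matching entry names the owner. A real deployment would need to choose the cutoff from a target false-positive rate over the whole user pool.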

Paper Structure

This paper contains 40 sections, 9 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Illustration of the proposed WaDiff. Each user accesses the diffusion model through the public API and is assigned a unique watermark. Generation is conditioned on that watermark, so each user's outputs carry a user-specific fingerprint that is later used to identify the owner of a generated image.
  • Figure 2: Watermarked examples of our method and $\text{Tree-Ring}_{Rings}$ sampled from Stable Diffusion. Our method keeps images substantially more consistent across different watermarks.
  • Figure 3: Overview of our proposed watermarking framework. The top two figures illustrate our fine-tuning process. For $t>\tau$, we solely focus on preserving image consistency and incorporate a null watermark. For $t\leq \tau$, we integrate the normal watermark and introduce an additional message retrieval loss to embed watermarks. The inference stage is depicted below, where we inject the null watermark when $t>\tau$ and transition it to the payload watermark when $t\leq\tau$.
  • Figure 4: Tracing accuracy of two diffusion models with different numbers of DDIM sampling steps. T-$m$ denotes tracing among $m$ users.
  • Figure 5: Tracing accuracy, SSIM, and FID difference for different fine-tuning sections. Both the SSIM and FID difference values are rescaled to $[0, 1]$ by min-max normalization.
  • ...and 7 more figures
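
The inference schedule described in the Figure 3 caption (null watermark for $t>\tau$, payload watermark for $t\leq\tau$) can be sketched as a small conditioning switch inside a generic reverse-diffusion loop. This is a minimal sketch, not the authors' implementation: `denoise_step` is a hypothetical stand-in for one conditioned sampler update, and the all-zero tuple used for the null watermark is a placeholder encoding chosen only for this example.

```python
def watermark_for_step(t, tau, payload, null_watermark):
    """Null watermark for t > tau, payload watermark for t <= tau."""
    return null_watermark if t > tau else payload


def sample(denoise_step, x_T, timesteps, tau, payload, null_watermark):
    """Run a reverse-diffusion loop, switching the conditioning at tau.

    denoise_step(x, t, w) is a hypothetical callable performing one
    sampler update conditioned on watermark w.
    timesteps must be ordered from noisiest to cleanest.
    """
    x = x_T
    for t in timesteps:
        w = watermark_for_step(t, tau, payload, null_watermark)
        x = denoise_step(x, t, w)
    return x


# Toy "denoiser" that only records which watermark each step received.
schedule = []


def record_step(x, t, w):
    schedule.append((t, w))
    return x


payload = (1, 0, 1, 1)
null_watermark = (0, 0, 0, 0)  # placeholder encoding of "no watermark" (assumed)

sample(record_step, x_T=0.0, timesteps=range(9, -1, -1), tau=3,
       payload=payload, null_watermark=null_watermark)
print(schedule)
```

With ten steps and `tau=3`, the six noisiest steps receive the null watermark and the last four receive the payload, mirroring the switch the caption describes.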