Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models

Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka

TL;DR

This paper proposes a framework that embeds visible watermarks into adversarial examples to deter diffusion models (DMs) from imitating copyrighted content. A conditional GAN generator produces perturbations conditioned on a watermark and is optimized with three losses (adversarial, GAN, and weighted perturbation), so that the perturbations stay perceptually small while still attacking DMs. After a short training run on a few samples (5-10 images in 2-3 minutes), the generator produces protected images quickly (about 0.2 s per image). The method shows robustness and transferability across text-guided image-to-image generation, textual inversion, and several DM variants, outperforming prior approaches that produce chaotic textures or rely on model re-training. The work offers a practical, scalable approach to copyright protection in DM-based content creation, with broad applicability to DreamBooth, LoRA, and Custom Diffusion, while maintaining image quality.

Abstract

Diffusion Models (DMs) have shown remarkable capabilities in various image-generation tasks. However, there are growing concerns that DMs could be used to imitate unauthorized creations and thus raise copyright issues. To address this issue, we propose a novel framework that embeds personal watermarks in the generation of adversarial examples. Such examples force DMs to generate images with visible watermarks and prevent DMs from imitating unauthorized images. We construct a generator based on conditional generative adversarial networks and design three losses (adversarial loss, GAN loss, and perturbation loss) to generate adversarial examples whose perturbations are subtle yet effectively attack DMs, preventing copyright violations. With our method, training a generator for a personal watermark requires only 5-10 samples and 2-3 minutes; once trained, the generator produces adversarial examples carrying that watermark very quickly (0.2 s per image). We conduct extensive experiments in various conditional image-generation scenarios. Whereas existing methods cause DMs to generate images with chaotic textures, our method adds visible watermarks to the generated images, a more direct way to indicate copyright violations. We also observe that our adversarial examples transfer well to unknown generative models. Therefore, this work provides a simple yet powerful way to protect copyright from DM-based imitation.
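
The abstract names the three losses but not their exact form. As a minimal sketch, the snippet below shows one plausible way to combine them into a single generator objective, $\mathcal{L} = \mathcal{L}_{adv} + \alpha\,\mathcal{L}_{GAN} + \beta\,\mathcal{L}_{pert}$. Everything here is an assumption, not the authors' implementation: the mean-squared-error form of the adversarial term, the non-saturating GAN term, the AdvGAN-style hinge on the perturbation norm, and the hypothetical weights `alpha`, `beta`, and `bound`. It works on flattened lists of floats so that it runs with the standard library alone; a real implementation would compute these terms on tensors, with gradients flowing through the DM.

```python
import math
from typing import Sequence


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a flattened perturbation."""
    return math.sqrt(sum(x * x for x in v))


def perturbation_loss(delta: Sequence[float], bound: float) -> float:
    """Hinge penalty (assumed AdvGAN-style form): zero while ||delta||_2 <= bound,
    growing linearly once the perturbation norm exceeds the bound."""
    return max(0.0, l2_norm(delta) - bound)


def adversarial_loss(dm_repr: Sequence[float], watermark_repr: Sequence[float]) -> float:
    """Mean squared error pulling the DM's representation of x' toward the watermark m.
    Which representation is compared (e.g. an encoder latent) is an assumption."""
    if len(dm_repr) != len(watermark_repr):
        raise ValueError("representations must have the same length")
    return sum((a - b) ** 2 for a, b in zip(dm_repr, watermark_repr)) / len(dm_repr)


def gan_loss(d_score: float, eps: float = 1e-12) -> float:
    """Non-saturating generator loss -log D(x'), where d_score is the discriminator's
    probability that x' is a real (unperturbed) image; eps avoids log(0)."""
    return -math.log(max(d_score, eps))


def total_generator_loss(
    delta: Sequence[float],
    dm_repr: Sequence[float],
    watermark_repr: Sequence[float],
    d_score: float,
    alpha: float = 1.0,  # hypothetical weight on the GAN term
    beta: float = 1.0,   # hypothetical weight on the perturbation term
    bound: float = 0.5,  # hypothetical perturbation-norm budget
) -> float:
    """Weighted sum L = L_adv + alpha * L_GAN + beta * L_pert."""
    return (
        adversarial_loss(dm_repr, watermark_repr)
        + alpha * gan_loss(d_score)
        + beta * perturbation_loss(delta, bound)
    )
```

For example, with `delta = [3.0, 4.0]` (L2 norm 5.0) and `bound = 4.0`, the hinge term is 1.0; it drops to zero once the norm is within the bound, so the generator is penalized only for perturbations that grow too large rather than pushed toward zero perturbation.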

Paper Structure

This paper contains 31 sections, 6 equations, 13 figures, and 7 tables.

Figures (13)

  • Figure 1: Copyright issues of DMs and adversarial example-based methods for copyright protection. Without protection, DMs can easily imitate original images under different image-generation scenarios. A previous method, AdvDM (Liang et al., 2023), generates adversarial examples by optimizing against DMs to prevent them from extracting features of the original images, which results in chaotic generated images. Our method goes a step further by building a generator that embeds personal watermarks into the adversarial images it produces. Such examples force DMs to generate images with visible watermarks, making copyright violations traceable. Our method is fast and simple, yet powerful at protecting copyright against DMs.
  • Figure 2: Architecture overview. $G$ generates a perturbation for the input image $x$, conditioned on the watermark $m$. $D$ and $G$ are trained with $\mathcal{L}_{GAN}$, which pushes the adversarial image $x^\prime$ to stay close to $x$. $\mathcal{L}_{adv}$ forces the image generated by the DM $\theta$ to display the visible watermark $m$, and $\mathcal{L}_{pert}$ further bounds the magnitude of the perturbation.
  • Figure 3: Comparison of different methods under text-guided image-to-image generation. The source is the input image to the LDM, and the generated image is the LDM's output (strength 0.3). The watermarks for our method are based on the artist names (from left to right: REMBRANDT, AUGUSTE_REN).
  • Figure 4: Examples of textual inversion on ImageNet. The watermarks used in the examples are IMAGENET_CAT and IMAGENET_DOG.
  • Figure 5: Examples of images generated from the original image, the attack image, and the attack image after a defense is applied.
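
Figure 2 describes the data flow only at a high level. The following is a minimal sketch, under stated assumptions, of how a generator output could become the adversarial image $x^\prime$: a raw perturbation (standing in for the output of $G$ for $x$ conditioned on $m$) is clamped to a hypothetical $L_\infty$ budget `eps`, added to $x$, and the result is clipped to the valid pixel range [0, 1]. The hard clamp and the names `make_adversarial` and `eps` are illustrative assumptions; according to the caption, the method bounds the perturbation magnitude through $\mathcal{L}_{pert}$.

```python
from typing import List, Sequence


def clip(value: float, lo: float, hi: float) -> float:
    """Clamp a scalar into the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def make_adversarial(x: Sequence[float], delta: Sequence[float], eps: float) -> List[float]:
    """Return x' = clip(x + clip(delta, -eps, eps), 0, 1), element by element.

    x     : flattened image with pixel values in [0, 1]
    delta : raw perturbation, a stand-in for the generator's output for x and m
    eps   : hypothetical L-infinity budget on each perturbation entry
    """
    if len(x) != len(delta):
        raise ValueError("image and perturbation must have the same length")
    return [clip(xi + clip(di, -eps, eps), 0.0, 1.0) for xi, di in zip(x, delta)]
```

For instance, `make_adversarial([0.0, 0.5, 0.9375], [0.25, -0.0625, 0.5], 0.125)` clamps each perturbation entry to at most 0.125 in magnitude and returns `[0.125, 0.4375, 1.0]`; the last pixel would otherwise exceed 1.0 and is clipped.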