Targeted Attack Improves Protection against Unauthorized Diffusion Customization

Boyang Zheng, Chumeng Liang, Xiaoyu Wu

TL;DR

This work tackles unauthorized diffusion customization by introducing ACE, a targeted adversarial attack that steers the diffusion model's score function $s_{\theta}(z'(t),t)$ toward a fixed chaotic target $\mathcal{T}$ to degrade customization outputs. ACE (and its variant ACE$^+$/ACE$^*$) is optimized via PGD under a small perturbation budget $\zeta$ and is designed to outperform untargeted protections by producing consistent chaotic patterns in generated images. Extensive experiments on LoRA+DreamBooth and SDEdit across CelebA-HQ and WikiArt show that ACE achieves superior degradation of customization quality, with ACE reaching an extreme FDFR of 1.00 and demonstrating transferability across diffusion backbones and resilience to purification methods. The authors also provide a mechanistic hypothesis: targeted attacks induce a learning bias during fine-tuning that reinforces a reverse chaotic pattern in outputs, explaining why targeted approaches outperform untargeted ones and offering insight for future defense design.

Abstract

Diffusion models mark a new milestone for image generation, yet they raise public concerns because they can be fine-tuned on unauthorized images for customization. Protection based on adversarial attacks has emerged to counter this unauthorized diffusion customization by adding protective watermarks to images and thereby poisoning diffusion models. However, current protections, which rely on untargeted attacks, do not appear to be effective enough. In this paper, we propose a simple yet effective improvement to protection against unauthorized diffusion customization by introducing targeted attacks. We show that, with a carefully selected target, targeted attacks significantly outperform untargeted attacks in poisoning diffusion models and degrading the quality of customized images. Extensive experiments validate the superiority of our method over existing protections on two mainstream customization methods for diffusion models. To explain the surprising success of targeted attacks, we examine the mechanism of attack-based protections and propose a hypothesis based on our observations, which improves the understanding of attack-based protections. To the best of our knowledge, we are the first to both reveal the vulnerability of diffusion models to targeted attacks and leverage targeted attacks to enhance protection against unauthorized diffusion customization. Our code is available on GitHub: https://github.com/psyker-team/mist-v2.
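
The targeted PGD loop summarized in the TL;DR can be illustrated with a small, self-contained sketch. This is not the authors' implementation: the fixed linear map `W` is a hypothetical stand-in for the score function $s_{\theta}$, the squared-error objective and the $L_\infty$ signed-gradient update are assumptions (the text above does not state the loss or the norm), and all function names are invented for illustration. Only the idea of pulling the model output toward a fixed target $\mathcal{T}$ under a budget $\zeta$ (here $4/255$, the value quoted in the Figure 1 caption) comes from the source.

```python
import random

def score(W, x):
    """Toy stand-in for the score function: a fixed linear map s(x) = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def targeted_loss(W, x, target):
    """Squared distance between the toy score and the fixed target."""
    return sum((s - t) ** 2 for s, t in zip(score(W, x), target))

def loss_grad(W, x, target):
    """Analytic gradient of targeted_loss w.r.t. x: 2 * W^T (W x - target)."""
    residual = [s - t for s, t in zip(score(W, x), target)]
    return [2.0 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))]

def sign(v):
    return (v > 0) - (v < 0)

def targeted_pgd(W, x_clean, target, budget, step_size, num_steps):
    """Minimise the targeted loss subject to |x_adv - x_clean| <= budget."""
    x_adv = list(x_clean)
    best, best_loss = list(x_clean), targeted_loss(W, x_clean, target)
    for _ in range(num_steps):
        grad = loss_grad(W, x_adv, target)
        # Signed descent step: a targeted attack moves TOWARD the target,
        # whereas an untargeted attack would ascend a loss instead.
        x_adv = [xa - step_size * sign(g) for xa, g in zip(x_adv, grad)]
        # Project back into the L-infinity ball around the clean input.
        x_adv = [min(max(xa, xc - budget), xc + budget)
                 for xa, xc in zip(x_adv, x_clean)]
        # Keep values inside the valid pixel range [0, 1].
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
        # Remember the best iterate, since signed steps need not be monotone.
        current = targeted_loss(W, x_adv, target)
        if current < best_loss:
            best, best_loss = list(x_adv), current
    return best

rng = random.Random(0)
dim = 8  # hypothetical toy dimension
W = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
x_clean = [rng.uniform(0.2, 0.8) for _ in range(dim)]
target = [rng.uniform(-1, 1) for _ in range(dim)]
budget = 4 / 255

x_adv = targeted_pgd(W, x_clean, target, budget,
                     step_size=budget / 4, num_steps=20)
loss_clean = targeted_loss(W, x_clean, target)
loss_adv = targeted_loss(W, x_adv, target)
max_shift = max(abs(a - c) for a, c in zip(x_adv, x_clean))
print(f"loss clean={loss_clean:.4f} adv={loss_adv:.4f} max shift={max_shift:.5f}")
```

In a real setting the gradient would come from backpropagation through the diffusion model at sampled timesteps rather than from a closed form; the projection step is what keeps the protective perturbation within the budget.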

Paper Structure (37 sections, 12 equations, 18 figures, 7 tables)

Figures (18)

  • Figure 1: Output images of two mainstream diffusion customization methods, SDEdit (top two rows) and LoRA (bottom two rows), under different protections with perturbation budget $4/255$. ACE and ACE+ are our targeted attacks, while the others are baselines based on untargeted attacks.
  • Figure 2: Target $\boldsymbol{\mathcal{T}}$ (left) and its corresponding image (right)
  • Figure 3: Comparison between $\epsilon_{adv}$ and $\mathcal{B}_{spl}$ of ASPL and ACE. Blue-framed images are the protected images used to compute $\epsilon_{adv}$. Red-framed images are the clean images used to compute $\mathcal{B}_{spl}$. We visualize $\epsilon_{adv},\mathcal{B}_{spl}\in\mathbb{R}^{64\times 64\times 4}$ as images with 4 channels. Complementary colors mean that two pixels are opposite to each other. For ACE, there is a visible pattern correlation between $\epsilon_{adv}$ and $\mathcal{B}_{spl}$.
  • Figure 4: Demonstration of the three steps in Hypothesis 5.1. First, the attacking step increases $\epsilon_{adv}$ of the protected images. Second, the fine-tuning step trains the diffusion model to reduce $\epsilon_{adv}$ through a bias $\mathcal{B}_{spl}$ whose direction is opposite to that of $\epsilon_{adv}$. Third, customized diffusion models include $\mathcal{B}_{spl}$ in sampling, so their output images show chaotic patterns. This hypothesis explains why $\epsilon_{adv}$ and $\mathcal{B}_{spl}$ of ACE are opposite to each other, as shown in Figure 3.
  • Figure 5: An example for visualization. We use the same color bar in all visualizations.
  • ...and 13 more figures