Targeted Attack Improves Protection against Unauthorized Diffusion Customization
Boyang Zheng, Chumeng Liang, Xiaoyu Wu
TL;DR
This work tackles unauthorized diffusion customization by introducing ACE, a targeted adversarial attack that steers the diffusion model's score function $s_{\theta}(z'(t),t)$ toward a fixed chaotic target $\mathcal{T}$ in order to degrade customization outputs. ACE and its variants (ACE$^+$/ACE$^*$) are optimized via PGD under a small perturbation budget $\zeta$ and are designed to outperform untargeted protections by producing consistent chaotic patterns in generated images. Extensive experiments on LoRA+DreamBooth and SDEdit across CelebA-HQ and WikiArt show that ACE degrades customization quality more than existing protections, reaching an FDFR of 1.00, and that it transfers across diffusion backbones and remains effective under purification methods. The authors also provide a mechanistic hypothesis: targeted attacks induce a learning bias during fine-tuning that reinforces a reverse chaotic pattern in outputs, which explains why targeted approaches outperform untargeted ones and offers insight for future defense design.
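To make the idea of a targeted PGD attack concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy linear map `W x` as a stand-in for the score function $s_{\theta}$, a squared-error loss toward a fixed target (playing the role of $\mathcal{T}$), and an $L_\infty$ budget (playing the role of $\zeta$). All function names and parameter values are hypothetical and chosen only for illustration.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def score(W, x):
    """Toy linear stand-in for a score function: s(x) = W x (an assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def targeted_loss(W, x, target):
    """Squared distance between the toy score output and the fixed target."""
    return sum((s - t) ** 2 for s, t in zip(score(W, x), target))

def loss_grad(W, x, target):
    """Analytic gradient of ||W x - target||^2 w.r.t. x: 2 W^T (W x - target)."""
    residual = [s - t for s, t in zip(score(W, x), target)]
    return [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))]

def pgd_targeted(W, x, target, budget=0.1, step=0.01, iters=50):
    """Targeted PGD: signed-gradient descent on the loss, then projection
    back into the L-infinity ball of radius `budget` around the clean input."""
    x_adv = list(x)
    for _ in range(iters):
        grad = loss_grad(W, x_adv, target)
        # Descend (not ascend): a targeted attack pulls the output toward the target.
        x_adv = [xa - step * sign(g) for xa, g in zip(x_adv, grad)]
        # Project each coordinate back to within `budget` of the clean input.
        x_adv = [min(max(xa, xi - budget), xi + budget)
                 for xa, xi in zip(x_adv, x)]
    return x_adv

# Hypothetical toy data: identity "model", clean input at the origin.
W = [[1.0, 0.0], [0.0, 1.0]]
x_clean = [0.0, 0.0]
target = [1.0, -1.0]
x_adv = pgd_targeted(W, x_clean, target)
```

An untargeted attack would instead ascend a loss that pushes the output away from its clean value; the targeted form differs only in minimizing the distance to a chosen target. In a real setting the analytic gradient would be replaced by automatic differentiation through the diffusion model.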
Abstract
Diffusion models mark a new milestone in image generation, yet they raise public concerns because they can be fine-tuned on unauthorized images for customization. Protection based on adversarial attacks has emerged to counter this unauthorized diffusion customization by adding protective watermarks to images, thereby poisoning the diffusion models fine-tuned on them. However, current protections, which rely on untargeted attacks, do not appear to be effective enough. In this paper, we propose a simple yet effective improvement to protection against unauthorized diffusion customization by introducing targeted attacks. We show that, with a carefully selected target, targeted attacks significantly outperform untargeted attacks in poisoning diffusion models and degrading the quality of customized images. Extensive experiments on two mainstream customization methods for diffusion models validate the superiority of our method over existing protections. To explain the surprising success of targeted attacks, we examine the mechanism of attack-based protections and propose a hypothesis based on our observations, which improves the understanding of attack-based protections. To the best of our knowledge, we are the first both to reveal the vulnerability of diffusion models to targeted attacks and to leverage targeted attacks to enhance protection against unauthorized diffusion customization. Our code is available on GitHub: https://github.com/psyker-team/mist-v2.
