Adversarial Attacks and Defenses on Text-to-Image Diffusion Models: A Survey
Chenyu Zhang, Mingwang Hu, Wenhui Li, Lanjun Wang
TL;DR
This survey addresses the vulnerability of text-to-image diffusion models, notably Stable Diffusion, to adversarial prompts that undermine robustness and safety. It organizes attacks along three dimensions: attack intent (targeted vs. untargeted), perturbation level (character, word, sentence), and attacker knowledge (white-box vs. black-box). It then systematically reviews untargeted attacks, targeted attacks, and the corresponding defense strategies. Key findings are that targeted attacks are more prevalent than untargeted ones, that many perturbations remain perceptible, and that safeguards often struggle against adversarial prompts, especially those generated by language models. The work underscores the need for holistic defenses that handle both malicious and adversarial prompts, and it outlines promising future directions, such as LLM-driven multi-agent attack automation and pattern-based defenses, with practical implications for deploying safe image synthesis systems.
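As an illustration only, the three taxonomy dimensions named above could be encoded as a small data structure. This is a minimal sketch: the names `Intent`, `PerturbationLevel`, `Knowledge`, and `AttackProfile` are hypothetical and do not come from the survey or its repository.

```python
from dataclasses import dataclass
from enum import Enum


class Intent(Enum):
    """Dimension 1: whether the attack aims at a specific output."""
    UNTARGETED = "untargeted"
    TARGETED = "targeted"


class PerturbationLevel(Enum):
    """Dimension 2: granularity of the prompt perturbation."""
    CHARACTER = "character"
    WORD = "word"
    SENTENCE = "sentence"


class Knowledge(Enum):
    """Dimension 3: what the attacker knows about the model."""
    WHITE_BOX = "white-box"
    BLACK_BOX = "black-box"


@dataclass(frozen=True)
class AttackProfile:
    """One point in the three-dimensional taxonomy (hypothetical container)."""
    intent: Intent
    level: PerturbationLevel
    knowledge: Knowledge

    def label(self) -> str:
        # Human-readable summary of where an attack sits in the taxonomy.
        return f"{self.intent.value} / {self.level.value}-level / {self.knowledge.value}"


# Example: a targeted, word-level, black-box attack.
example = AttackProfile(Intent.TARGETED, PerturbationLevel.WORD, Knowledge.BLACK_BOX)
print(example.label())
```

Under this encoding the taxonomy spans 2 x 3 x 2 = 12 possible combinations, which gives a simple way to tag each surveyed method with one profile.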
Abstract
Recently, the text-to-image diffusion model has gained considerable attention from the community due to its exceptional image generation capability. A representative model, Stable Diffusion, amassed more than 10 million users within just two months of its release. This surge in popularity has facilitated studies on the robustness and safety of the model, leading to the proposal of various adversarial attack methods. Simultaneously, there has been a marked increase in research focused on defense methods to improve the robustness and safety of these models. In this survey, we provide a comprehensive review of the literature on adversarial attacks and defenses targeting text-to-image diffusion models. We begin with an overview of text-to-image diffusion models, followed by an introduction to a taxonomy of adversarial attacks and an in-depth review of existing attack methods. We then present a detailed analysis of current defense methods that improve model robustness and safety. Finally, we discuss ongoing challenges and explore promising future research directions. For a complete list of the adversarial attack and defense methods covered in this survey, please refer to our curated repository at https://github.com/datar001/Awesome-AD-on-T2IDM.
