Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey

Vu Tuan Truong, Luan Ba Dang, Long Bao Le

TL;DR

This survey addresses the security of diffusion models (DMs) by cataloging three main attack families (backdoor, adversarial, and membership inference) and reviewing the corresponding defenses. It covers five diffusion-model families (DDPM, DDIM, NCSN, score-based SDE, and multi-modal conditional DMs) and provides a unified view of their forward (noising) and reverse (denoising) processes, training objectives, and sampling. The work aggregates state-of-the-art attacks (e.g., TrojDiff, BadDiffusion, RickRolling, MFA) and defenses (safety filters, unlearning, differential privacy, and knowledge distillation), and outlines open challenges including cross-modal backdoors, prompt-based adversaries, and privacy concerns in large, publicly released models. Overall, the paper highlights the societal risks of publicly released DMs and charts directions for robust, privacy-preserving diffusion-based systems. It also argues that security research should extend beyond vision-language models to audio, time-series, and 3D domains, and that defenses need to be scalable and effective while balancing utility and safety.

Abstract

Diffusion models (DMs) have achieved state-of-the-art performance on various generative tasks such as image synthesis, text-to-image, and text-guided image-to-image generation. However, the more powerful DMs become, the more harm they can potentially cause. Recent studies have shown that DMs are prone to a wide range of attacks, including adversarial attacks, membership inference, backdoor injection, and various multi-modal threats. Since numerous pre-trained DMs are widely published on the Internet, these attacks are especially detrimental to society, making DM-related security a topic worth investigating. Therefore, in this paper, we conduct a comprehensive survey on the security of DMs, focusing on attack and defense methods. First, we present the essential background on DMs through five main types: denoising diffusion probabilistic models, denoising diffusion implicit models, noise conditioned score networks, stochastic differential equations, and multi-modal conditional DMs. We then survey a variety of recent studies investigating different types of attacks that exploit the vulnerabilities of DMs, and thoroughly review potential countermeasures to mitigate each of the presented threats. Finally, we discuss open challenges in DM-related security and envision research directions for this topic.

Paper Structure

This paper contains 52 sections, 63 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Diffusion models viewed from different perspectives according to four main categories: DDPMs, DDIMs, NCSNs, and SDEs.
  • Figure 2: A summary of our survey on attack and defense methods for DMs.
  • Figure 3: A comparison in terms of backdoor attacks between BadDiffusion [Chou2023CVPR] and TrojDiff [chen2023trojdiff].
  • Figure 4: Personalization via DreamBooth [ruiz2023dreambooth].
  • Figure 5: An overview of different types of adversarial attacks on DMs, categorized by the perturbation target. Example images in this figure are from [salman2023raising] and [van2023anti].
  • ...and 1 more figure