Attacks and Defenses for Generative Diffusion Models: A Comprehensive Survey
Vu Tuan Truong, Luan Ba Dang, Long Bao Le
TL;DR
This survey examines the security of diffusion models (DMs) by cataloging three main attack families (backdoor, adversarial, and membership inference attacks) and the corresponding defenses. It covers five DM families (DDPM, DDIM, NCSN, score-based SDEs, and multi-modal conditional DMs) and provides a unified view of their forward (noising) and reverse (denoising) processes, training objectives, and sampling procedures. The work aggregates state-of-the-art attacks (e.g., TrojDiff, BadDiffusion, RickRolling, MFA) and defenses (safety filters, unlearning, differential privacy, and knowledge distillation). It also outlines open challenges, including cross-modal backdoors, prompt-based adversaries, and privacy risks in large, publicly released models. Overall, the paper highlights the societal risks of released DMs and charts directions toward robust, privacy-preserving diffusion-based systems. The findings underscore the need to extend security research beyond vision-language models to audio, time-series, and 3D domains, and to develop scalable, effective defenses that balance utility and safety.
Abstract
Diffusion models (DMs) have achieved state-of-the-art performance on various generative tasks such as image synthesis, text-to-image generation, and text-guided image-to-image generation. However, the more powerful DMs become, the more harm they can potentially cause. Recent studies have shown that DMs are vulnerable to a wide range of attacks, including adversarial attacks, membership inference, backdoor injection, and various multi-modal threats. Because numerous pre-trained DMs are widely published on the Internet, the potential threats from these attacks are especially detrimental to society, making DM security a topic worth investigating. Therefore, in this paper, we conduct a comprehensive survey of the security aspects of DMs, focusing on attack and defense methods. First, we present essential background on five main types of DMs: denoising diffusion probabilistic models, denoising diffusion implicit models, noise conditioned score networks, stochastic differential equations, and multi-modal conditional DMs. We then survey recent studies on the different types of attacks that exploit the vulnerabilities of DMs. Next, we thoroughly review potential countermeasures to mitigate each of the presented threats. Finally, we discuss open challenges in DM security and outline future research directions for this topic.
