A Comprehensive Survey on Diffusion Models and Their Applications

Md Manjurul Ahsan, Shivakumar Raman, Yingtao Liu, Zahed Siddique

TL;DR

This survey consolidates research on diffusion models (DMs) across theory, algorithms, and applications in multiple domains. It treats DDPMs, NCSNs, and SDEs as the core model families and highlights their use in image, audio, text, and healthcare tasks. It documents key innovations, such as guided synthesis, accelerated sampling, and conditional diffusion techniques, and surveys applications ranging from image enhancement to medical imaging, while noting limitations in efficiency, scalability, and safety. The authors synthesize these findings to identify open challenges, propose guidelines for future research, and advocate interdisciplinary collaboration to broaden the impact of DMs responsibly. Overall, the work clarifies the state of the art, outlines practical considerations for deployment, and emphasizes ethical governance as diffusion models move toward widespread, cross-domain adoption.

Abstract

Diffusion Models are probabilistic models that create realistic samples by simulating a diffusion process, gradually adding noise to data and then removing it. These models have gained popularity in domains such as image processing, speech synthesis, and natural language processing due to their ability to produce high-quality samples. As Diffusion Models are adopted in more domains, existing literature reviews, which often focus on specific areas such as computer vision or medical imaging, may not serve a broader audience across multiple fields. Therefore, this review presents a comprehensive overview of Diffusion Models, covering their theoretical foundations and algorithmic innovations. We highlight their applications in diverse areas such as media quality, authenticity, synthesis, image transformation, healthcare, and more. By consolidating current knowledge and identifying emerging trends, this review aims to facilitate a deeper understanding and broader adoption of Diffusion Models and to provide guidelines for future researchers and practitioners across diverse disciplines.
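
The add-noise and remove-noise idea described in the abstract can be illustrated with a minimal sketch. It assumes the closed-form forward process commonly used with DDPMs, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and a linear noise schedule; neither is specified on this page. All function names are hypothetical, and the true noise stands in for the output of a trained denoising network.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return alpha_bar_t, the running product of (1 - beta_s) up to step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t given x_0 in closed form; x0 is a flat list of floats."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + noise_scale * e for x, e in zip(x0, noise)]
    return xt, noise


def remove_noise(xt, t, alpha_bars, predicted_noise):
    """Invert the closed form to estimate x_0 from x_t and a noise estimate."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(x - noise_scale * e) / signal for x, e in zip(xt, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -1.0, 2.0]
    xt, noise = add_noise(x0, 500, alpha_bars, rng)
    # The true noise is used here in place of a learned noise prediction.
    print(remove_noise(xt, 500, alpha_bars, noise))
```

In a real model, the noise passed to `remove_noise` would come from a trained network, and sampling would proceed over many small reverse steps rather than a single inversion.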

Paper Structure

This paper contains 24 sections, 11 equations, 11 figures.

Figures (11)

  • Figure 1: An example of a diffusion-based model. The model uses cross-attention mechanisms to integrate different types of conditioning input, such as text or semantic maps, which allows more effective control over the image generation process. The figure shows how these inputs are processed and incorporated into the model to produce high-quality images (Rombach et al., 2022).
  • Figure 2: Statistics on (a) the number of papers published on DMs over the last five years and (b) the percentage of published papers across various domains.
  • Figure 3: Comprehensive overview of DMs. The diagram categorizes various DMs and their applications across different fields. Abbreviations: DMF, Diffusion Models Framework; TDM, Types of Diffusion Models; IET, Image Enhancement and Transformation; MQAS, Media Quality, Authenticity, and Synthesis; DDPMs, Denoising Diffusion Probabilistic Models; NCSNs, Noise-Conditioned Score Networks; SDEs, Stochastic Differential Equations.
  • Figure 4: Timeline of different DMs from 2010 to 2023. The three main DM families (NCSNs, DDPMs, and SDEs) are highlighted in different colors.
  • Figure 5: Mao et al. (2023) explored how the initial image influences the image generation process and proposed a method to control it by altering the initial random noise. They demonstrated two applications: layout-to-image synthesis, which creates objects in specified locations, and re-painting, which lets users change specific portions of an image while keeping the rest unchanged.
  • ...and 6 more figures