Diffusion Models in Low-Level Vision: A Survey

Chunming He, Yuqi Shen, Chengyu Fang, Fengyang Xiao, Longxiang Tang, Yulun Zhang, Wangmeng Zuo, Zhenhua Guo, Xiu Li

TL;DR

This survey comprehensively maps diffusion-model techniques across low-level vision, detailing three foundational frameworks (DDPMs, NCSNs, SDEs) and their connections to other generative approaches. It surveys DM applications in natural images, medical imaging, remote sensing, and video, highlighting training paradigms, task-specific designs, and performance benchmarks. The authors synthesize experimental results, discuss limitations, and propose future directions to improve efficiency, generalization, and multi-modal integration. The work aims to provide a unified reference that accelerates understanding and development of diffusion-model-based solutions for complex, real-world low-level vision tasks.

Abstract

Deep generative models have garnered significant attention in low-level vision tasks due to their generative capabilities. Among them, diffusion model-based solutions, characterized by a forward diffusion process and a reverse denoising process, have been widely acclaimed for their ability to produce samples of superior quality and diversity, yielding visually compelling results with intricate texture information. Despite their remarkable success, there is still no comprehensive survey that amalgamates these pioneering diffusion model-based works and organizes the corresponding threads. This paper presents a comprehensive review of diffusion model-based techniques. We present three generic diffusion modeling frameworks and explore their correlations with other deep generative models, establishing the theoretical foundation. Following this, we introduce a multi-perspective categorization of diffusion models, considering both the underlying framework and the target task. Additionally, we summarize extended diffusion models applied in other tasks, including medical, remote sensing, and video scenarios. Moreover, we provide an overview of commonly used benchmarks and evaluation metrics. We conduct a thorough evaluation, encompassing both performance and efficiency, of diffusion model-based techniques in three prominent tasks. Finally, we elucidate the limitations of current diffusion models and propose seven intriguing directions for future research. This comprehensive examination aims to facilitate a profound understanding of the landscape surrounding denoising diffusion models in the context of low-level vision tasks. A curated list of diffusion model-based techniques in over 20 low-level vision tasks can be found at https://github.com/ChunmingHe/awesome-diffusion-models-in-low-level-vision.
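
To make the "forward diffusion process" and "reverse denoising process" mentioned in the abstract concrete, the following is a minimal sketch of the standard DDPM closed-form noising step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. It is an illustration under stated assumptions, not the survey's own code: the function names, the linear beta schedule (1e-4 to 0.02 over 1000 steps), and the toy three-pixel "image" are all chosen here for illustration. A real model replaces the known noise with a learned network's estimate.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed, commonly used schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) over all steps up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) in closed form; t is a 0-based step index."""
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * v + noise_scale * e for v, e in zip(x0, noise)]
    return xt, noise


def predict_x0(xt, t, alpha_bars, predicted_noise):
    """Invert the forward step given a noise estimate.

    In a trained diffusion model the estimate comes from a denoising
    network; here the caller supplies it directly.
    """
    signal = math.sqrt(alpha_bars[t])
    noise_scale = math.sqrt(1.0 - alpha_bars[t])
    return [(v - noise_scale * e) / signal for v, e in zip(xt, predicted_noise)]


# Toy usage: noise a three-"pixel" signal, then undo it with the true noise.
rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
x0 = [0.1, 0.5, 0.9]
xt, eps = forward_diffuse(x0, 500, alpha_bars, rng)
x0_hat = predict_x0(xt, 500, alpha_bars, eps)
```

With the exact noise supplied, `x0_hat` matches `x0` up to floating-point error, which is why training a network to predict the noise is enough to drive the reverse process.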

Paper Structure (24 sections, 16 equations, 19 figures, 7 tables)


Figures (19)

  • Figure 1: Examples of various low-level vision tasks with the low-quality image (left) and the enhanced high-quality image (right). All enhanced results are generated with diffusion model-based algorithms: IDM in (a), MSGD in (b), Repaint in (c), Reti-Diff in (d), DOLCE in (e), and DDPM-CR in (f).
  • Figure 2: Distributions of the four main low-level vision scenarios of DM-based models. In each Venn diagram, the overlapping regions between circles indicate that these models can address multiple application tasks or input modalities.
  • Figure 3: The bar chart illustrates the continuous growth of DM-based methods in low-level vision tasks across four distinct scenarios. Representative works are categorized and marked on the line graph with colors corresponding to each scenario as indicated in the legend. The methods highlighted represent the seminal works of each period, e.g., StableSR has garnered 1.9k GitHub stars, SR3 boasts 1.2k citations, and SUPIR is a pioneering DM-based multi-modal solution.
  • Figure 4: The schematic diagram of diffusion models.
  • Figure 5: The flowcharts of generative models, where the HQ image $\tilde{x}$ is generated by the corresponding methods, i.e., LACR-VAE, LLFlow, Vanilla GAN, and PyDiff.
  • ...and 14 more figures