Diffusion Models for Low-Light Image Enhancement: A Multi-Perspective Taxonomy and Performance Analysis

Eashan Adhikarla, Yixin Liu, Brian D. Davison

TL;DR

This survey addresses low-light image enhancement through diffusion models, proposing a six-perspective taxonomy (Intrinsic Decomposition, Spectral & Latent, Accelerated, Guided, Multimodal, Autonomous) that maps methods to physical priors, conditioning, and efficiency. It provides a comprehensive performance comparison against GAN and Transformer baselines, analyzes qualitative and quantitative failure modes, and discusses deployment constraints and ethical considerations. Key insights include an expanding efficiency-fidelity frontier driven by latent and spectral diffusion, a shift toward controllable and task-aware enhancement, and the rising relevance of foundation-model guidance for LLIE. The practical impact lies in guiding the next generation of diffusion-based LLIE toward real-time, on-device, and robust cross-domain applications while acknowledging data scarcity, interpretability, and responsible AI concerns.

Abstract

Low-light image enhancement (LLIE) is vital for safety-critical applications such as surveillance, autonomous navigation, and medical imaging, where visibility degradation can impair downstream task performance. Recently, diffusion models have emerged as a promising generative paradigm for LLIE due to their capacity to model complex image distributions via iterative denoising. This survey provides an up-to-date critical analysis of diffusion models for LLIE. It features an in-depth comparative performance evaluation against state-of-the-art Generative Adversarial Network (GAN) and Transformer-based methods, a thorough examination of practical deployment challenges, and a forward-looking perspective on the role of emerging paradigms such as foundation models. We propose a multi-perspective taxonomy of six categories (Intrinsic Decomposition, Spectral & Latent, Accelerated, Guided, Multimodal, and Autonomous) that maps enhancement methods across physical priors, conditioning schemes, and computational efficiency. Our taxonomy is grounded in a hybrid view of both the model mechanism and the conditioning signals. We evaluate qualitative failure modes, benchmark inconsistencies, and trade-offs between interpretability, generalization, and inference efficiency. We also discuss real-world deployment constraints (e.g., memory, energy use) and ethical considerations. This survey aims to guide the next generation of diffusion-based LLIE research by highlighting trends and surfacing open research questions, including novel conditioning, real-time adaptation, and the potential of foundation models.

Paper Structure

This paper contains 59 sections, 13 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Categorization of widely used diffusion methods for low-light image enhancement into six categories. (Best viewed in color.)
  • Figure 2: Expanding feasible frontier in LLIE. Axes: quality$\uparrow$, diversity$\uparrow$; interior = lower latency. Envelopes for VAEs/GANs (2017--2020, blue), classic diffusion (2021--2023, orange), and few/one-step/consistency diffusion (2023--2025, green) illustrate a monotonic expansion.
  • Figure 3: Illustration of the diffusion training and inference pipeline. Top: The forward process gradually perturbs the clean image $x_0$ into noisy representations $x_t$ by adding Gaussian noise across timesteps. Bottom: The reverse process starts from white noise $x_T \sim \mathcal{N}(0,I)$ and employs a U-Net denoiser $\epsilon_\theta(x_t, t)$ to iteratively recover the clean image. Training optimizes the noise prediction loss $\mathcal{L}_{\text{simple}}$, while inference repeatedly applies the reverse step to reconstruct $x_0$.
  • Figure 4: A detailed hierarchical view of the proposed taxonomy for diffusion-based Low-Light Image Enhancement (LLIE) methods. The six primary categories (Intrinsic Decomposition, Spectral & Latent, Accelerated, Guided, Multimodal, and Autonomous) are shown at the top level. Each category is further broken down into sub-categories representing specific strategic approaches or technical innovations within that domain.
  • Figure 5: [Top] The Diff-Retinex paper outlines a modular framework comprising three detachable components: the Transformer Decomposition Network (TDN), Reflectance Diffusion Adjustment (RDA), and Illumination Diffusion Adjustment (IDA). [Bottom] As presented in the LightenDiffusion paper, the pipeline uses an encoder $E(\cdot)$ to map unpaired low/normal-light images ($I_{low}$, $I_{high}$) to latent features ($F_{low}$, $F_{high}$), which the CTDN splits into reflectance ($R$) and illumination ($L$) maps. $R_{low}$ and $L_{high}$ drive the forward diffusion, while the reverse denoising process transforms noise $\hat{x}_T$ into restored features $\hat{F}_{low}$ guided by $F_{low}$ ($\tilde{x}$), which are then decoded by $D(\cdot)$ into the final image $\hat{I}_{low}$.
  • ...and 8 more figures
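
The forward/reverse pipeline described in the Figure 3 caption can be illustrated with a minimal, standard-library-only sketch of textbook DDPM on a flat list of pixel values. It is not code from the survey: the linear $\beta$ schedule, the choice $\sigma_t^2 = \beta_t$, and every function name below are assumptions made for illustration, and the learned U-Net denoiser $\epsilon_\theta(x_t, t)$ is replaced by whatever noise estimate the caller passes in.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def forward_noise(x0, t, alpha_bars, rng):
    """Forward process in closed form: returns (x_t, eps) with
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps

def simple_loss(eps, eps_pred):
    """Noise-prediction loss L_simple: mean squared error on the noise."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred)) / len(eps)

def reverse_step(xt, t, eps_pred, betas, alpha_bars, rng):
    """One ancestral reverse step x_t -> x_{t-1}, assuming sigma_t^2 = beta_t.
    eps_pred stands in for the denoiser output eps_theta(x_t, t)."""
    beta, ab = betas[t], alpha_bars[t]
    coef = beta / math.sqrt(1.0 - ab)
    mean = [(x - coef * e) / math.sqrt(1.0 - beta) for x, e in zip(xt, eps_pred)]
    if t == 0:  # final step (index 0 is timestep 1): return the mean, no noise
        return mean
    sigma = math.sqrt(beta)
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
```

Training would draw a random timestep, call `forward_noise`, and minimize `simple_loss` between the sampled noise and the denoiser's prediction. Inference would start from Gaussian noise and call `reverse_step` from the last timestep index down to 0. Timesteps are 0-indexed here, so index `t` corresponds to timestep $t+1$ in the caption's notation.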

Theorems & Definitions (4)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4: Conditional Diffusion