Diffusion-Based Low-Light Image Enhancement with Color and Luminance Priors

Xuanshuo Fu, Lei Kang, Javier Vazquez-Corral

TL;DR

The proposed SCEM-equipped diffusion method enforces structured low-light image enhancement guided by physical priors. It achieves state-of-the-art performance in quantitative and perceptual metrics and generalizes strongly across benchmarks.

Abstract

Low-light images often suffer from low contrast, noise, and color distortion, which degrade visual quality and impair downstream vision tasks. We propose a novel conditional diffusion framework for low-light image enhancement that incorporates a Structured Control Embedding Module (SCEM). SCEM decomposes a low-light image into four informative components: illumination, illumination-invariant features, shadow priors, and color-invariant cues. These components serve as control signals that condition a U-Net-based diffusion model trained with a simplified noise-prediction loss. The proposed SCEM-equipped diffusion method thus enforces structured enhancement guided by physical priors. In experiments, our model is trained only on the LOLv1 dataset and evaluated without fine-tuning on LOLv2-real, LSRW, DICM, MEF, and LIME. The method achieves state-of-the-art performance in quantitative and perceptual metrics, demonstrating strong generalization across benchmarks. https://casted.github.io/scem/.
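
As a rough illustration of how control signals can condition a noise-prediction objective, the sketch below noises a target image, concatenates it channel-wise with the control signals, and scores a denoiser by the mean squared error between true and predicted noise, i.e. a loss of the form $\|\epsilon - \epsilon_\theta(X_t', t)\|^2$. It is not the authors' implementation. The linear beta schedule, the channel counts of the control signals, the helper names (`make_alpha_bar`, `noise_prediction_loss`, `zero_denoiser`), and the random stand-in data are all assumptions made for illustration, and a placeholder function takes the place of the U-Net.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative signal rate for an ASSUMED linear beta schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_prediction_loss(x0, controls, denoiser, alpha_bar, rng):
    """Single-sample estimate of the MSE between true and predicted noise."""
    t = int(rng.integers(0, len(alpha_bar)))   # randomly chosen timestep
    eps = rng.standard_normal(x0.shape)        # target noise
    # Forward diffusion: mix the clean image with Gaussian noise.
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    # Conditioning by channel-wise concatenation (channel axis is 0).
    x_in = np.concatenate([x_t, *controls], axis=0)
    eps_hat = denoiser(x_in, t)
    return float(np.mean((eps - eps_hat) ** 2))

def zero_denoiser(x_in, t):
    """Placeholder for the U-Net: predicts zero noise for 3 image channels."""
    return np.zeros((3,) + x_in.shape[1:])

rng = np.random.default_rng(0)
h = w = 8
x0 = rng.uniform(-1.0, 1.0, size=(3, h, w))    # stand-in normal-light target
low = rng.uniform(-1.0, 1.0, size=(3, h, w))   # stand-in low-light input
controls = [
    low,
    rng.uniform(size=(1, h, w)),  # illumination (1 channel is an assumption)
    rng.uniform(size=(3, h, w)),  # color-invariant cues (3 channels assumed)
    rng.uniform(size=(3, h, w)),  # illumination-invariant features (assumed)
    rng.uniform(size=(3, h, w)),  # shadow priors (3 channels assumed)
]
alpha_bar = make_alpha_bar()
loss = noise_prediction_loss(x0, controls, zero_denoiser, alpha_bar, rng)
print(f"loss with placeholder denoiser: {loss:.4f}")
```

In a real training loop the placeholder would be a trainable network, and the loss would be averaged over a batch and minimized by gradient descent.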

Paper Structure (11 sections, 22 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Quantitative comparisons with state-of-the-art methods. (a) presents numerical scores for PSNR, SSIM, LPIPS, and FID on three datasets that contain ground-truth normal-light images: LOLv1, LOLv2-real, and LSRW. (b) presents numerical scores for NIQE, BRISQUE, and PI on three datasets that contain only low-light images without ground-truth normal-light counterparts: DICM, MEF, and LIME. Note that our proposed methods are trained only on the LOLv1 training set and are directly evaluated on the remaining datasets. To enable intuitive comparison in the radar plot, we normalized all metrics and inverted those where lower is better (e.g., LPIPS, FID), so that higher values consistently indicate better performance.
  • Figure 2: Architecture of the proposed SCEM-equipped diffusion model for low-light image enhancement. During training, we extract four types of features from the input low-light image $\mathcal{I}$: illumination $T_{ref}$, color-invariant features $\Phi(x)$, illumination-invariant features $R_c$, and shadow priors $S_{3ch}$. These features, along with the original low-light image, are concatenated with the noised image $X_t$ at a randomly chosen timestep $t$ to form the input $X_t'$ for the denoising training process. The diffusion model then generates the enhanced image $\hat{X}_0$. During inference, Gaussian noise $X_T$ is concatenated with the same set of extracted features and the original low-light image $\mathcal{I}$.
  • Figure 3: Visual comparisons of our approach with competing methods. The input images in the first, second, and third rows are from the LOLv1, LOLv2-real, and LSRW datasets, respectively.
  • Figure 4: Ablation study comparing PSNR for different input configurations: (1) low-light image $\mathcal{I}$ only, (2) $\mathcal{I}$ with illumination $T_{ref}$, (3) $\mathcal{I}$ with color-invariant features $\Phi(x)$, (4) $\mathcal{I}$ with illumination-invariant features $R_c$, and (5) $\mathcal{I}$ with shadow priors $S_{3ch}$.
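
The Figure 1 caption describes normalizing all metrics and inverting those where lower is better, so that a larger value always means better performance on the radar plot. A minimal sketch of one way to do this follows. The caption does not say which normalization was used, so min-max scaling across methods is an assumption here, the function name `radar_scores` is hypothetical, and the numbers are made-up placeholders rather than results from the paper.

```python
import numpy as np

def radar_scores(values, lower_is_better):
    """Min-max scale one metric across methods; flip it if lower is better."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # All methods tie on this metric: give every method the same score.
        return np.ones_like(values)
    scaled = (values - lo) / (hi - lo)
    return 1.0 - scaled if lower_is_better else scaled

# Made-up scores for three hypothetical methods (NOT values from the paper).
psnr = [20.0, 22.0, 24.0]    # higher is better
lpips = [0.30, 0.20, 0.10]   # lower is better

psnr_axis = radar_scores(psnr, lower_is_better=False)
lpips_axis = radar_scores(lpips, lower_is_better=True)
print(psnr_axis, lpips_axis)
```

After this step the third hypothetical method has the largest value on both axes, even though its raw LPIPS is the smallest, which is the property the caption relies on.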