LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models

Hai Jiang, Ao Luo, Xiaohong Liu, Songchen Han, Shuaicheng Liu

TL;DR

This work tackles the ill-posed problem of low-light image enhancement without paired data by marrying Retinex theory with diffusion models in latent space. It introduces a Content-Transfer Decomposition Network to separate latent features into content-rich reflectance and content-free illumination, and a Latent-Retinex Diffusion Model guided by low-light features to restore images, supplemented by a self-constrained consistency loss to prevent content leakage. The method is trained in two stages on unpaired data and shows superior performance to unsupervised baselines on standard benchmarks and strong generalization to unseen scenes, including improvements in low-light face detection. The approach delivers practical, generalizable LLIE with an open-source implementation, enabling broader application in real-world imaging tasks.

Abstract

In this paper, we propose a diffusion-based unsupervised framework that incorporates physically explainable Retinex theory with diffusion models for low-light image enhancement, named LightenDiffusion. Specifically, we present a content-transfer decomposition network that performs Retinex decomposition within the latent space instead of image space as in previous approaches, enabling the encoded features of unpaired low-light and normal-light images to be decomposed into content-rich reflectance maps and content-free illumination maps. Subsequently, the reflectance map of the low-light image and the illumination map of the normal-light image are taken as input to the diffusion model for unsupervised restoration with the guidance of the low-light feature, where a self-constrained consistency loss is further proposed to eliminate the interference of normal-light content on the restored results to improve overall visual quality. Extensive experiments on publicly available real-world benchmarks show that the proposed LightenDiffusion outperforms state-of-the-art unsupervised competitors and is comparable to supervised methods while being more generalizable to various scenes. Our code is available at https://github.com/JianghaiSCU/LightenDiffusion.
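The recombination step described in the abstract can be illustrated with a small NumPy sketch. It assumes the classical Retinex model, in which a signal is the element-wise product of a reflectance map and an illumination map, and it assumes the two maps are combined the same way before entering the diffusion model. The array shapes, the constant illumination values, and the helper name `retinex_compose` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def retinex_compose(reflectance, illumination):
    """Classical Retinex composition: element-wise product R * L."""
    return reflectance * illumination

# Toy arrays standing in for latent features (hypothetical shapes and values).
rng = np.random.default_rng(0)
r_low = rng.uniform(0.2, 1.0, size=(4, 8, 8))  # content-rich reflectance of the low-light feature
l_low = np.full((1, 8, 8), 0.1)                # dim, content-free illumination
l_high = np.full((1, 8, 8), 0.9)               # bright, content-free illumination

# Low-light feature under the Retinex model.
f_low = retinex_compose(r_low, l_low)

# Assumed diffusion input: content from the low-light image,
# lighting from the (unpaired) normal-light image.
x0 = retinex_compose(r_low, l_high)
```

In this toy setting the illumination maps are constant, so `x0` is simply `f_low` scaled by the ratio of the two illumination levels; in practice the maps would be predicted by a decomposition network and vary spatially.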

Paper Structure (15 sections, 9 equations, 9 figures, 2 tables, 1 algorithm)

Figures (9)

  • Figure 1: Visual comparisons of our method with recent state-of-the-art supervised and unsupervised LLIE methods: URetinexNet, SMG, NeRCo, and GDP. Previous methods exhibit incorrect exposure, color distortion, blurred details, or noise amplification, all of which degrade visual quality, while our method properly improves global and local contrast, presents vivid colors, and avoids introducing artifacts.
  • Figure 2: The overall pipeline of our proposed framework. We first employ an encoder $\mathcal{E}(\cdot)$ to convert the unpaired low-light image $I_{low}$ and normal-light image $I_{high}$ into latent space denoted as $\mathcal{F}_{low}$ and $\mathcal{F}_{high}$. The encoded features are sent to the proposed content-transfer decomposition network (CTDN) to generate content-rich reflectance maps denoted as $\mathbf{R}_{low}$ and $\mathbf{R}_{high}$ and content-free illumination maps as $\mathbf{L}_{low}$ and $\mathbf{L}_{high}$. Then, the reflectance map of the low-light image $\mathbf{R}_{low}$ and the illumination of the normal-light image $\mathbf{L}_{high}$ are taken as the input of the diffusion model to perform the forward diffusion process. Finally, we perform the reverse denoising process to gradually transform the randomly sampled Gaussian noise $\hat{\mathbf{x}}_{T}$ into the restored feature $\mathcal{\hat{F}}_{low}$ with the guidance of the low-light feature $\mathcal{F}_{low}$ denoted as $\tilde{\mathbf{x}}$, and subsequently send it to a decoder $\mathcal{D}(\cdot)$ to produce the final result $\hat{I}_{low}$.
  • Figure 3: Illustration of the decomposition results obtained by different methods. (a) shows the results of previous methods, i.e., RetinexNet, KinD++, URetinexNet, and PairLIE, that perform decomposition in image space. (b) presents the results of our CTDN that performs decomposition in latent space. Our method can generate content-rich reflectance maps and content-free illumination maps.
  • Figure 4: The detailed architecture of our proposed CTDN.
  • Figure 5: Qualitative comparison of our method and competitive methods on the LOL (RetinexNet) and LSRW (R2RNet) test sets. Best viewed by zooming in.
  • ...and 4 more figures
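
The forward diffusion process mentioned in the Figure 2 caption can be sketched with the standard DDPM closed form, $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$. This is only a minimal illustration: the number of steps, the linear noise schedule, the toy input, and the function name `forward_diffuse` are assumptions and are not taken from the paper or its released code.

```python
import numpy as np

def forward_diffuse(x0, t, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

# Assumed schedule: 1000 steps with linearly spaced betas (a common DDPM default).
T = 1000
betas = np.linspace(1e-4, 2e-2, T)
alpha_bar = np.cumprod(1.0 - betas)  # cumulative product of (1 - beta_t)

# Toy array standing in for the diffusion input (hypothetical shape).
rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(4, 8, 8))

# At the last step almost no signal remains, so x_t is close to pure Gaussian noise.
xt, eps = forward_diffuse(x0, T - 1, alpha_bar, rng)
```

The reverse process would then train a network to predict `eps` from `xt`, the step index, and the guidance feature; that part is omitted here because its details depend on the authors' architecture.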