Self-supervision via Controlled Transformation and Unpaired Self-conditioning for Low-light Image Enhancement

Aupendu Kar, Sobhan K. Dhara, Debashis Sen, Prabir K. Biswas

TL;DR

This work tackles low-light image enhancement without paired data by introducing SelfEnNet, a two-branch network consisting of an enhancement module $\mathcal{F_E}$ and a noise-handling module $\mathcal{F_D}$. It leverages self-supervision through controlled image transformations and unpaired self-conditioning to learn per-pixel enhancement maps, while a denoising pathway based on low-gradient magnitude suppression preserves details in low-light regions. The method compares favorably, both quantitatively and subjectively, with state-of-the-art methods, particularly among unpaired approaches, and is robust across datasets and training conditions. A key limitation is that outputs may be slightly less vibrant, reflecting a focus on sufficient and consistent enhancement over maximal color vividness.
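Figure 1 writes the enhanced output as $\boldsymbol{I_D}^{\eta}$ and uses the all-ones matrix $U$ as a reference map, which suggests that a per-pixel enhancement map acts as a pixel-wise exponent. The minimal sketch below assumes exactly that (output = image ** eta on intensities in [0, 1]); the function name `apply_enhancement` and the example maps are hypothetical, and the paper's actual formulation may differ.

```python
import numpy as np


def apply_enhancement(image: np.ndarray, eta: np.ndarray) -> np.ndarray:
    """Apply a per-pixel enhancement map as a pixel-wise power law (assumed model).

    image: intensities in [0, 1]; eta: strictly positive map of the same shape.
    eta < 1 brightens a pixel; eta == 1 (the all-ones map U) leaves it unchanged.
    """
    if image.shape != eta.shape:
        raise ValueError("image and eta must have the same shape")
    if np.any(eta <= 0):
        raise ValueError("eta must be strictly positive")
    return np.clip(image, 0.0, 1.0) ** eta


dark = np.array([[0.04, 0.09], [0.16, 0.25]])
eta = np.full_like(dark, 0.5)          # hypothetical map; F_E would predict one per input
print(apply_enhancement(dark, eta))    # approximately [[0.2, 0.3], [0.4, 0.5]]

local = np.array([[0.5, 1.0], [1.0, 1.0]])  # brighten only the darkest pixel
print(apply_enhancement(dark, local))  # approximately [[0.2, 0.09], [0.16, 0.25]]

U = np.ones_like(dark)                 # all-ones map: no enhancement
print(np.allclose(apply_enhancement(dark, U), dark))  # True
```

Because the map is per pixel, dark regions can be lifted strongly while already bright regions are left close to identity, which a single global gamma cannot do.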

Abstract

Real-world low-light images captured by imaging devices suffer from poor visibility and require domain-specific enhancement to produce artifact-free outputs that reveal details. In this paper, we propose an unpaired low-light image enhancement network that leverages novel controlled transformation-based self-supervision and unpaired self-conditioning strategies. The model determines the degree of enhancement required at each input image pixel, learning it from unpaired low-lit and well-lit images without any direct supervision. The self-supervision is based on a controlled transformation of the input image and on maintaining its enhancement in spite of the transformation. The self-conditioning trains the model on unpaired images such that it does not enhance an already-enhanced image or a well-lit input image. The inherent noise in the input low-light images is handled by low gradient magnitude suppression in a detail-preserving manner. In addition, our noise handling is self-conditioned by preventing the denoising of noise-free well-lit images. Training based on attributes specific to low-light image enhancement allows our model to avoid paired supervision without significantly compromising performance. While our proposed self-supervision aids consistent enhancement, our novel self-conditioning facilitates adequate enhancement. Extensive experiments on multiple standard datasets demonstrate that our model generally outperforms the state of the art both quantitatively and subjectively. Ablation studies show the effectiveness of our self-supervision and self-conditioning strategies and of the related loss functions.
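The two training signals described in the abstract can be expressed as simple losses on predicted enhancement maps. The sketch below assumes a pixel-wise power-law model ($\boldsymbol{I}^{\eta}$), under which enhancing the transformed image $\boldsymbol{I}^{\alpha}$ with a map $\eta'$ gives $\boldsymbol{I}^{\alpha\eta'}$, so maintaining the enhancement requires $\alpha\eta' = \eta$. The L1 form of the losses, the function names, and `toy_estimator` (a hand-written stand-in for $\mathcal{F_E}$ that maps every pixel to mid-gray) are all assumptions for illustration, not the paper's definitions of $\mathcal{L_{SS}}$, $\mathcal{L_{SC}}$ and $\mathcal{L_{WSC}}$.

```python
import numpy as np

EPS = 1e-4  # keeps logarithms finite; inputs are assumed to lie in (0, 1)


def toy_estimator(x: np.ndarray, target: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for F_E: chooses eta so that x ** eta == target per pixel."""
    x = np.clip(x, EPS, 1.0 - EPS)
    return np.log(target) / np.log(x)


def fixed_gamma(x: np.ndarray) -> np.ndarray:
    """Hypothetical baseline: the same map (a global gamma of 0.5) for every input."""
    return np.full_like(x, 0.5)


def self_supervision_loss(estimator, image: np.ndarray, alpha: float) -> float:
    """L_SS sketch: enhancing I^alpha must reproduce the enhancement of I.

    Under the power-law model, (I^alpha)^eta' == I^eta holds when alpha * eta' == eta.
    """
    eta = estimator(image)
    eta_transformed = estimator(image ** alpha)  # controlled transformation I^alpha
    return float(np.mean(np.abs(alpha * eta_transformed - eta)))


def self_conditioning_loss(estimator, image: np.ndarray) -> float:
    """L_SC sketch: the map predicted for an already-enhanced image should be U (all ones)."""
    enhanced = image ** estimator(image)
    return float(np.mean(np.abs(estimator(enhanced) - 1.0)))


def well_lit_conditioning_loss(estimator, well_lit: np.ndarray) -> float:
    """L_WSC sketch: a well-lit image should receive the identity map U."""
    return float(np.mean(np.abs(estimator(well_lit) - 1.0)))


rng = np.random.default_rng(0)
low = rng.uniform(0.02, 0.3, size=(8, 8))  # synthetic low-light image
alpha = 0.6                                # controlled transformation parameter

print(self_supervision_loss(toy_estimator, low, alpha))  # close to 0
print(self_supervision_loss(fixed_gamma, low, alpha))    # approximately 0.2
print(self_conditioning_loss(toy_estimator, low))        # close to 0
print(self_conditioning_loss(fixed_gamma, low))          # 0.5
print(well_lit_conditioning_loss(toy_estimator, np.full((4, 4), 0.7)))  # about 0.94
```

Under these assumptions, a fixed global gamma fails both checks: it ignores that $\boldsymbol{I}^{\alpha}$ is already partially enhanced, and it keeps brightening an image it has already enhanced. The toy estimator passes $\mathcal{L_{SS}}$ and $\mathcal{L_{SC}}$ but is penalized by $\mathcal{L_{WSC}}$ on a 0.7-gray image, because it would darken that well-lit input toward mid-gray.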

Paper Structure

This paper contains 46 sections, 25 equations, 5 figures, 11 tables.

Figures (5)

  • Figure 1: The proposed low-light image enhancement approach, and its unpaired training and testing models. $\boldsymbol{I}$ and $\boldsymbol{W}$ respectively denote the unpaired input low-light and well-lit images, while $\boldsymbol{I_D}$ and $\boldsymbol{W_D}$ respectively denote the outputs of noise handling on $\boldsymbol{I}$ and $\boldsymbol{W}$. $\boldsymbol{I}^{\alpha}$ denotes the partially enhanced image after the controlled transformation using $\alpha$, $\boldsymbol{I_D}^{\eta}$ denotes the enhanced image, and $\boldsymbol{I_D}^{\eta_I}$ represents the output noise-suppressed enhanced image. $\eta_x$ denotes the enhancement map estimated from the input $x$, and $U$ denotes the all-ones matrix. $\mathcal{F_E}$ represents the enhancement module trained using the self-supervision loss $\mathcal{L_{SS}}$ and the self-conditioning losses $\mathcal{L_{SC}}$ and $\mathcal{L_{WSC}}$. $\mathcal{F_D}$ represents the noise-handling module trained using the self-conditioning loss $\mathcal{L_{DSC}}$, the low-gradient magnitude suppression loss $\mathcal{L_G}$ and the fidelity loss $\mathcal{L_{F}}$.
  • Figure 2: The detailed architecture of our low-light image enhancement network with the enhancement $\mathcal{F_E}$ and noise-handling $\mathcal{F_D}$ modules.
  • Figure 3: Enhancement of (b) low-light images from different datasets and their corresponding (a) ground truths. $*$ denotes a technique with paired supervision, $+$ denotes a technique with neither paired nor unpaired supervision, and $\S$ denotes a technique with only unpaired supervision. The first four images are real-world low-light images whose ground truths are not available.
  • Figure 4: The effect of the different loss functions related to our noise-handling module. The figure shows the PSNR on test images of the models obtained after each training epoch.
  • Figure 5: Enhancement achieved on (a) a low-light image using SelfEnNet with (b) the enhancement module alone and (c) both the enhancement and noise-handling modules.
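The noise-handling losses listed in the Figure 1 caption can be sketched in the same spirit. The version below assumes one plausible reading of low-gradient magnitude suppression: gradient magnitudes under a threshold `tau` are treated as noise and penalized, while stronger gradients (edges and details) are left alone. The threshold value, the L1 form of $\mathcal{L_F}$ and $\mathcal{L_{DSC}}$, the function names, and `box_blur` (a deliberately over-smoothing denoiser) are assumptions for illustration, not the paper's definitions.

```python
import numpy as np


def gradient_magnitude(img: np.ndarray) -> np.ndarray:
    """Per-pixel gradient magnitude from forward differences (zero at the last row/column)."""
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, :-1] = img[:, 1:] - img[:, :-1]
    gy[:-1, :] = img[1:, :] - img[:-1, :]
    return np.sqrt(gx ** 2 + gy ** 2)


def low_gradient_suppression_loss(denoised: np.ndarray, tau: float = 0.1) -> float:
    """L_G sketch: penalize only weak gradients (assumed noise); gradients >= tau are ignored."""
    mag = gradient_magnitude(denoised)
    return float(np.mean(np.where(mag < tau, mag, 0.0)))


def fidelity_loss(denoised: np.ndarray, original: np.ndarray) -> float:
    """L_F sketch: keep the noise-handled output close to its input."""
    return float(np.mean(np.abs(denoised - original)))


def denoise_self_conditioning_loss(denoiser, well_lit: np.ndarray) -> float:
    """L_DSC sketch: the denoiser should leave a noise-free well-lit image unchanged."""
    return float(np.mean(np.abs(denoiser(well_lit) - well_lit)))


def box_blur(img: np.ndarray) -> np.ndarray:
    """Hypothetical over-smoothing denoiser: 3x3 mean filter with edge padding."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(padded[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


rng = np.random.default_rng(0)
clean = np.full((16, 16), 0.3)  # noise-free stand-in for a well-lit image W
clean[:, 8:] = 0.9              # one strong vertical edge
noisy = np.clip(clean + rng.normal(0.0, 0.02, clean.shape), 0.0, 1.0)

print(low_gradient_suppression_loss(clean))          # 0.0: flat areas have no gradient, the edge exceeds tau
print(low_gradient_suppression_loss(noisy) > 0.0)    # True: small noisy gradients are penalized
print(fidelity_loss(box_blur(noisy), noisy) > 0.0)   # True: blurring moves the output away from its input
print(denoise_self_conditioning_loss(lambda x: x, clean))     # 0.0: identity keeps W intact
print(denoise_self_conditioning_loss(box_blur, clean) > 0.0)  # True: blurring alters the clean edge
```

In this sketch, $\mathcal{L_G}$ on its own is minimized by a constant image, so the fidelity term is what keeps the output tied to its input, and the self-conditioning term penalizes a denoiser that also alters clean inputs.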