
LoLiSRFlow: Joint Single Image Low-light Enhancement and Super-resolution via Cross-scale Transformer-based Conditional Flow

Ziyu Yue, Jiaxin Gao, Sihan Xie, Yang Liu, Zhixun Su

TL;DR

LoLiSRFlow tackles the challenging problem of jointly enhancing illumination and increasing resolution by learning the conditional distribution $P_{\boldsymbol{y}|\boldsymbol{x}}$ with a normalizing-flow framework conditioned on the degraded input $\boldsymbol{x}$. It introduces a multi-resolution parallel transformer-based conditional encoder and a 3-scale invertible flow, augmented by a Retinex-inspired color-ratio prior $CR(I)$ to preserve color consistency across scales and illumination levels. The paper also presents the DFSR-LLE dataset with $7100$ synthetic pairs across 2x and 4x scales and realistic noise models. Experiments on both synthetic and real data demonstrate that LoLiSRFlow achieves superior restoration quality and robustness to noise and artifacts compared with state-of-the-art methods, highlighting its practical value for real-world low-light imaging scenarios.

Abstract

The visibility of real-world images is often limited by both low light and low resolution; however, the literature addresses these issues only separately, through Low-Light Enhancement (LLE) and Super-Resolution (SR) methods. A simple cascade of these approaches cannot cope well with the highly ill-posed problem of simultaneously enhancing visibility and resolution. In this paper, we propose a normalizing flow network, dubbed LoLiSRFlow, specifically designed to account for the degradation mechanism inherent in joint LLE and SR. To handle the one-to-many mapping from low-light low-resolution images to normal-light high-resolution images, LoLiSRFlow directly learns the conditional probability distribution over the variety of feasible high-resolution well-exposed images. Specifically, a multi-resolution parallel transformer acts as a conditional encoder that extracts the Retinex-induced resolution-and-illumination invariant map as the prior, and the invertible network maps the distribution of normally exposed high-resolution images to a latent distribution. The backward inference is equivalent to introducing an additional constrained loss for the normal training route, thus enabling the manifold of naturally exposed high-resolution images to be depicted accurately. We also propose a synthetic dataset modeling realistic low-light low-resolution degradation, named DFSR-LLE, containing 7100 pairs of low-resolution low-light and high-resolution normal-light sharp images. Quantitative and qualitative experimental results demonstrate the effectiveness of our method on both the proposed synthetic dataset and real datasets.
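
The idea of mapping high-resolution images to a latent distribution, conditioned on the degraded input, can be illustrated with a generic conditional affine coupling layer. The following is only a minimal NumPy sketch of the standard conditional normalizing-flow construction, not the architecture of LoLiSRFlow: the class name `ConditionalAffineCoupling`, the tiny random MLP, and the vector `c` (standing in for encoder features of the low-light low-resolution input) are all assumptions made for illustration. It shows the forward pass with its log-determinant, the exact inverse used for sampling, and the negative log-likelihood under a standard Gaussian latent.

```python
import numpy as np

rng = np.random.default_rng(0)


class ConditionalAffineCoupling:
    """Generic conditional affine coupling (hypothetical toy, not the paper's layer).

    The input y is split into (y_a, y_b). y_a passes through unchanged, while
    y_b is scaled and shifted by quantities computed from y_a and the
    condition features c.
    """

    def __init__(self, dim, cond_dim, hidden=16):
        self.half = dim // 2
        in_dim = self.half + cond_dim
        out_dim = dim - self.half
        # Tiny random MLP standing in for a learned conditioning network.
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.w2 = rng.normal(0.0, 0.1, (hidden, 2 * out_dim))

    def _scale_shift(self, y_a, c):
        h = np.tanh(np.concatenate([y_a, c], axis=-1) @ self.w1)
        log_s, t = np.split(h @ self.w2, 2, axis=-1)
        return np.tanh(log_s), t  # bounded log-scale for numerical stability

    def forward(self, y, c):
        """Map y to the latent z and return log|det dz/dy| per sample."""
        y_a, y_b = y[..., :self.half], y[..., self.half:]
        log_s, t = self._scale_shift(y_a, c)
        z_b = y_b * np.exp(log_s) + t
        return np.concatenate([y_a, z_b], axis=-1), log_s.sum(axis=-1)

    def inverse(self, z, c):
        """Map a latent z back to y (the sampling direction)."""
        z_a, z_b = z[..., :self.half], z[..., self.half:]
        log_s, t = self._scale_shift(z_a, c)
        y_b = (z_b - t) * np.exp(-log_s)
        return np.concatenate([z_a, y_b], axis=-1)


def negative_log_likelihood(layer, y, c):
    """-log p(y | c) via change of variables with a standard Gaussian latent."""
    z, logdet = layer.forward(y, c)
    d = z.shape[-1]
    log_pz = -0.5 * (z ** 2).sum(axis=-1) - 0.5 * d * np.log(2.0 * np.pi)
    return -(log_pz + logdet)


layer = ConditionalAffineCoupling(dim=8, cond_dim=4)
y = rng.normal(size=(3, 8))   # stand-in for flattened target image features
c = rng.normal(size=(3, 4))   # stand-in for conditional encoder features
z, logdet = layer.forward(y, c)
y_rec = layer.inverse(z, c)
nll = negative_log_likelihood(layer, y, c)
```

Because the layer is exactly invertible, drawing different latents `z` for the same condition `c` and calling `inverse` yields different plausible outputs, which is how a conditional flow represents a one-to-many mapping.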

Paper Structure (14 sections, 5 equations, 11 figures, 4 tables)

This paper contains 14 sections, 5 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: An illustration of the proposed LoLiSRFlow. Our model consists of a multi-resolution parallel conditional transformer (MPCT) that extracts the cross-scale illumination-invariant reflection map and a normalizing-flow-based invertible network that learns a distribution of high-resolution normally exposed images conditioned on a low-resolution low-light one.
  • Figure 2: Visual results of color ratio map $CR(I)$ on the low-light low-resolution image with three darkness levels (i.e., $-2.5$ EV, $-3.5$ EV and $-4.5$ EV), and normal-light high-resolution image with two scales ($\times2$, $\times4$).
  • Figure 3: Simultaneous LLE and SR results of different algorithms and ours, trained and tested on the RELLISUR dataset ($\times2$). Our method does not over-brighten the image and does not add noise or artifacts. Zoom in for best view.
  • Figure 4: Simultaneous LLE and SR results of different algorithms and ours, trained and tested on the RELLISUR dataset ($\times2$). Our method maintains the texture details of an image in extremely dark environments without amplifying the noise.
  • Figure 5: Simultaneous LLE and SR results of different algorithms and ours, trained and tested on the RELLISUR dataset ($\times4$). Our method does not produce severe artifacts or color deviations and is closer to the ground truth. Zoom in for best view.
  • ...and 6 more figures