LoLiSRFlow: Joint Single Image Low-light Enhancement and Super-resolution via Cross-scale Transformer-based Conditional Flow
Ziyu Yue, Jiaxin Gao, Sihan Xie, Yang Liu, Zhixun Su
TL;DR
LoLiSRFlow tackles the challenging problem of jointly enhancing illumination and increasing resolution by learning the conditional distribution $P_{\boldsymbol{y}|\boldsymbol{x}}$ with a normalizing-flow framework conditioned on the degraded input $\boldsymbol{x}$. It introduces a multi-resolution parallel transformer-based conditional encoder and a 3-scale invertible flow, augmented by a Retinex-inspired color-ratio prior $CR(I)$ to preserve color consistency across scales and illumination levels. The paper also presents the DFSR-LLE dataset with $7100$ synthetic pairs across 2x and 4x scales and realistic noise models. Experiments on both synthetic and real data demonstrate that LoLiSRFlow achieves superior restoration quality and robustness to noise and artifacts compared with state-of-the-art methods, highlighting its practical value for real-world low-light imaging scenarios.
Abstract
The visibility of real-world images is often limited by both low light and low resolution; however, these issues are addressed in the literature only separately, through Low-Light Enhancement (LLE) and Super-Resolution (SR) methods. A simple cascade of these approaches does not work harmoniously on the highly ill-posed problem of simultaneously enhancing visibility and resolution. In this paper, we propose a normalizing flow network, dubbed LoLiSRFlow, specifically designed to account for the degradation mechanism inherent in joint LLE and SR. To handle the one-to-many mapping from low-light low-resolution images to normal-light high-resolution images, LoLiSRFlow directly learns the conditional probability distribution over the variety of feasible high-resolution, well-exposed solutions. Specifically, a multi-resolution parallel transformer acts as a conditional encoder that extracts the Retinex-induced resolution-and-illumination invariant map as the prior, and the invertible network maps the distribution of normally exposed high-resolution images to a latent distribution. Backward inference is equivalent to introducing an additional constraint loss into the normal training route, which allows the manifold of naturally exposed high-resolution images to be depicted accurately. We also propose a synthetic dataset that models realistic low-light low-resolution degradation, named DFSR-LLE, containing 7100 pairs of low-resolution low-light images and high-resolution normal-light sharp images. Quantitative and qualitative experimental results demonstrate the effectiveness of our method on both the proposed synthetic dataset and real datasets.
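To make the idea of learning a conditional distribution with an invertible network concrete, the following is a minimal sketch of a single conditional affine flow step, not the LoLiSRFlow architecture. It relies on the standard change-of-variables identity for conditional flows, $\log p(\boldsymbol{y}|\boldsymbol{x}) = \log p_{\boldsymbol{z}}(f(\boldsymbol{y};\boldsymbol{x})) + \log|\det \partial f / \partial \boldsymbol{y}|$. The names `cond_params`, `forward`, `inverse`, and `nll` are hypothetical, the "encoder" is a hand-written stand-in for the transformer, and $\boldsymbol{x}$ and $\boldsymbol{y}$ are given the same length for simplicity, which ignores the change in resolution.

```python
import math

def cond_params(x):
    """Hypothetical stand-in for the conditional encoder (an assumption,
    not the paper's transformer): maps the degraded input x to a
    per-element log-scale s and shift t."""
    mean_x = sum(x) / len(x)
    s = [0.1 * mean_x for _ in x]   # log-scale, shared across elements
    t = [2.0 * xi for xi in x]      # shift, a crude "brightened" guess
    return s, t

def forward(y, x):
    """Map the target y to a latent z given x; also return log|det dz/dy|."""
    s, t = cond_params(x)
    z = [(yi - ti) * math.exp(-si) for yi, si, ti in zip(y, s, t)]
    log_det = -sum(s)               # Jacobian is diagonal with entries exp(-s)
    return z, log_det

def inverse(z, x):
    """Map a latent z back to image space given x (backward inference)."""
    s, t = cond_params(x)
    return [zi * math.exp(si) + ti for zi, si, ti in zip(z, s, t)]

def nll(y, x):
    """Negative log-likelihood of y given x under a standard normal latent."""
    z, log_det = forward(y, x)
    log_pz = sum(-0.5 * zi * zi - 0.5 * math.log(2 * math.pi) for zi in z)
    return -(log_pz + log_det)

x = [0.1, 0.2, 0.3]                 # toy low-light input
y = [0.5, 0.4, 0.9]                 # toy well-exposed target
z, _ = forward(y, x)
y_rec = inverse(z, x)               # recovers y up to floating-point error
```

In this sketch, training would minimize `nll(y, x)` over paired data, and passing different latent samples to `inverse` with the same `x` yields different outputs, which is how a flow represents a one-to-many mapping.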
