Illuminating Darkness: Learning to Enhance Low-light Images In-the-Wild
S M A Sharif, Abdur Rehman, Zain Ul Abidin, Fayaz Ali Dharejo, Radu Timofte, Rizwan Ali Naqvi
TL;DR
This work tackles real-world single-shot low-light image enhancement (SLLIE) on two fronts. To address dataset scarcity, it introduces the Low-Light Smartphone Dataset (LSD), a large-scale collection of 6,425 aligned 4K+ image pairs, together with an unpaired LSD-U benchmark for testing generalization and a large-scale pipeline for patch-based training refinement. On the modeling side, it proposes TFFormer, a Transformer-CNN architecture that decouples luminance and chrominance (LC) and uses its LCCAB, LCGRB, and LCCG components to produce perceptually faithful, structurally consistent enhancements while remaining efficient. Across quantitative benchmarks and perceptual studies, TFFormer achieves state-of-the-art results on LSD (+2.45 dB PSNR) and improves downstream vision tasks such as object detection on ExDark (+6.80 mAP), generalizing across devices and real-world conditions. The work contributes both a dataset resource and a principled LC-aware model for practical SLLIE.
Abstract
Single-shot low-light image enhancement (SLLIE) remains challenging due to the limited availability of diverse, real-world paired datasets. To bridge this gap, we introduce the Low-Light Smartphone Dataset (LSD), a large-scale, high-resolution (4K+) dataset collected in the wild across a wide range of challenging lighting conditions (0.1 to 200 lux). LSD contains 6,425 precisely aligned low-light and normal-light image pairs, selected from over 8,000 dynamic indoor and outdoor scenes through multi-frame acquisition and expert evaluation. To evaluate generalization and aesthetic quality, we collect 2,117 unpaired low-light images from previously unseen devices. To fully exploit LSD, we propose TFFormer, a hybrid model that encodes luminance and chrominance (LC) separately to reduce color-structure entanglement. We further propose a cross-attention-driven joint decoder for context-aware fusion of LC representations, along with LC refinement and LC-guided supervision to significantly enhance perceptual fidelity and structural consistency. TFFormer achieves state-of-the-art results on LSD (+2.45 dB PSNR) and substantially improves downstream vision tasks, such as low-light object detection (+6.80 mAP on ExDark).
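To make two ideas in the abstract concrete, the following minimal sketch shows one common way to separate a pixel into luminance and chrominance, and how a PSNR gain in dB relates to mean squared error. It is not the TFFormer pipeline: the abstract does not state which color space the model uses, so the BT.601 YCbCr transform here is an assumption, and all function names (`rgb_to_ycbcr`, `ycbcr_to_rgb`, `psnr`, `mse_ratio_for_psnr_gain`) are hypothetical.

```python
import math

# BT.601 luma weights. Assumption for illustration only: the abstract does not
# say which luminance-chrominance color space TFFormer operates in.
KR, KG, KB = 0.299, 0.587, 0.114


def rgb_to_ycbcr(r, g, b):
    """Map one RGB pixel in [0, 1] to (Y, Cb, Cr), with chroma centered at 0.5."""
    y = KR * r + KG * g + KB * b              # luminance: weighted brightness
    cb = (b - y) / (2.0 * (1.0 - KB)) + 0.5   # blue-difference chroma
    cr = (r - y) / (2.0 * (1.0 - KR)) + 0.5   # red-difference chroma
    return y, cb, cr


def ycbcr_to_rgb(y, cb, cr):
    """Invert rgb_to_ycbcr, recovering the RGB pixel."""
    r = y + 2.0 * (1.0 - KR) * (cr - 0.5)
    b = y + 2.0 * (1.0 - KB) * (cb - 0.5)
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def psnr(reference, estimate, max_value=1.0):
    """PSNR in dB between two equal-length flat sequences of pixel values."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_psnr_gain(gain_db):
    """Ratio new_MSE / old_MSE implied by a PSNR gain of gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)
```

Because PSNR is logarithmic in the mean squared error, `mse_ratio_for_psnr_gain(2.45)` evaluates to roughly 0.57, so the reported +2.45 dB gain corresponds to a reduction in mean squared error of about 43% relative to the compared baseline, assuming both PSNR values are measured against the same reference images.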
