Illuminating Darkness: Learning to Enhance Low-light Images In-the-Wild

S M A Sharif, Abdur Rehman, Zain Ul Abidin, Fayaz Ali Dharejo, Radu Timofte, Rizwan Ali Naqvi

TL;DR

This work tackles real-world single-shot low-light image enhancement (SLLIE) on two fronts: it addresses dataset scarcity with the Low-Light Smartphone Dataset (LSD), and it proposes TFFormer, a hybrid Transformer-CNN architecture that decouples luminance and chrominance (LC). TFFormer combines an LC Cross-Attention Block (LCCAB), an LC Guided Refinement Block (LCGRB), and LCCG to produce perceptually faithful, structurally consistent enhancements while remaining efficient. LSD includes 6,425 aligned 4K+ image pairs and an unpaired LSD-U benchmark for testing generalization, along with a large-scale pipeline for patch-based refinement of the training data. Across quantitative benchmarks and perceptual studies, TFFormer achieves state-of-the-art results on LSD (+2.45 dB PSNR) and improves downstream vision tasks such as object detection on ExDark (+6.80 mAP), indicating robust generalization across devices and real-world conditions. The work contributes both a dataset resource and an LC-aware model for practical SLLIE.

Abstract

Single-shot low-light image enhancement (SLLIE) remains challenging due to the limited availability of diverse, real-world paired datasets. To bridge this gap, we introduce the Low-Light Smartphone Dataset (LSD), a large-scale, high-resolution (4K+) dataset collected in the wild across a wide range of challenging lighting conditions (0.1 to 200 lux). LSD contains 6,425 precisely aligned low- and normal-light image pairs, selected from over 8,000 dynamic indoor and outdoor scenes through multi-frame acquisition and expert evaluation. To evaluate generalization and aesthetic quality, we collect 2,117 unpaired low-light images from previously unseen devices. To fully exploit LSD, we propose TFFormer, a hybrid model that encodes luminance and chrominance (LC) separately to reduce color-structure entanglement. We further propose a cross-attention-driven joint decoder for context-aware fusion of LC representations, along with LC refinement and LC-guided supervision to significantly enhance perceptual fidelity and structural consistency. TFFormer achieves state-of-the-art results on LSD (+2.45 dB PSNR) and substantially improves downstream vision tasks, such as low-light object detection (+6.80 mAP on ExDark).
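
To make the idea of encoding luminance and chrominance separately more concrete, the following is a minimal sketch of splitting an RGB image into a luminance plane and a chrominance plane. It assumes a fixed full-range BT.601 YCbCr transform; the abstract does not specify how TFFormer derives its LC features, so the transform and the function names `rgb_to_ycbcr` and `split_luma_chroma` are illustrative assumptions rather than the authors' method.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel (floats in [0, 1]) to full-range BT.601 YCbCr.

    Assumption: a fixed color transform stands in for the learned
    LC mapping, whose details are not given in the abstract.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (structure, brightness)
    cb = 0.5 + (b - y) / 1.772              # blue-difference chrominance
    cr = 0.5 + (r - y) / 1.402              # red-difference chrominance
    return y, cb, cr


def split_luma_chroma(image):
    """Split an image into separate luminance and chrominance planes.

    image: list of rows, each row a list of (r, g, b) tuples in [0, 1].
    Returns (luma, chroma), where luma[i][j] is a float and
    chroma[i][j] is a (cb, cr) tuple.
    """
    luma, chroma = [], []
    for row in image:
        converted = [rgb_to_ycbcr(*pixel) for pixel in row]
        luma.append([y for y, _, _ in converted])
        chroma.append([(cb, cr) for _, cb, cr in converted])
    return luma, chroma


# Hypothetical 1x2 image: a mid-gray pixel and a pure red pixel.
luma, chroma = split_luma_chroma([[(0.5, 0.5, 0.5), (1.0, 0.0, 0.0)]])
```

In this sketch a gray pixel carries all of its information in the luminance plane (its chrominance sits at the neutral value 0.5), which illustrates why processing the two planes in separate branches can keep brightness and structure from being entangled with color.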

Paper Structure

This paper contains 47 sections, 14 equations, 28 figures, 13 tables.

Figures (28)

  • Figure 1: Qualitative comparison of real-world low-light enhancement using our proposed TFFormer trained on different datasets. From left to right: input image, TFFormer trained on LOL-V1 [wei2018deep], LOL-V2 [yang2021sparse], LSRW [hai2023r2rnet], NCLLIE [liu2024ntire], and the proposed LSD dataset. While existing datasets lead to artifacts or noise amplification, LSD enables robust enhancement with preserved structure and color fidelity, demonstrating its superior diversity and realism.
  • Figure 2: Noise and intensity distribution comparison between LSD and prior SLLIE datasets. (a) light intensity variation. (b) noise variation.
  • Figure 3: Overview of the proposed LSD dataset: (a) collection and refinement pipeline, (b) illumination distribution across indoor and outdoor scenes, (c) DLI and NLI sample pairs.
  • Figure 4: Distribution comparison between filtered and raw training images. (a) sharpness improvement. (b) intensity filtering.
  • Figure 5: Overview of the proposed TFFormer architecture. (a), (b) LC mapping modules extract and boost luminance and chrominance features. (c), (d) Dedicated encoders process the respective LC features. (e) LC Cross-Attention Block (LCCAB) fuses luminance and chrominance representations. (f) A shared decoder reconstructs an intermediate output. (g) The LC Guided Refinement Block (LCGRB) further enhances the image using LC-aware attention.
  • ...and 23 more figures
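
The caption of Figure 5 describes the LC Cross-Attention Block (LCCAB) as fusing luminance and chrominance representations. As a rough illustration of cross-attention in general, the sketch below lets luminance tokens act as queries over chrominance tokens used as keys and values. This is plain scaled dot-product attention without learned projections or multiple heads; the roles assigned to each branch and the function names are assumptions for illustration, not the block's actual design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention on plain Python lists.

    queries: tokens from one branch (assumed here: luminance), shape [Nq][d]
    keys:    tokens from the other branch (assumed: chrominance), shape [Nk][d]
    values:  tokens paired with the keys, shape [Nk][dv]
    Returns one fused token per query, shape [Nq][dv].
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value tokens.
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Hypothetical toy tokens: two luminance queries, two chrominance key/value pairs.
luma_tokens = [[0.0, 0.0], [100.0, 0.0]]
chroma_keys = [[1.0, 0.0], [0.0, 1.0]]
chroma_values = [[2.0, 4.0], [4.0, 8.0]]
fused_tokens = cross_attention(luma_tokens, chroma_keys, chroma_values)
```

The first query is orthogonal to both keys, so it averages the two value tokens equally; the second aligns strongly with the first key and therefore returns almost exactly the first value token. A trained block would typically add learned query, key, and value projections on top of this mechanism.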