Towards Scale-Aware Low-Light Enhancement via Structure-Guided Transformer Design

Wei Dong, Yan Min, Han Zhou, Jun Chen

TL;DR

This work addresses the ill-posed nature of low-light image enhancement (LLIE) by introducing SG-LLIE, a scale-aware CNN-Transformer framework guided by robust structure priors derived from illumination-invariant edges. The model integrates a Hybrid Structure-Guided Feature Extractor with Structure-Guided Transformer Blocks and cross attention, complemented by a Scale-Adaptive Module for multi-scale fusion, and employs a multi-term loss with an adaptive illumination layer. Key contributions include principled extraction of structure priors via CIConv, a novel SGTB and SGCA for structure-guided restoration, and extensive evaluation showing state-of-the-art PSNR and competitive SSIM on the NTIRE 2025 LLIE Challenge and LLIE benchmarks. The approach yields robust, detail-preserving enhancement on high-resolution images, with practical impact for real-world low-light imaging tasks and potential applicability to other ill-posed restoration problems.
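To make the idea of structure-guided cross attention more concrete, here is a minimal sketch in plain Python. It is not the authors' SGCA: the summary does not say which stream supplies queries, keys, or values, so the direction used here (queries from structure tokens, keys and values from image tokens) is an assumption, and the learned projections, multiple heads, and normalization of a real Transformer block are omitted. The function name `structure_guided_cross_attention` and the toy token values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def structure_guided_cross_attention(structure_tokens, image_tokens):
    """Scaled dot-product cross attention.

    Assumption: structure tokens act as queries, image tokens act as
    both keys and values. No learned projections are applied.
    """
    dim = len(image_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in structure_tokens:
        # Attention weights of this structure token over all image tokens.
        weights = softmax([dot(query, key) * scale for key in image_tokens])
        # Weighted sum of image tokens (the values).
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, image_tokens))
            for j in range(dim)
        ])
    return outputs

# Toy data: 2 structure tokens attend over 3 image tokens of dimension 2.
structure_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
attended = structure_guided_cross_attention(structure_tokens, image_tokens)
```

Each output row is a convex combination of the image tokens, weighted by how strongly they match the corresponding structure token. In that sense the structure stream decides which image features are emphasized.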

Abstract

Current Low-light Image Enhancement (LLIE) techniques predominantly rely on either direct Low-Light (LL) to Normal-Light (NL) mappings or guidance from semantic features or illumination maps. Nonetheless, the intrinsic ill-posedness of LLIE and the difficulty of retrieving robust semantics from heavily corrupted images hinder their effectiveness in extremely low-light environments. To tackle this challenge, we present SG-LLIE, a new multi-scale CNN-Transformer hybrid framework guided by structure priors. Rather than employing pre-trained models to extract semantics or illumination maps, we extract robust structure priors using illumination-invariant edge detectors. Moreover, we develop a CNN-Transformer Hybrid Structure-Guided Feature Extractor (HSGFE) module at each scale within the UNet encoder-decoder architecture. Besides the CNN blocks, which excel at multi-scale feature extraction and fusion, we introduce a Structure-Guided Transformer Block (SGTB) in each HSGFE that incorporates structure priors to modulate the enhancement process. Extensive experiments show that our method achieves state-of-the-art performance on several LLIE benchmarks in both quantitative metrics and visual quality. Our solution ranks second in the NTIRE 2025 Low-Light Enhancement Challenge. Code is released at https://github.com/minyan8/imagine.
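The following short sketch illustrates why an edge detector can be insensitive to illumination. It is not the color-invariant edge detector used in the paper. It is a simpler stand-in that assumes illumination acts as a global multiplicative factor on pixel intensity. Under that assumption, differences of log-intensities are unchanged when the scene is darkened, whereas plain intensity differences shrink with the illumination. The function names and pixel values are hypothetical.

```python
import math

def plain_gradient_edges(row):
    """Absolute differences between neighbouring pixel intensities."""
    return [abs(b - a) for a, b in zip(row, row[1:])]

def log_gradient_edges(row):
    """Absolute differences between neighbouring log-intensities.

    log(k * b) - log(k * a) == log(b) - log(a), so a global
    illumination factor k cancels out. Requires positive intensities.
    """
    logs = [math.log(v) for v in row]
    return [abs(b - a) for a, b in zip(logs, logs[1:])]

# One image row under normal light, and the same row at 5% illumination.
bright_row = [0.80, 0.82, 0.20, 0.21, 0.60]
dark_row = [0.05 * v for v in bright_row]

plain_bright = plain_gradient_edges(bright_row)
plain_dark = plain_gradient_edges(dark_row)   # edge strength collapses
log_bright = log_gradient_edges(bright_row)
log_dark = log_gradient_edges(dark_row)       # matches log_bright up to rounding
```

Real low-light images also contain noise and spatially varying illumination, so this global-scaling model is only a first approximation of what a robust structure prior has to handle.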

Paper Structure

This paper contains 30 sections, 10 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Our enhanced results on the NTIRE 2025 Low-Light Enhancement Challenge. Our method secures the best PSNR, achieves the second-best overall performance, and effectively enhances low-light inputs without over-exposure artifacts.
  • Figure 2: The enhancement results of Retinexformer cai2023retinexformer (pre-trained on LOL-v2-real dataset lol-v2) for real-world data.
  • Figure 3: The overall framework of SG-LLIE. We develop our method based on ESDNet yu2022towards and adopt a similar UNet architecture. At each level of the encoder and decoder, our customized Hybrid Structure-Guided Feature Extractor (HSGFE) module is employed. Within each HSGFE, besides the Dilated Residual Dense Block (DRDB) and Semantic-Aligned Scale-Aware Module (SAM) proposed in ESDNet yu2022towards, we first extract structure priors based on color-invariant edge detectors lengyel2021zero and then develop the Structure-Guided Transformer Block (SGTB) to integrate these priors as guidance. With the integration of structure priors and our CNN-Transformer hybrid network, our method effectively improves visibility and contrast with good noise suppression for diverse low-light images.
  • Figure 4: Example visualization of the extracted structure priors.
  • Figure 5: Qualitative comparisons on the NTIRE 2025 LLIE Challenge dataset. We compare our model with SNR xu2022snr, UHDM (SYSU) liu2024ntire, and Retinexformer cai2023retinexformer. Our model consistently performs best or comparably well. Please zoom in for a better view.
  • ...and 2 more figures