ECAFormer: Low-light Image Enhancement using Cross Attention
Yudi Ruan, Hao Ma, Weikai Li, Xiao Wang
TL;DR
ECAFormer tackles low-light image enhancement (LLIE) by embedding dual streams of visual and semantic features into a U-shaped transformer framework. The key innovations are the Dual Multi-head Self-Attention (DMSA) module, which enables cross-feature interaction, and the Cross-Scale DMSA (CSDMSA) block, which fuses residual and current-layer information across scales. Combined with a Visual-Semantic Convolution Module and a loss that pairs perceptual and Charbonnier terms, the approach preserves fine details while improving global illumination, achieving competitive results on multiple benchmarks and on a new Traffic-297 dataset. This cross-attention-based architecture highlights the importance of inter-component and cross-layer information exchange for robust LLIE in real-world nighttime scenes.
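As a rough illustration of the Charbonnier term mentioned above, the following minimal sketch computes the standard Charbonnier penalty, sqrt(d^2 + eps^2), averaged over paired pixel values. The function name, the flat-list input format, and the default eps = 1e-3 are assumptions made for illustration; the summary does not state the paper's exact settings or how this term is weighted against the perceptual loss.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt(d^2 + eps^2) over paired pixel values.

    pred, target: flat sequences of floats of equal, non-zero length.
    eps: smoothing constant (assumed value, not taken from the paper).
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equally long")
    total = sum(math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target))
    return total / len(pred)
```

The penalty behaves like an L1 loss for large errors but stays differentiable at zero, which is why it is often preferred over plain L1 in image restoration.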
Abstract
Low-light image enhancement (LLIE) is critical in computer vision. Existing LLIE methods often fail to capture the underlying relationships between different sub-components, so complementary information shared across modules and network layers is lost, which ultimately degrades image details. To address this shortcoming, we design a hierarchical mutual Enhancement via a Cross Attention transformer (ECAFormer), an architecture that enables concurrent propagation and interaction of multiple features. The model preserves detailed information through a Dual Multi-head Self-Attention (DMSA) module, which leverages visual and semantic features across different scales, allowing them to guide and complement each other. In addition, a Cross-Scale DMSA block is introduced to exploit residual connections, integrating cross-layer information to further enhance image detail. Experimental results show that ECAFormer reaches competitive performance across multiple benchmarks, yielding nearly a 3% improvement in PSNR over the second-best method and demonstrating the effectiveness of information interaction in LLIE.
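The idea of two feature streams guiding each other can be sketched with ordinary scaled dot-product cross attention. The example below is a simplified stand-in, not the paper's DMSA: it assumes a single head, no learned projections, and a plain residual addition, and all function names are hypothetical. Each stream is a list of token vectors; the visual tokens query the semantic tokens and vice versa.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query row mixes the value rows."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def dual_cross_attention(visual, semantic):
    """Each stream attends to the other; a residual add keeps its own content.

    Assumption: both streams share the same feature dimension.
    """
    vis_ctx = attend(visual, semantic, semantic)   # visual queries semantic
    sem_ctx = attend(semantic, visual, visual)     # semantic queries visual
    vis_out = [[x + c for x, c in zip(row, ctx)]
               for row, ctx in zip(visual, vis_ctx)]
    sem_out = [[x + c for x, c in zip(row, ctx)]
               for row, ctx in zip(semantic, sem_ctx)]
    return vis_out, sem_out
```

A call such as `dual_cross_attention(visual_tokens, semantic_tokens)` returns two updated streams with the same shapes as the inputs, so the block could be stacked or applied at several scales. A real implementation would add learned query, key, and value projections and multiple heads.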
