Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement
Eashan Adhikarla, Kai Zhang, Rosaura G. VidalMata, Manjushree Aithal, Nikhil Ambha Madhusudhana, John Nicholson, Lichao Sun, Brian D. Davison
TL;DR
The paper tackles mixed-exposure image enhancement by introducing Unified-EGformer, a lightweight exposure-guided transformer that jointly handles underexposed, overexposed, and mixed-exposure regions. It combines a Guided Map Generation module with local (LEB) and global (GEB) refinement blocks and an Exposure-Aware Fusion (EAF) block to produce balanced enhancements while remaining feasible for edge devices (≈0.1M parameters, ≈1134 MB peak memory, ≈95 ms inference). A novel MUL-ADD loss and a physics-informed fine-tuning loss underpin a two-stage training regime across diverse datasets (LOL, ME-v2, SICE, MIT-FiveK), yielding strong PSNR/SSIM gains with significantly smaller models than many baselines. The approach demonstrates strong generalization, real-time applicability, and robustness to real-world mixed-exposure scenes, with potential extensions toward color independence and lightweight state-space model integration.
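The data flow of guide map, local and global refinement, and exposure-aware fusion can be illustrated with a toy sketch on a tiny grayscale image. This is not the authors' implementation. The threshold-based guide map is a hypothetical heuristic standing in for the learned Guided Map Generation module. The per-pixel `x * mul + add` form of the local step is an assumption suggested by the MUL-ADD name. A single gamma curve stands in for the global block. All function names are illustrative only.

```python
from typing import List

# A grayscale image as rows of luminance values in [0, 1].
Image = List[List[float]]


def guide_map(img: Image, low: float = 0.2, high: float = 0.8) -> Image:
    """Hypothetical threshold heuristic standing in for the learned guide map:
    1.0 marks an under- or overexposed pixel, 0.0 a well-exposed one."""
    return [[1.0 if v < low or v > high else 0.0 for v in row] for row in img]


def local_mul_add(img: Image, mul: Image, add: Image) -> Image:
    """Pixel-wise affine correction x * mul + add, clipped to [0, 1].
    The multiply/add form is an assumption, not the paper's exact block."""
    return [
        [min(1.0, max(0.0, v * m + a)) for v, m, a in zip(row, m_row, a_row)]
        for row, m_row, a_row in zip(img, mul, add)
    ]


def global_gamma(img: Image, gamma: float) -> Image:
    """Image-wide adjustment: one gamma curve applied to every pixel
    (a hypothetical stand-in for the global refinement block)."""
    return [[v ** gamma for v in row] for row in img]


def exposure_aware_fusion(original: Image, enhanced: Image, guide: Image) -> Image:
    """Per-pixel blend: keep the input where guide is 0,
    take the enhanced value where guide is 1, interpolate in between."""
    return [
        [g * e + (1.0 - g) * o for o, e, g in zip(o_row, e_row, g_row)]
        for o_row, e_row, g_row in zip(original, enhanced, guide)
    ]


def enhance(img: Image, mul: Image, add: Image, gamma: float = 1.0) -> Image:
    """Toy pipeline: guide map -> local mul/add -> global gamma -> fusion."""
    guide = guide_map(img)
    enhanced = global_gamma(local_mul_add(img, mul, add), gamma)
    return exposure_aware_fusion(img, enhanced, guide)


img = [[0.1, 0.5], [0.9, 0.3]]   # one dark, one bright, two mid-tone pixels
mul = [[3.0, 1.2], [0.7, 1.2]]   # brighten the dark pixel, darken the bright one
add = [[0.0, 0.0], [0.0, 0.0]]
print(enhance(img, mul, add))    # dark/bright pixels corrected, mid-tones kept
```

The fusion step is what makes the sketch "exposure-aware": pixels the guide map considers well exposed pass through unchanged even though the local step would have altered them. In the actual model the guide map is learned and soft-valued rather than a hard 0/1 threshold.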
Abstract
Despite recent strides made by AI in image processing, mixed exposure, an issue pivotal in many real-world scenarios such as surveillance and photography, remains inadequately addressed. Traditional image enhancement techniques and current transformer models are limited by their primary focus on either overexposure or underexposure. To bridge this gap, we introduce the Unified-Exposure Guided Transformer (Unified-EGformer, or U-EGformer). Our solution builds on advanced transformer architectures and is equipped with local pixel-level refinement and global refinement blocks for color correction and image-wide adjustments. A guided attention mechanism precisely identifies exposure-compromised regions, making the model adaptable across various real-world conditions. With a lightweight design featuring a peak memory footprint of only ~1134 MB, 0.1 million parameters, and an inference time of 95 ms (9.61x faster than the average), U-EGformer is a viable choice for real-time applications such as surveillance and autonomous navigation. Additionally, our model is highly generalizable, requiring minimal fine-tuning to handle multiple tasks and datasets with a single architecture.
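As a rough reading of the speed claim, assume "the average" means the mean inference time of the compared methods, measured under the same setup, and that images are processed one at a time. Under those assumptions, the reported figures imply:

$$
t_{\text{avg}} \approx 9.61 \times 95\ \text{ms} \approx 913\ \text{ms},
\qquad
\text{throughput} \approx \frac{1000\ \text{ms/s}}{95\ \text{ms/frame}} \approx 10.5\ \text{frames/s}.
$$

That is, the 95 ms figure corresponds to roughly 10 frames per second for single-image, sequential processing.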
