Unified-EGformer: Exposure Guided Lightweight Transformer for Mixed-Exposure Image Enhancement

Eashan Adhikarla, Kai Zhang, Rosaura G. VidalMata, Manjushree Aithal, Nikhil Ambha Madhusudhana, John Nicholson, Lichao Sun, Brian D. Davison

TL;DR

The paper tackles mixed-exposure image enhancement by introducing Unified-EGformer, a lightweight exposure-guided transformer that jointly handles underexposed, overexposed, and mixed-exposure regions. It combines a Guided Map Generation module with local (LEB) and global (GEB) refinement blocks and an Exposure-Aware Fusion (EAF) block to produce balanced enhancements while remaining feasible on edge devices (≈0.1M parameters, ≈1134 MB peak memory, ≈95 ms inference). A novel MUL-ADD loss and a physics-informed fine-tuning loss underpin a two-stage training regime across diverse datasets (LOL, ME-v2, SICE, MIT-FiveK), yielding strong PSNR/SSIM gains with significantly smaller models than many baselines. The approach demonstrates strong generalization, real-time applicability, and robustness to real-world mixed-exposure scenes, with potential extensions to color independence and lightweight state-space integrations.

Abstract

Despite recent strides made by AI in image processing, the issue of mixed exposure, pivotal in many real-world scenarios like surveillance and photography, remains inadequately addressed. Traditional image enhancement techniques and current transformer models are limited by their primary focus on either overexposure or underexposure. To bridge this gap, we introduce the Unified-Exposure Guided Transformer (Unified-EGformer). Our proposed solution is built upon advanced transformer architectures, equipped with local pixel-level refinement and global refinement blocks for color correction and image-wide adjustments. We employ a guided attention mechanism to precisely identify exposure-compromised regions, ensuring adaptability across various real-world conditions. U-EGformer, with a lightweight design featuring a memory footprint (peak memory) of only $\sim$1134 MB (0.1 million parameters) and an inference time of 95 ms (9.61x faster than the average), is a viable choice for real-time applications such as surveillance and autonomous navigation. Additionally, our model is highly generalizable, requiring minimal fine-tuning to handle multiple tasks and datasets with a single architecture.

Paper Structure (19 sections, 8 equations, 12 figures, 3 tables, 1 algorithm)

Figures (12)

  • Figure 1: Sub-figures [a,b,c] show the handcrafted mixed-exposure dataset by Zheng et al. [zheng2022low]; images [d-i] from Cai et al. [Cai2018deep] illustrate real-world scenarios of underexposure, overexposure, and mixed exposure. Images [j,k] show the problem in practice.
  • Figure 2: U-EGformer's training, fine-tuning and inference pipelines. All four modules are showcased: Guided Attention Map, Local Block, Global Block, and Exposure Aware Fusion block.
  • Figure 3: Visualization of the Otsu thresholding challenge: (a) original image, (b) mask for single exposure (underexposed), and (c) mask for bi-exposure (under- and overexposed).
  • Figure 4: The top row, from left to right: an underexposed image, an overexposed image, and the ground truth. The bottom row shows the pixel-thresholding binary mask for the underexposed image (white marks underexposed regions), the pixel-thresholding binary mask for the overexposed image (white marks overexposed regions), and the Otsu-thresholding mask for mixed exposures (yellow marks underexposed regions, white marks overexposed areas, and black marks correctly exposed portions).
  • Figure 5: Chain of efficient transformer blocks equipped with A-MSA and DGFN.
  • ...and 7 more figures
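
The captions for Figures 3 and 4 describe a three-way exposure mask (underexposed, overexposed, correctly exposed) obtained with Otsu thresholding. The excerpt here does not spell out the exact procedure, so the following is only a minimal sketch under stated assumptions: it uses the standard two-threshold (multi-level) Otsu criterion on 8-bit intensities, and it assumes the darkest class maps to "under" and the brightest to "over". The names `two_level_otsu` and `exposure_labels` are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Tuple


def two_level_otsu(pixels: Sequence[int], levels: int = 256) -> Tuple[int, int]:
    """Return thresholds (t1, t2) that maximize between-class variance
    for three classes: v <= t1, t1 < v <= t2, and v > t2.
    Returns (0, 0) if the data cannot be split into three non-empty classes."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Prefix sums: cnt[k] = number of pixels with value < k,
    # mass[k] = sum of pixel values that are < k.
    cnt = [0] * (levels + 1)
    mass = [0] * (levels + 1)
    for v in range(levels):
        cnt[v + 1] = cnt[v] + hist[v]
        mass[v + 1] = mass[v] + v * hist[v]

    def score(lo: int, hi: int) -> float:
        # Contribution of the class with values in [lo, hi): (sum^2 / count).
        # Maximizing the total of these terms is equivalent to maximizing
        # the between-class variance. Empty classes return -1.0 as a sentinel.
        n = cnt[hi] - cnt[lo]
        if n == 0:
            return -1.0
        s = mass[hi] - mass[lo]
        return s * s / n

    best_score, best_t1, best_t2 = -1.0, 0, 0
    for t1 in range(levels - 2):
        a = score(0, t1 + 1)
        if a < 0:
            continue
        for t2 in range(t1 + 1, levels - 1):
            b = score(t1 + 1, t2 + 1)
            c = score(t2 + 1, levels)
            if b < 0 or c < 0:
                continue
            total = a + b + c
            if total > best_score:
                best_score, best_t1, best_t2 = total, t1, t2
    return best_t1, best_t2


def exposure_labels(pixels: Sequence[int], t1: int, t2: int) -> List[str]:
    """Label each pixel (assumed mapping): darkest class 'under',
    brightest class 'over', middle class 'ok'."""
    return ["under" if p <= t1 else "over" if p > t2 else "ok" for p in pixels]


# Usage on a toy intensity list with dark, mid-tone, and bright clusters.
toy = [10] * 50 + [128] * 50 + [245] * 50
t_low, t_high = two_level_otsu(toy)
toy_labels = exposure_labels(toy, t_low, t_high)
```

In the coloring used by the Figure 4 caption, "under" would correspond to yellow, "over" to white, and "ok" to black. The exhaustive search over threshold pairs is quadratic in the number of intensity levels, which is acceptable for 256 levels but is chosen here for clarity rather than speed.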