Haze-Aware Attention Network for Single-Image Dehazing

Lihan Tong, Yun Liu, Weijia Li, Liyuan Chen, Erkang Chen

TL;DR

Single-image dehazing is inherently ill-posed: prior methods struggle with real-world variability, and attention-based approaches can be inefficient. The paper introduces HAA-Net, which combines a Haze-Aware Attention Module (HAAM) with a Multiscale Frequency Enhancement Module (MFEM) inside a lightweight U-Net. This design grounds feature extraction in the atmospheric scattering model and enhances high-frequency details without heavy transforms. HAAM estimates atmospheric light $A$ from global context and transmission $T$ from local features, then forms dehazed features as $J = (X - A(1 - T)) \cdot T'$, where $X$ is the input feature and $T'$ approximates $1/T$. MFEM decomposes features into multiscale frequency bands and applies learnable channel weights to emphasize the important components. Across RESIDE-Indoor, RESIDE-Outdoor, Haze4K and real-world datasets, HAA-Net achieves state-of-the-art PSNR/SSIM with notable parameter efficiency, e.g., PSNR 41.21 dB and SSIM 0.996 on RESIDE-Indoor and PSNR 33.93 dB on Haze4K, though its 18.7M parameters and 122.48 GMacs may limit deployment. These results show the practical value of integrating physical priors with multiscale frequency cues for dehazing and offer a blueprint for physics-informed attention in vision tasks.
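
To make the HAAM formulation concrete, the sketch below assumes the standard atmospheric scattering model $I = J \cdot T + A(1 - T)$ and reads the dehazed-feature formula as $(X - A(1 - T)) \cdot T'$, which is that model solved for $J$ with $T'$ standing in for $1/T$. It is a minimal NumPy illustration, not the authors' implementation: the function names are hypothetical, and `A`, `T` and `T_inv` are supplied directly here, whereas in the paper they are estimated by the network from global context and local features.

```python
import numpy as np

def hazy_from_clean(J, A, T):
    """Atmospheric scattering model: I = J * T + A * (1 - T)."""
    return J * T + A * (1.0 - T)

def haze_aware_restore(X, A, T, T_inv):
    """Invert the scattering model: J = (X - A * (1 - T)) * T_inv.

    T_inv plays the role of T' (an approximation of 1 / T). In this
    sketch it is passed in; in HAA-Net it would be a learned quantity.
    """
    return (X - A * (1.0 - T)) * T_inv

rng = np.random.default_rng(0)
C, H, W = 3, 4, 4
J_true = rng.uniform(0.0, 1.0, size=(C, H, W))  # clean features
A = rng.uniform(0.7, 1.0, size=(C, 1, 1))       # global: one value per channel
T = rng.uniform(0.2, 0.9, size=(1, H, W))       # local: one value per position

X = hazy_from_clean(J_true, A, T)               # synthesize hazy features
J_hat = haze_aware_restore(X, A, T, 1.0 / T)    # exact inverse when T_inv = 1 / T

print("recovered clean features:", np.allclose(J_hat, J_true))
```

With an exact `T_inv = 1 / T` the round trip recovers the clean features; any error in the estimated `A`, `T` or `T_inv` shows up directly in `J_hat`, which is why the quality of those estimates matters.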

Abstract

Single-image dehazing is a pivotal challenge in computer vision that seeks to remove haze from images and restore clean background details. Recognizing the limitations of traditional physical model-based methods and the inefficiencies of current attention-based solutions, we propose a new dehazing network combining an innovative Haze-Aware Attention Module (HAAM) with a Multiscale Frequency Enhancement Module (MFEM). The HAAM is inspired by the atmospheric scattering model, thus skillfully integrating physical principles into high-dimensional features for targeted dehazing. It picks up on latent features during the image restoration process, which gives a significant boost to the metrics, while the MFEM efficiently enhances high-frequency details, thus sidestepping wavelet or Fourier transform complexities. It employs multiscale fields to extract and emphasize key frequency components with minimal parameter overhead. Integrated into a simple U-Net framework, our Haze-Aware Attention Network (HAA-Net) for single-image dehazing significantly outperforms existing attention-based and transformer models in efficiency and effectiveness. Tested across various public datasets, the HAA-Net sets new performance benchmarks. Our work not only advances the field of image dehazing but also offers insights into the design of attention mechanisms for broader applications in computer vision.

Paper Structure (17 sections, 9 equations, 6 figures, 2 tables)

Figures (6)

  • Figure S1: Overview of the Haze-Aware Attention Network architecture. Details of the structure and configurations are given in Section 3. SKFusion [li2019selective] is a feature fusion method.
  • Figure S2: Visual comparison of results on real-world hazy images from the RTTS dataset [li2019benchmarking]. Zoom in for best view.
  • Figure S3: Multiscale Frequency Enhancement Module. GAP stands for Global Average Pooling. AP k × k denotes an Average Pooling operation with a kernel size of k × k. Modulation recalibrates the channels by treating the attention weights as directly learnable parameters, without adding any extra layers. These learnable parameters adjust the weights at different scales.
  • Figure S4: Visual comparison of results on the RESIDE-Indoor dataset [li2019benchmarking]. Zoom in for best view.
  • Figure S5: Visual comparison of results on synthetic hazy images from the RESIDE-Outdoor dataset [li2019benchmarking]. Zoom in for best view.
  • ...and 1 more figure
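
The Figure S3 caption describes MFEM in terms of average pooling at several kernel sizes, global average pooling, and channel modulation with directly learnable weights. The sketch below is one plausible reading of that description, not the paper's exact module: it assumes the high-frequency band at each scale is the input minus its average-pooled (low-pass) version, and that each band is scaled by a per-channel weight and added back to the input. The function names, the edge padding, and the residual form are assumptions made for illustration; in a real network the weights would be trained rather than set by hand.

```python
import numpy as np

def avg_pool_same(x, k):
    """k x k average pooling with stride 1 on a (C, H, W) array.

    Edge padding keeps the output the same shape as the input (k must be odd).
    """
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    H, W = x.shape[1:]
    out = np.zeros(x.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += xp[:, dy:dy + H, dx:dx + W]
    return out / (k * k)

def mfem_sketch(x, scales, weights):
    """Hypothetical multiscale frequency enhancement.

    x:       (C, H, W) feature map
    scales:  odd kernel sizes, one band per size, plus one GAP band
    weights: (len(scales) + 1, C) per-channel weights (learnable in practice)
    """
    x = x.astype(float)
    out = x.copy()
    # Local bands: detail that k x k average pooling smooths away.
    for k, w in zip(scales, weights[:-1]):
        high = x - avg_pool_same(x, k)
        out += w[:, None, None] * high
    # Global band: deviation from the per-channel mean (GAP).
    high_gap = x - x.mean(axis=(1, 2), keepdims=True)
    out += weights[-1][:, None, None] * high_gap
    return out

rng = np.random.default_rng(0)
feat = rng.normal(size=(2, 6, 6))
w = np.full((3, 2), 0.5)                  # two local scales plus the GAP band
enhanced = mfem_sketch(feat, [3, 5], w)
print(enhanced.shape)
```

Because the weights multiply whole channels directly, this form of modulation adds only `(number of bands) x C` parameters, which matches the caption's point that no extra layers are needed.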