Parallel Cross Strip Attention Network for Single Image Dehazing

Lihan Tong, Yun Liu, Tian Ye, Weijia Li, Liyuan Chen, Erkang Chen

TL;DR

The paper tackles single image dehazing by addressing the limitation of fixed receptive fields and costly self-attention in dense prediction tasks. It introduces PCSA-Net, a lightweight encoder-decoder network built around the Parallel Cross Strip Attention (PCSA) module, which simultaneously captures horizontal and vertical strip features and fuses them with a channel-wise multi-scale strategy. The core contributions include the PCSA module, a multi-scale PCSAM design, and a loss that combines L1 with Contrastive Regularization to leverage hazy-clean image pairs. Experimental results on synthetic and real hazy datasets show state-of-the-art PSNR/SSIM performance and superior qualitative restoration, highlighting the approach’s effectiveness, efficiency, and robustness for practical dehazing applications.
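
The loss described above (an L1 term plus Contrastive Regularization over hazy-clean pairs) can be illustrated with a small sketch. This is not the authors' implementation: it assumes the commonly used ratio form of Contrastive Regularization, in which the restored image is pulled toward the clean image and pushed away from the hazy input in a feature space. The names `toy_features`, `dehazing_loss`, `beta`, and `layer_weights` are hypothetical, and the average-pooling feature extractor is only a stand-in for a pretrained network.

```python
import numpy as np

def l1(a, b):
    """Mean absolute difference between two arrays."""
    return float(np.mean(np.abs(a - b)))

def toy_features(img):
    """Hypothetical stand-in for a pretrained feature extractor:
    returns the 2-D image average-pooled at strides 1, 2 and 4."""
    feats = []
    for s in (1, 2, 4):
        h, w = img.shape[0] // s * s, img.shape[1] // s * s
        pooled = img[:h, :w].reshape(h // s, s, w // s, s).mean(axis=(1, 3))
        feats.append(pooled)
    return feats

def dehazing_loss(restored, clean, hazy, beta=0.1,
                  layer_weights=(1.0, 1.0, 1.0), eps=1e-7):
    """L1 reconstruction term plus an assumed ratio-form contrastive term."""
    recon = l1(restored, clean)
    cr = 0.0
    for w, fr, fc, fh in zip(layer_weights, toy_features(restored),
                             toy_features(clean), toy_features(hazy)):
        # Small when the restored features are close to the clean ones
        # (positive) and far from the hazy ones (negative).
        cr += w * l1(fc, fr) / (l1(fh, fr) + eps)
    return recon + beta * cr

# Toy data: haze simulated with a constant transmission of 0.5 and airlight 1.
rng = np.random.default_rng(0)
clean = rng.random((8, 8))
hazy = 0.5 * clean + 0.5
good = clean + 0.01 * (hazy - clean)   # almost fully dehazed
bad = clean + 0.90 * (hazy - clean)    # still mostly hazy
print(dehazing_loss(good, clean, hazy), dehazing_loss(bad, clean, hazy))
```

In this toy setting the nearly dehazed estimate receives a much smaller loss than the mostly hazy one, because both the L1 term and the contrastive ratio shrink as the estimate moves from the hazy input toward the clean target.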

Abstract

The objective of single image dehazing is to restore hazy images and produce clear, high-quality visuals. Traditional convolutional models struggle with long-range dependencies because of their limited receptive field. Transformers excel at capturing such dependencies, but their computational complexity is quadratic in the feature map resolution, which makes them less suitable for pixel-to-pixel dense prediction tasks. Moreover, the fixed kernels or tokens used in most models do not adapt well to varying blur sizes, resulting in suboptimal dehazing performance. In this study, we introduce a novel dehazing network based on Parallel Cross Strip Attention (PCSA) with a multi-scale strategy. PCSA efficiently integrates long-range dependencies by capturing horizontal and vertical relationships simultaneously, allowing each pixel to draw contextual cues from an expanded spatial domain. To handle blurs of different sizes and shapes flexibly, we employ a channel-wise design with varying convolutional kernel sizes and strip lengths in each PCSA to capture context information at different scales. Additionally, we incorporate a softmax-based adaptive weighting mechanism within PCSA to prioritize and leverage the more critical features.
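
The mechanism summarized in the abstract (parallel horizontal and vertical strips, channel groups with different strip lengths, and softmax-based weighting) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PCSA module itself: strip aggregation is approximated by a plain average over a 1×k or k×1 window, the softmax weights come from two fixed logits rather than learned parameters, and the names `strip_mean`, `pcsa_sketch`, `strip_lengths`, and `logits` are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def strip_mean(x, k, axis):
    """Average each position with a length-k strip (k odd) along one axis,
    using edge padding so the output keeps the input shape."""
    pad = k // 2
    widths = [(0, 0)] * x.ndim
    widths[axis] = (pad, pad)
    xp = np.pad(x, widths, mode="edge")
    n = x.shape[axis]
    out = np.zeros_like(x, dtype=float)
    for off in range(k):
        out += np.take(xp, np.arange(off, off + n), axis=axis)
    return out / k

def pcsa_sketch(x, strip_lengths=(3, 7, 11), logits=(0.0, 0.0)):
    """x has shape (C, H, W). Channels are split into groups, one per strip
    length (the assumed channel-wise multi-scale design). Horizontal and
    vertical strip context are computed independently of each other and
    fused with softmax weights."""
    groups = np.array_split(x, len(strip_lengths), axis=0)
    w_h, w_v = softmax(np.asarray(logits, dtype=float))
    fused = []
    for g, k in zip(groups, strip_lengths):
        horizontal = strip_mean(g, k, axis=2)  # context along each row
        vertical = strip_mean(g, k, axis=1)    # context along each column
        fused.append(w_h * horizontal + w_v * vertical)
    return np.concatenate(fused, axis=0)

features = np.random.default_rng(0).random((6, 16, 16))
context = pcsa_sketch(features)
print(context.shape)
```

With this layout each output pixel sees a cross-shaped neighbourhood whose extent depends on the channel group, which is one way to read the abstract's claim that different strip lengths capture context at different scales.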

Paper Structure

This paper contains 20 sections, 7 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Overview of our Parallel Cross Strip Attention Network architecture. Details of the structure and configurations are given in Section III.
  • Figure 2: The box on the left shows the structure of the Parallel Cross Strip Attention Block (PCSAB), which uses a parallel structure for higher execution efficiency. The PCSAM is the core module of the PCSAB and is explained in detail in the methods section. The box on the right summarizes the basic principles of several attention mechanisms: (a) our PCSA, (b) strip attention [tsai2022stripformer], (c) bi-level routing attention [zhu2023biformer], (d) self-attention. The comparison indicates that our PCSA is the most efficient of the four.
  • Figure 3: Visual comparison of results on the Haze4K dataset [li2019benchmarking]. Zoom in for best view.
  • Figure 4: Visual comparison of results on real-world hazy images. The samples are from the RTTS dataset [li2019benchmarking]. Zoom in for best view.
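
The efficiency comparison in Figure 2 can be made concrete with a back-of-the-envelope count. The sketch below is an assumption-laden illustration, not a measurement from the paper: it counts only how many pixel pairs interact under full self-attention versus a cross-shaped strip pattern, ignoring channels, heads, and constant factors. The function names and the `strip_len` parameter are hypothetical.

```python
def self_attention_pairs(h, w):
    """Every pixel interacts with every pixel: (H*W)^2 pairs."""
    n = h * w
    return n * n

def cross_strip_pairs(h, w, strip_len=None):
    """Every pixel interacts with one horizontal and one vertical strip.
    With strip_len=None the strips span the full row and column."""
    row = w if strip_len is None else min(strip_len, w)
    col = h if strip_len is None else min(strip_len, h)
    return h * w * (row + col)

h = w = 256
full = self_attention_pairs(h, w)
strips = cross_strip_pairs(h, w)
short = cross_strip_pairs(h, w, strip_len=11)
print(full, strips, short, full // strips)
```

Under these assumptions the pair count grows as (HW)^2 for self-attention but only as HW(H+W) for full-length cross strips, and as 2k·HW when the strips are capped at length k, which is consistent with the quadratic-cost concern raised in the abstract.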