VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue

TL;DR

This paper tackles image dehazing under dense haze by fusing visible and infrared modalities in an end-to-end network. It introduces VIFNet, which features a Deep Structure Feature Extraction (DSFE) module built on a Channel-Pixel Attention Block (CPAB), and an inconsistency fusion strategy that combines multi-scale features from both modalities. A new multimodal dataset, AirSim-VID, is provided for validation; experiments on it and on additional NTIRE and natural hazy datasets show state-of-the-art PSNR/SSIM gains, albeit with some color distortion from infrared fusion. The training objective combines $\mathcal{L}_1$, $\mathcal{L}_{\mathrm{M}}$, and $\mathcal{L}_{\mathrm{Dice}}$ to preserve multi-scale structure and edges, enabling robust haze removal across challenging conditions and highlighting both the potential and the limitations of multimodal fusion for practical vision systems.
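As a rough illustration of how a three-term objective of this kind can be assembled, the following minimal NumPy sketch sums an L1 term, a placeholder for $\mathcal{L}_{\mathrm{M}}$, and a Dice term. This summary does not define $\mathcal{L}_{\mathrm{M}}$ or the term weights, so `l_m` is passed in as an externally computed scalar, and `w_m`, `w_dice`, and the choice of applying the Dice term to edge maps are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def l1_loss(pred, target):
    """Mean absolute error between the dehazed image and the ground truth."""
    return float(np.mean(np.abs(pred - target)))


def dice_loss(pred_edges, target_edges, eps=1e-6):
    """Standard soft Dice loss: 1 - 2|A.B| / (|A|^2 + |B|^2).

    Applying it to edge maps is an assumption here; the inputs are
    expected to hold values in [0, 1].
    """
    intersection = np.sum(pred_edges * target_edges)
    denominator = np.sum(pred_edges ** 2) + np.sum(target_edges ** 2)
    return float(1.0 - (2.0 * intersection + eps) / (denominator + eps))


def combined_loss(pred, target, pred_edges, target_edges,
                  l_m=0.0, w_m=1.0, w_dice=1.0):
    """Weighted sum L1 + w_m * L_M + w_dice * L_Dice.

    l_m is a hypothetical placeholder for the L_M term, which is not
    specified in this summary; the weights are likewise illustrative.
    """
    return (l1_loss(pred, target)
            + w_m * l_m
            + w_dice * dice_loss(pred_edges, target_edges))
```

Keeping each term as its own function makes it easy to swap in the actual definition of $\mathcal{L}_{\mathrm{M}}$ once it is taken from the paper or the released code.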

Abstract

Image dehazing poses significant challenges in environmental perception. Recent research mainly focuses on deep learning-based methods that use a single modality, which may suffer severe information loss, especially in dense-haze scenarios. Infrared images are robust to haze; however, existing methods have primarily treated the infrared modality as auxiliary information, failing to fully exploit its rich information for dehazing. To address this challenge, the key insight of this study is to design a visible-infrared fusion network for image dehazing. In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates the Channel-Pixel Attention Block (CPAB) to restore more spatial and marginal information within the deep structural features. Additionally, we introduce an inconsistency weighted fusion strategy to merge the two modalities by leveraging the more reliable information. To validate the approach, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform. Extensive experiments performed on challenging real and simulated image datasets demonstrate that VIFNet can outperform many state-of-the-art competing methods. The code and dataset are available at https://github.com/mengyu212/VIFNet_dehazing.

Paper Structure (25 sections, 12 equations, 10 figures, 4 tables, 1 algorithm)

Figures (10)

  • Figure 1: Comparative results of dehazing networks on the proposed AirSim-VID dataset. The first column is the result of the single image dehazing network DeHamer (SOTA), the second column is derived from the proposed VIFNet, and the last column is the ground truth. The enlarged red boxes highlight the superiority of the proposed VIFNet.
  • Figure 2: Overall architecture of the proposed VIFNet. In the deep feature extraction stage, an encoder-decoder architecture and DSFE module are adopted to extract multi-scale structure features from coarse to fine. Then, the multi-scale deep structure features are fused by applying the inconsistency fusion strategy and subsequently aggregated into the encoder, together with the summation of raw visible images and coarse visible features. Finally, the training process is supervised by a combined loss function.
  • Figure 3: Detailed structure of the Deep Structure Feature Extraction (DSFE) module. The module takes the multi-scale encoded and decoded feature maps as input and outputs deep structure feature maps at three different scales.
  • Figure 4: Visualization of the deep structure feature maps of the hazed visible and infrared images, the calculated inconsistency feature map, and the weighted structure feature map. With the inconsistency fusion strategy, the weighted feature map enhances the overall structural information.
  • Figure 5: Comparison of dehazing results on the AirSim-VID dataset. The first two columns, the middle two columns, and the last column represent mist, medium haze, and dense haze, respectively.
  • ...and 5 more figures
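
The inconsistency map and weighted feature map described for Figures 2 and 4 can be pictured with a small NumPy sketch. The captions do not give the formula, so everything below is an assumption for illustration: inconsistency is taken as the normalized absolute difference between the visible and infrared feature maps, and the fused map leans toward the infrared features where the two disagree, on the premise from the abstract that infrared is the more haze-robust modality. The function names are hypothetical and this is not the authors' fusion rule.

```python
import numpy as np


def inconsistency_map(f_vis, f_ir, eps=1e-6):
    """Per-element disagreement between two feature maps, scaled to [0, 1).

    Assumption: absolute difference normalized by its maximum; the paper
    may define inconsistency differently.
    """
    diff = np.abs(f_vis - f_ir)
    return diff / (diff.max() + eps)


def fuse(f_vis, f_ir):
    """Blend the two modalities, weighting infrared more where they disagree."""
    w = inconsistency_map(f_vis, f_ir)
    return (1.0 - w) * f_vis + w * f_ir


def fuse_multiscale(vis_feats, ir_feats):
    """Apply the same illustrative fusion independently at every scale."""
    return [fuse(v, r) for v, r in zip(vis_feats, ir_feats)]
```

Where the two feature maps agree, the weight is zero and the visible features pass through unchanged; where they differ most, the output approaches the infrared features. Calling `fuse_multiscale` on two lists of three feature maps mirrors the three scales mentioned in the Figure 3 caption.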