VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing

Meng Yu; Te Cui; Haoyang Lu; Yufeng Yue

TL;DR

This paper tackles image dehazing under dense haze by fusing the visible and infrared modalities in an end-to-end network. It introduces VIFNet, which features a Deep Structure Feature Extraction (DSFE) module with a Channel-Pixel Attention Block (CPAB) and an inconsistency fusion strategy that combines multi-scale features from both modalities. A new multimodal dataset, AirSim-VID, is provided for validation, and experiments on additional NTIRE and natural hazy datasets show state-of-the-art PSNR/SSIM gains, albeit with some color distortion from infrared fusion. The training objective combines $\mathcal{L}_1$, $\mathcal{L}_{\mathrm{M}}$, and $\mathcal{L}_{\mathrm{Dice}}$ to preserve multi-scale structure and edges, enabling robust haze removal under challenging conditions and highlighting both the potential and the limitations of multimodal fusion for practical vision systems.
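
As a rough illustration of how such a three-term objective can be assembled, the sketch below sums an L1 term, a caller-supplied middle term, and a Dice term. It is not the authors' implementation: the weights `w_l1`, `w_m`, and `w_dice`, the squared-denominator Dice variant, and the flat-list representation of images are all assumptions made for the example. Because this summary does not expand $\mathcal{L}_{\mathrm{M}}$, it is left as a hypothetical callable `m_term`.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def l1_loss(pred: Vector, target: Vector) -> float:
    """Mean absolute error between two equally sized flattened images."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def dice_loss(pred: Vector, target: Vector, eps: float = 1e-6) -> float:
    """Dice loss (squared-denominator variant, chosen here as an assumption)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def combined_loss(
    pred: Vector,
    target: Vector,
    m_term: Callable[[Vector, Vector], float],  # hypothetical stand-in for L_M
    w_l1: float = 1.0,    # assumed weight, not taken from the paper
    w_m: float = 1.0,     # assumed weight, not taken from the paper
    w_dice: float = 1.0,  # assumed weight, not taken from the paper
) -> float:
    """Weighted sum of the three terms named in the summary."""
    return (
        w_l1 * l1_loss(pred, target)
        + w_m * m_term(pred, target)
        + w_dice * dice_loss(pred, target)
    )
```

With identical prediction and target, and an `m_term` that returns zero, every term vanishes, which makes a quick sanity check for any concrete choice of the middle term.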

Abstract

Image dehazing poses significant challenges in environmental perception. Recent research mainly focuses on single-modality deep learning methods, which can suffer severe information loss, especially in dense-haze scenarios. Infrared images are robust to haze; however, existing methods have primarily treated the infrared modality as auxiliary information and fail to fully exploit its rich information for dehazing. To address this challenge, we design a visible-infrared fusion network for image dehazing. In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates the Channel-Pixel Attention Block (CPAB) to restore more spatial and marginal information within the deep structural features. Additionally, we introduce an inconsistency weighted fusion strategy that merges the two modalities by leveraging the more reliable information. To validate the approach, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform. Extensive experiments on challenging real and simulated image datasets demonstrate that VIFNet can outperform many state-of-the-art competing methods. The code and dataset are available at https://github.com/mengyu212/VIFNet_dehazing.
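
The idea of weighting a fusion by cross-modal inconsistency can be made concrete with a toy example. The sketch below is one possible reading, not the formulation used in VIFNet: it assumes inconsistency is the normalized absolute difference between visible and infrared feature values, and that positions where the modalities disagree should lean on the infrared value because haze degrades the visible one. The function name `inconsistency_weighted_fusion` and the flat-list feature maps are hypothetical.

```python
from typing import List, Sequence


def inconsistency_weighted_fusion(
    vis: Sequence[float],
    ir: Sequence[float],
    eps: float = 1e-6,
) -> List[float]:
    """Blend two flattened feature maps using a per-position inconsistency weight.

    Assumption: weight = |vis - ir| / max|vis - ir|. Where the weight is near 0
    the modalities agree and are averaged; where it is near 1 the infrared
    value dominates.
    """
    diffs = [abs(v - r) for v, r in zip(vis, ir)]
    peak = max(diffs) + eps  # eps avoids division by zero for identical inputs
    fused = []
    for v, r, d in zip(vis, ir, diffs):
        w = d / peak  # inconsistency weight in [0, 1)
        fused.append((1.0 - w) * 0.5 * (v + r) + w * r)
    return fused


# Toy inputs: the last position is washed out in the visible features.
visible = [0.2, 0.5, 0.1]
infrared = [0.2, 0.5, 0.9]
fused = inconsistency_weighted_fusion(visible, infrared)
```

In this toy case the first two positions are unchanged because both modalities agree, while the third is pulled almost entirely toward the infrared value. A learned network would typically produce such weights from features rather than from a fixed rule.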

Paper Structure (25 sections, 12 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Related works
Single Image Dehazing
Handcrafted Prior-based Image Dehazing Methods
Deep learning-based Image Dehazing Methods
Visible-infrared Fusion for Image Dehazing
Proposed Method
Overview of VIFNet
DSFE Module
Inconsistency Fusion Strategy
Loss Function
Experiments
Dataset
Implementation Details
Quantitative and Qualitative Results
...and 10 more sections

Figures (10)

Figure 1: Comparative results of dehazing networks on the proposed AirSim-VID dataset. The first column shows the result of the single-image dehazing network DeHamer (SOTA), the second column shows the result of the proposed VIFNet, and the last column is the ground truth. The enlarged red boxes highlight the superiority of the proposed VIFNet.
Figure 2: Overall architecture of the proposed VIFNet. In the deep feature extraction stage, an encoder-decoder architecture and DSFE module are adopted to extract multi-scale structure features from coarse to fine. Then, the multi-scale deep structure features are fused by applying the inconsistency fusion strategy and subsequently aggregated into the encoder, together with the summation of raw visible images and coarse visible features. Finally, the training process is supervised by a combined loss function.
Figure 3: Detailed structure of the Deep Structure Feature Extraction (DSFE) module. The module takes the multi-scale encoded and decoded feature maps as input and outputs deep structure feature maps at three different scales.
Figure 4: Visualization of the deep structure feature maps of the hazy visible and infrared images, the calculated inconsistency feature map, and the weighted structure feature map. With the inconsistency fusion strategy, the weighted feature map enhances the overall structural information.
Figure 5: Comparison of dehazing results on the AirSim-VID dataset. The first two columns, the middle two columns, and the last column represent mist, medium haze, and dense haze, respectively.
...and 5 more figures
