VIFNet: An End-to-end Visible-Infrared Fusion Network for Image Dehazing
Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue
TL;DR
This paper tackles image dehazing under dense haze by fusing visible and infrared modalities in an end-to-end network. It introduces VIFNet, which features a Deep Structure Feature Extraction (DSFE) module with a Channel-Pixel Attention Block (CPAB) and an inconsistency fusion strategy that combines multi-scale features from both modalities. A new multimodal dataset, AirSim-VID, is provided for validation, and experiments on the additional NTIRE and natural hazy datasets show state-of-the-art PSNR/SSIM gains, albeit with some color distortion from infrared fusion. The training objective combines $\mathcal{L}_1$, $\mathcal{L}_{\mathrm{M}}$, and $\mathcal{L}_{\mathrm{Dice}}$ to preserve multi-scale structure and edges, which supports robust haze removal under challenging conditions. The results highlight both the potential and the limitations of multimodal fusion for practical vision systems.
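As a rough illustration of how a three-term objective of this kind can be assembled, the following minimal sketch combines an $\mathcal{L}_1$ term, a Dice term, and an optional third term standing in for $\mathcal{L}_{\mathrm{M}}$. Everything here is an assumption made for illustration: the function names, the particular Dice formulation, the weights `w_l1`, `w_m`, and `w_dice`, and the choice to pass $\mathcal{L}_{\mathrm{M}}$ as a caller-supplied function (this summary does not define it). It operates on flattened pixel lists and is not the authors' implementation.

```python
from typing import Callable, Optional, Sequence

Pixels = Sequence[float]


def l1_loss(pred: Pixels, target: Pixels) -> float:
    """Mean absolute error over flattened pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def dice_loss(pred: Pixels, target: Pixels, eps: float = 1e-6) -> float:
    """One common Dice form (assumed here, not taken from the paper):
    1 - 2 * sum(p * t) / (sum(p^2) + sum(t^2))."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def total_loss(
    pred: Pixels,
    target: Pixels,
    m_term: Optional[Callable[[Pixels, Pixels], float]] = None,
    w_l1: float = 1.0,    # hypothetical weight
    w_m: float = 1.0,     # hypothetical weight
    w_dice: float = 1.0,  # hypothetical weight
) -> float:
    """Weighted sum of the L1 term, the Dice term, and an optional
    caller-supplied term standing in for L_M."""
    loss = w_l1 * l1_loss(pred, target) + w_dice * dice_loss(pred, target)
    if m_term is not None:
        loss += w_m * m_term(pred, target)
    return loss
```

A call such as `total_loss(dehazed, clear)` returns a value near zero when the dehazed output matches the clear reference and grows as the two diverge. In a real training loop the same structure would be written with a tensor library so that gradients can flow through each term.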
Abstract
Image dehazing poses significant challenges in environmental perception. Recent research mainly focuses on deep learning-based methods that use a single modality, which may suffer severe information loss, especially in dense-haze scenarios. Infrared images are robust to haze; however, existing methods have primarily treated the infrared modality as auxiliary information and fail to fully exploit its rich information for dehazing. To address this challenge, this study designs a visible-infrared fusion network for image dehazing. In particular, we propose a multi-scale Deep Structure Feature Extraction (DSFE) module, which incorporates the Channel-Pixel Attention Block (CPAB) to restore more spatial and marginal information within the deep structural features. Additionally, we introduce an inconsistency-weighted fusion strategy that merges the two modalities by leveraging the more reliable information. To validate the approach, we construct a visible-infrared multimodal dataset called AirSim-VID based on the AirSim simulation platform. Extensive experiments on challenging real and simulated image datasets demonstrate that VIFNet can outperform many state-of-the-art competing methods. The code and dataset are available at https://github.com/mengyu212/VIFNet_dehazing.
