
A Multi-scale Information Integration Framework for Infrared and Visible Image Fusion

Guang Yang, Jie Li, Hanxiao Lei, Xinbo Gao

TL;DR

Qualitative and quantitative results on two datasets show that the proposed multi-scale dual attention framework for infrared and visible image fusion preserves both the thermal radiation and the detail information of the two modalities and achieves results comparable to other state-of-the-art methods.

Abstract

Infrared and visible image fusion aims to generate a fused image that contains the intensity and detail information of the source images; the key issue is how to effectively measure and integrate the complementary information of multi-modality images of the same scene. Most existing methods adopt a simple weight in the loss function to decide how much information each modality retains, rather than adaptively measuring the complementary information of each image pair. In this study, we propose a multi-scale dual attention (MDA) framework for infrared and visible image fusion, designed to measure and integrate complementary information in both the network structure and the loss function at the image and patch levels. In our method, a residual downsample block first decomposes the source images into three scales. Then, a dual attention fusion block integrates complementary information and generates spatial and channel attention maps at each scale for feature fusion. Finally, the output image is reconstructed by a residual reconstruction block. The loss function consists of three parts, at the image, feature, and patch levels; the image-level and patch-level parts are computed with weights generated by the complementary information measurement. In addition, a style loss is added to constrain the pixel intensity distribution of the output image with respect to that of the infrared image. Our fusion results are robust and informative across different scenarios. Qualitative and quantitative results on two datasets show that our method preserves both the thermal radiation and the detail information of the two modalities and achieves results comparable to other state-of-the-art methods. Ablation experiments demonstrate the effectiveness of our information integration architecture and of adaptively measuring complementary information retention in the loss function.
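The abstract does not spell out how the complementary information measurement is turned into loss weights, so the following is only a minimal NumPy sketch under stated assumptions, not the MDA loss itself. It assumes (hypothetically) that each modality's information is scored by its mean gradient magnitude, that the two scores are normalized into weights summing to 1, and that the image-level and patch-level terms are weighted L1 intensity losses. The style loss uses the common Gram-matrix formulation from neural style transfer on (C, H, W) feature arrays. The feature-level term is omitted because it depends on a pretrained feature extractor. All function names and the 8×8 patch size are illustrative.

```python
import numpy as np

EPS = 1e-8  # avoids division by zero when both images are flat


def info_measure(img):
    """Hypothetical information score: mean gradient magnitude of a 2-D image.
    A stand-in for the paper's complementary information measurement."""
    gy, gx = np.gradient(np.asarray(img, dtype=np.float64))
    return float(np.mean(np.hypot(gx, gy)))


def modality_weights(ir, vis):
    """Normalize the two information scores into weights that sum to 1."""
    a, b = info_measure(ir), info_measure(vis)
    w_ir = (a + EPS) / (a + b + 2 * EPS)
    return w_ir, 1.0 - w_ir


def weighted_l1(fused, ir, vis):
    """L1 intensity loss toward each source, weighted by its information score."""
    w_ir, w_vis = modality_weights(ir, vis)
    return (w_ir * float(np.mean(np.abs(fused - ir)))
            + w_vis * float(np.mean(np.abs(fused - vis))))


def image_level_loss(fused, ir, vis):
    # One pair of weights computed over the whole image.
    return weighted_l1(fused, ir, vis)


def patch_level_loss(fused, ir, vis, patch=8):
    # Separate weights for each non-overlapping patch, so different
    # regions can favor different modalities.
    h, w = fused.shape
    losses = []
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            win = (slice(i, i + patch), slice(j, j + patch))
            losses.append(weighted_l1(fused[win], ir[win], vis[win]))
    return float(np.mean(losses))


def gram(features):
    """Gram matrix of a (C, H, W) feature array, normalized by H * W."""
    c = features.shape[0]
    flat = features.reshape(c, -1)
    return flat @ flat.T / flat.shape[1]


def style_loss(fused_feat, ir_feat):
    """Mean squared difference between the Gram matrices of two feature arrays."""
    return float(np.mean((gram(fused_feat) - gram(ir_feat)) ** 2))
```

The image-level and patch-level terms differ only in where the weights are computed: once for the whole image, or once per patch. A region where the visible image is flat can therefore still lean on the infrared image even when the visible image carries more information globally.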

Paper Structure (30 sections, 14 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: The pipeline of the proposed framework. The network consists of two branches that take the infrared and visible images as inputs. Each branch is first decomposed into three scales by the residual downsample block. Features from the two branches are then integrated by the dual attention fusion block and reconstructed by the residual reconstruction block.
  • Figure 2: Illustration of the dual attention fusion block, the residual downsample block, and the residual reconstruction block. The dual attention fusion block generates attention maps for each branch and outputs a fused feature map. The residual downsample and residual reconstruction blocks extract features and reconstruct the image, respectively.
  • Figure 3: Visualization of feature maps extracted by VGG-16. Given an infrared and visible image pair, the rows from top to bottom show the feature maps produced by the convolutional layers preceding each of the five max-pooling layers; $i$ denotes the feature maps extracted before the $i$-th max-pooling layer of VGG-16.
  • Figure 4: Qualitative comparison of our method with seven state-of-the-art models on seven infrared and visible image pairs from the TNO dataset. The first and second rows show the visible and infrared images, respectively. The third through last rows show the fusion results of DenseFuse [li2018densefuse], RFN-Nest [li2021rfn], GANMcC [ma2020ganmcc], UMF-CMGR [diunsupervised], SwinFusion [ma2022swinfusion], LRRNet [li2023lrrnet], FAFusion [xiao2024fafusion], and our MDA.
  • Figure 5: Qualitative comparison of our method with seven state-of-the-art models on seven infrared and visible image pairs from the RoadScene dataset. The first and second rows show the visible and infrared images, respectively. The third through last rows show the fusion results of DenseFuse [li2018densefuse], RFN-Nest [li2021rfn], GANMcC [ma2020ganmcc], UMF-CMGR [diunsupervised], SwinFusion [ma2022swinfusion], LRRNet [li2023lrrnet], FAFusion [xiao2024fafusion], and our MDA.
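The data flow in Figures 1 and 2 can be sketched with parameter-free stand-ins. This is enough to show how three scales are produced and fused per scale, but it is not the paper's learned network. In this hypothetical NumPy sketch, 2×2 average pooling replaces the learned residual downsample block. The dual attention fusion block is replaced by a hand-crafted rule: channel attention scores come from global average pooling, spatial attention scores come from the channel-wise L1 norm, each set of scores becomes per-branch weights via a softmax across the two branches, and the channel-fused and spatial-fused results are averaged. Feature arrays are assumed to have shape (C, H, W), and reconstruction is omitted.

```python
import numpy as np


def avg_pool2x(feat):
    """Halve H and W with 2x2 average pooling (stand-in for the learned
    residual downsample block). Odd trailing rows/columns are dropped."""
    c, h, w = feat.shape
    h2, w2 = h // 2, w // 2
    f = feat[:, :h2 * 2, :w2 * 2]
    return f.reshape(c, h2, 2, w2, 2).mean(axis=(2, 4))


def decompose(feat, num_scales=3):
    """Return features at full, 1/2 and 1/4 resolution (for num_scales=3)."""
    scales = [feat]
    for _ in range(num_scales - 1):
        scales.append(avg_pool2x(scales[-1]))
    return scales


def softmax2(a, b):
    """Element-wise softmax over two branches; the two weights sum to 1."""
    m = np.maximum(a, b)
    ea, eb = np.exp(a - m), np.exp(b - m)
    return ea / (ea + eb), eb / (ea + eb)


def dual_attention_fuse(f_ir, f_vis):
    # Channel attention: one score per channel per branch (global average pooling).
    c_ir, c_vis = softmax2(f_ir.mean(axis=(1, 2)), f_vis.mean(axis=(1, 2)))
    channel_fused = c_ir[:, None, None] * f_ir + c_vis[:, None, None] * f_vis
    # Spatial attention: one score per pixel per branch (L1 norm across channels).
    s_ir, s_vis = softmax2(np.abs(f_ir).sum(axis=0), np.abs(f_vis).sum(axis=0))
    spatial_fused = s_ir[None] * f_ir + s_vis[None] * f_vis
    # Average the channel-attention and spatial-attention results.
    return 0.5 * (channel_fused + spatial_fused)


def fuse_multiscale(f_ir, f_vis, num_scales=3):
    """Decompose both branches and fuse them independently at every scale."""
    return [dual_attention_fuse(a, b)
            for a, b in zip(decompose(f_ir, num_scales), decompose(f_vis, num_scales))]
```

Because every weight pair sums to 1, each fused value is a convex combination of the two branch features at that position, so identical inputs pass through unchanged. In the learned block, the attention maps would instead be produced by trainable layers.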