Causality-Driven Infrared and Visible Image Fusion

Linli Ma, Suzhen Lin, Jianchao Zeng, Zanxia Jin, Yanbo Wang, Fengyuan Li, Yubing Luo

TL;DR

This work tackles dataset scene bias in infrared-visible image fusion by framing the task as a causal inference problem. It builds a tailored causal graph among image features $X$, fusion weights $W$, fused image $Y$, and confounder $Z$, and introduces a Back-door Adjustment based Feature Fusion Module (BAFFM) to estimate the true causal effect. BAFFM uses an NWGM-like (Normalized Weighted Geometric Mean) approximation over modality-specific confounder dictionaries, together with an attention mechanism, to deconfound fusion so that diverse scenes contribute fairly. Experiments on LLVIP, RoadScene, and TNO show consistent improvements over state-of-the-art methods, indicating better generalization and fewer artifacts under scene bias.
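
For orientation, the two quantities contrasted in this summary can be written out with the standard back-door adjustment formula from causal inference. This is the textbook form, stated under the assumption that the confounder $Z$ satisfies the back-door criterion for $X \rightarrow Y$. The paper's own equations, including how the fusion weights $W$ enter, are not reproduced here.

$$
P\left(Y|X\right) = \sum_{z} P\left(Y|X, z\right) P\left(z|X\right),
\qquad
P\left(Y|do\left(X\right)\right) = \sum_{z} P\left(Y|X, z\right) P\left(z\right)
$$

The only difference is the weighting of each confounder value: the conventional likelihood uses $P\left(z|X\right)$, which lets frequent scenes dominate, while the intervention uses the prior $P\left(z\right)$, which is independent of $X$.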

Abstract

Image fusion aims to combine complementary information from multiple source images to generate more comprehensive scene representations. Existing methods primarily rely on stacking and designing network architectures to enhance fusion performance, often ignoring the impact of dataset scene bias on model training. This oversight leads the model to learn spurious correlations between specific scenes and fusion weights under the conventional likelihood estimation framework, thereby limiting fusion performance. To address these problems, this paper first re-examines the image fusion task from a causal perspective and disentangles the model from the impact of bias by constructing a tailored causal graph that clarifies the causal relationships among the variables in the image fusion task. Then, the Back-door Adjustment based Feature Fusion Module (BAFFM) is proposed to eliminate confounder interference and enable the model to learn the true causal effect. Finally, extensive experiments on three standard datasets show that the proposed method significantly surpasses state-of-the-art methods in infrared and visible image fusion.
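
As a rough illustration of what a back-door-adjusted fusion step might look like, the NumPy sketch below attends over a per-modality confounder dictionary and weights each entry by a prior $P(z)$, in the spirit of NWGM-style approximations. Everything in it is an assumption made for illustration: the function names (`confounder_context`, `fuse`), the uniform prior, the residual addition, and the softmax across modalities are hypothetical and are not taken from the paper's BAFFM.

```python
import numpy as np


def softmax(v, axis=-1):
    """Numerically stable softmax along the given axis."""
    v = v - v.max(axis=axis, keepdims=True)
    e = np.exp(v)
    return e / e.sum(axis=axis, keepdims=True)


def confounder_context(x, z_dict, prior):
    """Approximate the expectation over confounders for one feature vector.

    x      : (d,)   feature vector of one modality
    z_dict : (K, d) hypothetical confounder dictionary (e.g. K scene prototypes)
    prior  : (K,)   prior P(z) of each dictionary entry

    Attention scores are multiplied by the prior and used to average the
    dictionary entries; the weights are deliberately not renormalized.
    """
    d = x.shape[-1]
    attn = softmax(z_dict @ x / np.sqrt(d))  # (K,) scaled dot-product attention
    return (attn * prior) @ z_dict           # (d,) prior-weighted context


def fuse(x_ir, x_vis, z_ir, z_vis, p_ir, p_vis):
    """Fuse infrared and visible features with per-channel weights (assumed form)."""
    h_ir = x_ir + confounder_context(x_ir, z_ir, p_ir)
    h_vis = x_vis + confounder_context(x_vis, z_vis, p_vis)
    # Softmax across the two modalities gives weights that sum to 1 per channel.
    w = softmax(np.stack([h_ir, h_vis]), axis=0)  # (2, d)
    return w[0] * x_ir + w[1] * x_vis, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 8, 4
    x_ir, x_vis = rng.normal(size=d), rng.normal(size=d)
    z_ir, z_vis = rng.normal(size=(k, d)), rng.normal(size=(k, d))
    prior = np.full(k, 1.0 / k)  # uniform prior over dictionary entries (assumption)
    fused, w = fuse(x_ir, x_vis, z_ir, z_vis, prior, prior)
    print(fused.shape, w.sum(axis=0))
```

Because the weights form a convex combination per channel, each fused value lies between the corresponding infrared and visible values. A trained network would learn the dictionaries and projections rather than use random vectors as in this toy run.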

Paper Structure

This paper contains 15 sections, 6 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Fusion bias problem in natural scenes. The training sets of the LRRNET and YDTR models consist mainly of street scenes. When these models predict natural scenes such as clouds and bushes, the fusion results may deviate, leading to artifacts and information loss.
  • Figure 2: Causal graph of image fusion: (a) conventional likelihood $P\left(Y|X\right)$; (b) causal intervention $P\left(Y|do\left(X\right)\right)$.
  • Figure 3: Generation process of visible confounder dictionary $Z_V$.
  • Figure 4: Overall architecture of the image fusion network, where the red box marks the proposed BAFFM.
  • Figure 5: Qualitative comparison on the LLVIP dataset.
  • ...and 3 more figures