Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws

Lin Guo, Xiaoqing Luo, Wei Xie, Zhancheng Zhang, Hui Li, Rui Wang, Zhenhua Feng, Xiaoning Song

TL;DR

This work tackles infrared–visible image fusion by reframing it through human cognitive principles and probabilistic reasoning. It introduces HCLFuse, a diffusion-based framework that couples an optimal-transport–driven alignment with a multi-scale variational bottleneck encoder and a physics-guided diffusion process, enabling more interpretable and structurally consistent fusion under uncertainty. The approach yields state-of-the-art results on multiple benchmarks and improves downstream semantic segmentation, while providing formal guarantees via information-theoretic bounds and physically informed constraints. Although powerful, the method relies on well-aligned modal pairs and incurs diffusion-related computational overhead, highlighting a trade-off between quality and practicality in real-time settings.

Abstract

Existing infrared and visible image fusion methods often struggle to balance modal information. Generative fusion methods reconstruct fused images by learning from data distributions, but their generative capabilities remain limited. Moreover, the lack of interpretability in modal information selection undermines the reliability and consistency of fusion results in complex scenarios. This manuscript revisits the essence of generative image fusion in light of human cognitive laws and proposes a novel infrared and visible image fusion method, termed HCLFuse. First, HCLFuse investigates the quantification theory of information mapping in unsupervised fusion networks, which leads to the design of a multi-scale mask-regulated variational bottleneck encoder. This encoder applies posterior probability modeling and information decomposition to extract accurate and concise low-level modal information, thereby supporting the generation of high-fidelity structural details. Furthermore, the probabilistic generative capability of the diffusion model is integrated with physical laws, forming a time-varying physical guidance mechanism that adaptively regulates the generation process at different stages. This enhances the model's ability to perceive the intrinsic structure of the data and reduces its dependence on data quality. Experimental results show that the proposed method achieves state-of-the-art fusion performance in qualitative and quantitative evaluations across multiple datasets and significantly improves semantic segmentation metrics. These results demonstrate the advantages of this cognition-inspired generative fusion method in enhancing structural consistency and detail quality.
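
The abstract describes an encoder built on posterior probability modeling with a variational bottleneck. As a rough illustration of that general idea only, the sketch below shows the two standard ingredients of a variational bottleneck with a diagonal-Gaussian posterior: reparameterized sampling and the closed-form KL divergence to a standard normal prior. The function names, the NumPy implementation, and the choice of a standard normal prior are assumptions made here for illustration; this is not the multi-scale mask-regulated encoder of HCLFuse, whose details the abstract does not give.

```python
import numpy as np


def sample_latent(mu, log_var, rng):
    """Draw z ~ N(mu, diag(exp(log_var))) with the reparameterization trick.

    Hypothetical helper: mu and log_var stand in for encoder outputs.
    """
    eps = rng.standard_normal(mu.shape)      # noise independent of the parameters
    return mu + np.exp(0.5 * log_var) * eps  # shift and scale the noise


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis.

    This term is the 'bottleneck' penalty: it grows as the posterior carries
    more information away from the (assumed) standard normal prior.
    """
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([[1.0, 0.0]])      # toy posterior mean for one sample
    log_var = np.zeros((1, 2))       # unit variance in both dimensions
    z = sample_latent(mu, log_var, rng)
    print("latent sample shape:", z.shape)
    print("KL penalty:", kl_to_standard_normal(mu, log_var))
```

In a training loop, a penalty of this kind would typically be weighted and added to a reconstruction or fusion loss; the weighting and the loss used by the paper are not stated in this summary.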

Paper Structure

This paper contains 30 sections, 2 theorems, 36 equations, 10 figures, 7 tables, 2 algorithms.

Key Result

Theorem 1

(Lower Bound of Mutual Information under Unsupervised Mapping) Let modal inputs $X \sim p_X$ and $Y \sim p_Y$, with the fused representation $Z \sim q(z|x, y)$. Assume the existence of a latent task-relevant variable $C$ that satisfies the causal dependency $C \rightarrow (X, Y) \rightarrow Z$. Then

Figures (10)

  • Figure 1: Comparative visualization of Gaussian curvature in generative infrared and visible image fusion methods.
  • Figure 2: Overall architecture of HCLFuse and feature evolution across the diffusion process.
  • Figure 3: Visualization results of several methods on scene 00621D (image name) of the MSRS dataset.
  • Figure 4: Visualization results of several methods on scene 00774N of the MSRS dataset.
  • Figure 5: Visualization of ablation study results on the MSRS dataset.
  • ...and 5 more figures
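
Figure 1 compares fusion methods by Gaussian curvature. This summary does not say how the curvature is computed, so the following is only a sketch of one common approach: treat the image intensity I(x, y) as a height surface and evaluate K = (I_xx I_yy - I_xy^2) / (1 + I_x^2 + I_y^2)^2 with finite differences. The function name, the `spacing` parameter, and the use of `numpy.gradient` are assumptions for illustration.

```python
import numpy as np


def gaussian_curvature(img, spacing=1.0):
    """Gaussian curvature of the surface z = img(x, y), per pixel.

    Hypothetical helper: derivatives are approximated with numpy.gradient,
    which uses central differences in the interior of the array.
    """
    z = np.asarray(img, dtype=np.float64)
    zy, zx = np.gradient(z, spacing)     # axis 0 is rows (y), axis 1 is columns (x)
    zxy, zxx = np.gradient(zx, spacing)  # derivatives of z_x along y and x
    zyy, _ = np.gradient(zy, spacing)    # derivative of z_y along y
    return (zxx * zyy - zxy**2) / (1.0 + zx**2 + zy**2) ** 2


if __name__ == "__main__":
    xs = np.linspace(-1.0, 1.0, 21)
    X, Y = np.meshgrid(xs, xs)
    h = xs[1] - xs[0]
    bowl = 0.5 * (X**2 + Y**2)           # paraboloid: curvature 1 at the apex
    K = gaussian_curvature(bowl, h)
    print("curvature at the apex:", K[10, 10])
```

Flat or planar regions give values near zero, while corners and blob-like structures give large magnitudes, which is why a curvature map can serve as a visual proxy for structural detail.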

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2: Upper Bound of Redundant Mutual Information in the Perturbation Term