MAUGIF: Mechanism-Aware Unsupervised General Image Fusion via Dual Cross-Image Autoencoders

Kunjing Yang, Zhiwei Wang, Minru Bai

TL;DR

This work introduces MAUGIF, a mechanism-aware, unsupervised general image fusion framework built on dual cross-image autoencoders (DCIAE). It classifies fusion tasks into additive and multiplicative types, extracts shared content, and injects modality-specific features through decoders tailored to each fusion mechanism, which allows fusion in a single decoding step and improves interpretability. Extensive experiments across HMF, VIF, MFF, and MEF demonstrate superior or competitive performance at significantly lower computational cost, indicating strong generalization and real-time applicability. The approach makes the fusion dynamics transparent by visualizing modality contributions and supports practical deployment in diverse imaging domains. The code is available at https://anonymous.4open.science/r/MAUGIF.

Abstract

Image fusion aims to integrate structural and complementary information from multi-source images. However, existing fusion methods are often either highly task-specific or general frameworks that apply a uniform strategy across diverse tasks and ignore their distinct fusion mechanisms. To address this issue, we propose a mechanism-aware unsupervised general image fusion (MAUGIF) method based on dual cross-image autoencoders. First, we classify fusion tasks as additive or multiplicative according to their inherent mechanisms. Then, dual encoders map the source images into a shared latent space, capturing common content while isolating modality-specific details. During the decoding phase, dual decoders act as feature injectors, selectively reintegrating the unique characteristics of each modality into the shared content for reconstruction. During fusion, the modality-specific features are injected into the source image, generating a fused image that integrates information from both modalities. The decoder architecture varies with the fusion mechanism, enhancing both performance and interpretability. Extensive experiments on diverse fusion tasks validate the effectiveness and generalization ability of our method. The code is available at https://anonymous.4open.science/r/MAUGIF.
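
To make the additive/multiplicative distinction concrete, the following toy sketch shows what "injecting" modality-specific features into a source image could look like in each case. It is not the paper's method: the actual encoders and decoders are learned networks, whereas the element-wise sum and product used here, the flattened-list image representation, and the names `inject_additive`, `inject_multiplicative`, and `fuse` are all assumptions made for illustration.

```python
from typing import List

# A toy "image": a flat list of pixel intensities (assumption for illustration).
Image = List[float]


def inject_additive(base: Image, detail: Image) -> Image:
    """Hypothetical additive injection: add a detail residual pixel by pixel."""
    return [b + d for b, d in zip(base, detail)]


def inject_multiplicative(base: Image, gain: Image) -> Image:
    """Hypothetical multiplicative injection: scale each pixel by a gain map."""
    return [b * g for b, g in zip(base, gain)]


def fuse(source: Image, specific: Image, mechanism: str) -> Image:
    """Inject modality-specific features into a source image.

    `specific` stands in for the features a decoder would inject; in the
    paper these come from learned networks, here they are given directly.
    """
    if len(source) != len(specific):
        raise ValueError("source and specific features must have the same size")
    if mechanism == "additive":
        return inject_additive(source, specific)
    if mechanism == "multiplicative":
        return inject_multiplicative(source, specific)
    raise ValueError(f"unknown fusion mechanism: {mechanism!r}")


if __name__ == "__main__":
    source = [0.5, 0.25]
    print(fuse(source, [0.25, 0.5], "additive"))
    print(fuse(source, [2.0, 1.0], "multiplicative"))
```

The point of the sketch is only that the two mechanisms call for different injection operators, which is the motivation the abstract gives for using different decoder architectures per task type.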

Paper Structure

This paper contains 17 sections, 13 equations, 10 figures, 4 tables.

Figures (10)

  • Figure 1: The number of model parameters and inference time of different fusion methods.
  • Figure 2: Visual illustration of the fusion process for MFF task.
  • Figure 3: The mechanism-aware DCIAE framework for general image fusion, illustrated with four scenarios.
  • Figure 4: Architecture of the Encoder and Decoder.
  • Figure 5: The fused images of compared methods and the corresponding error maps for 'Toys' dataset.
  • ...and 5 more figures

Theorems & Definitions (4)

  • Definition 3.1
  • Remark 3.1
  • Definition 3.2
  • Remark 3.2