DAE-Fuse: An Adaptive Discriminative Autoencoder for Multi-Modality Image Fusion

Yuchen Guo; Ruoxiang Xu; Rongcheng Li; Weifeng Su

TL;DR

DAE-Fuse tackles robust multi-modality image fusion under adverse conditions by introducing a two-phase discriminative autoencoder with a cross-modality attention fusion module. Phase 1 enhances feature extraction with a dual-branch encoder and adversarial discriminators, while Phase 2 performs cross-modality fusion through cross-attention and an adversarial fusion objective to avoid modality bias. The approach achieves state-of-the-art results on infrared-visible image fusion benchmarks and improves object detection performance, with strong generalization to medical image fusion tasks and initial temporal consistency for video fusion. This work advances practical perception for autonomous navigation and surveillance by delivering sharp, texture-rich fused images and temporally stable video outputs.
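To make the cross-modality attention idea concrete, the following is a minimal NumPy sketch of two-way cross-attention between infrared and visible feature tokens. It is an illustration under stated assumptions, not the authors' implementation: the token count, feature width, random projection matrices, the sharing of projections across both directions, and the equal-weight averaging at the end are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Tokens of one modality (queries) attend to tokens of the other (context)."""
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    # Scaled dot-product scores, one row per query token.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, dim = 16, 8  # hypothetical sizes, not taken from the paper

# Stand-ins for encoder outputs of the two modalities.
ir_feats = rng.standard_normal((n_tokens, dim))
vis_feats = rng.standard_normal((n_tokens, dim))

# Random projections stand in for learned weights (assumption: shared by both directions).
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))

# Each modality queries the other, so neither one acts as the sole reference.
ir_attends_vis, attn_ir = cross_attention(ir_feats, vis_feats, w_q, w_k, w_v)
vis_attends_ir, attn_vis = cross_attention(vis_feats, ir_feats, w_q, w_k, w_v)

# Equal-weight average is an illustrative choice, not the paper's fusion rule.
fused = 0.5 * (ir_attends_vis + vis_attends_ir)
```

Running attention in both directions is one simple way to keep either modality from dominating the fused features; in a trained model the projections would be learned, and the adversarial fusion objective mentioned above would shape the result further.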

Abstract

In extreme scenarios such as nighttime or low-visibility environments, reliable perception is critical for applications like autonomous driving, robotics, and surveillance. Multi-modality image fusion, particularly when it integrates infrared imaging, offers a robust solution by combining complementary information from different modalities to enhance scene understanding and decision-making. However, current methods face significant limitations: GAN-based approaches often produce blurry images that lack fine-grained details, while AE-based methods may introduce bias toward specific modalities, leading to unnatural fusion results. To address these challenges, we propose DAE-Fuse, a novel two-phase discriminative autoencoder framework that generates sharp and natural fused images. Furthermore, we pioneer the extension of image fusion techniques from static images to the video domain while preserving temporal consistency across frames, thus advancing the perceptual capabilities required for autonomous navigation. Extensive experiments on public datasets demonstrate that DAE-Fuse achieves state-of-the-art performance on multiple benchmarks, with superior generalizability to tasks like medical image fusion.

Paper Structure (28 sections, 15 equations, 5 figures, 4 tables)

This paper contains 28 sections, 15 equations, 5 figures, 4 tables.

Introduction
Related Work
Methods
Overview
Adversarial Feature Extraction Phase
Attention-guided Cross-modality Fusion Phase
Early Fusion
Adversarial Fusion
Loss Function
Temporal Consistency Loss
Phase one
Phase two
Experiments
Setup
Datasets and metrics
...and 13 more sections

Figures (5)

Figure 1: The workflow of the adversarial feature extraction phase. The cross-attention module used for fusion is omitted in this phase.
Figure 2: The workflow of the attention-guided cross-modality fusion phase.
Figure 3: Object detection ability of DAE-Fuse: in the visible image, the car on the right is detected but the people are missed; the infrared image shows the opposite behavior on these two objects; in the fused image from DAE-Fuse, all of them are detected.
Figure 4: Qualitative comparison with state-of-the-art methods on the TNO dataset.
Figure 5: Qualitative comparison with state-of-the-art methods on the MRI-CT dataset.
