DCEvo: Discriminative Cross-Dimensional Evolutionary Learning for Infrared and Visible Image Fusion

Jinyuan Liu, Bowei Zhang, Qingyun Mei, Xingyuan Li, Yang Zou, Zhiying Jiang, Long Ma, Risheng Liu, Xin Fan

TL;DR

The paper addresses the challenge of jointly optimizing infrared-visible image fusion (IVIF) and downstream perception tasks. It introduces DCEvo, a framework that combines a Discriminative Enhancer, which emphasizes object-centric features, with a Cross-Dimensional Embedding, which enables mutual supervision between high-dimensional task features and low-dimensional fusion features. Training is guided by an Evolutionary Algorithm that adaptively balances the multiple objectives. Key contributions include modeling dual-task optimization as a multi-objective problem, using evolutionary search to set the loss-balancing coefficients, and demonstrating improved visual quality together with better downstream detection and segmentation performance. The approach shows consistent gains on multiple IVIF benchmarks and suggests a practical path toward task-aware fusion in real-world intelligent systems. Code is available for reproducibility.
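Treating fusion and detection as a multi-objective problem means one configuration is not simply "better" than another unless it wins on both objectives. The sketch below illustrates Pareto dominance under the assumption that each candidate training configuration is scored by two losses to minimize (a fusion loss and a detection loss). The function names and scores are hypothetical and are not taken from the DCEvo code.

```python
from typing import List, Tuple

# Hypothetical scores per candidate: (fusion_loss, detection_loss), both minimized.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Scores]) -> List[Scores]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical candidates; (0.40, 0.60) is dominated by (0.20, 0.50) on both losses.
cands = [(0.20, 0.50), (0.30, 0.30), (0.25, 0.45), (0.40, 0.60), (0.50, 0.25)]
front = pareto_front(cands)
```

Every point left in `front` represents a different trade-off between fusion quality and detection accuracy; a weighting scheme such as the evolutionary search described in the abstract is one way to choose among such trade-offs.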

Abstract

Infrared and visible image fusion integrates information from distinct spectral bands to enhance image quality by leveraging the strengths and mitigating the limitations of each modality. Existing approaches typically treat image fusion and subsequent high-level tasks as separate processes, resulting in fused images that offer only marginal gains in task performance and fail to provide constructive feedback for optimizing the fusion process. To overcome these limitations, we propose a Discriminative Cross-Dimensional Evolutionary Learning Framework, termed DCEvo, which simultaneously enhances visual quality and perception accuracy. Leveraging the robust search capabilities of Evolutionary Learning, our approach formulates the joint optimization of the two tasks as a multi-objective problem and employs an Evolutionary Algorithm (EA) to dynamically balance the loss-function weighting parameters. Inspired by visual neuroscience, we integrate a Discriminative Enhancer (DE) into both the encoder and decoder, enabling effective learning of complementary features from different modalities. Additionally, our Cross-Dimensional Embedding (CDE) block facilitates mutual enhancement between high-dimensional task features and low-dimensional fusion features, ensuring a cohesive and efficient feature integration process. Experimental results on three benchmarks demonstrate that our method significantly outperforms state-of-the-art approaches, achieving an average improvement of 9.32% in visual quality while also enhancing subsequent high-level tasks. The code is available at https://github.com/Beate-Suy-Zhang/DCEvo.
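The idea of an EA that balances loss coefficients can be made concrete with a small search loop. This is a minimal sketch, not DCEvo's algorithm: it assumes the total loss is a weighted sum of loss terms whose coefficients are non-negative and sum to 1, and it uses generic operators (truncation selection with elitism, blend crossover, Gaussian mutation) chosen only for illustration. `toy_fitness` is a hypothetical stand-in for "train with these coefficients, then score fusion quality and detection accuracy".

```python
import random
from typing import Callable, List

Weights = List[float]  # one non-negative coefficient per loss term


def normalize(w: Weights) -> Weights:
    """Clip to a small positive floor and rescale so the coefficients sum to 1."""
    w = [max(x, 1e-6) for x in w]
    s = sum(w)
    return [x / s for x in w]


def evolve_loss_weights(
    fitness: Callable[[Weights], float],
    n_terms: int = 2,
    pop_size: int = 16,
    generations: int = 30,
    sigma: float = 0.1,
    seed: int = 0,
) -> Weights:
    """Elitist (mu + lambda)-style EA: keep the best half, refill with mutated crossovers."""
    rng = random.Random(seed)
    pop = [normalize([rng.random() for _ in range(n_terms)]) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)  # higher fitness first
        parents = pop[: pop_size // 2]  # truncation selection; parents survive unchanged
        children: List[Weights] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            t = rng.random()
            child = [t * x + (1 - t) * y for x, y in zip(a, b)]  # blend crossover
            child = [x + rng.gauss(0.0, sigma) for x in child]  # Gaussian mutation
            children.append(normalize(child))
        pop = parents + children
    return max(pop, key=fitness)


def toy_fitness(w: Weights) -> float:
    """Hypothetical surrogate objective that peaks at w = [0.7, 0.3]."""
    target = [0.7, 0.3]
    return -sum((x - t) ** 2 for x, t in zip(w, target))


best = evolve_loss_weights(toy_fitness)  # [w_fusion, w_detection]
```

In a real training pipeline each fitness evaluation would involve (partial) training and validation, so scores should be cached per candidate rather than recomputed on every sort as this toy version does.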

Paper Structure

This paper contains 14 sections, 4 equations, 8 figures, 4 tables, and 1 algorithm.

Figures (8)

  • Figure 1: An overall illustration of the DCEvo architecture. The middle part depicts the infrared and visible image fusion network, which generates fused images by coupling pixel-level and task-level features. The upper part denotes the detection network, which embeds task-level features to supervise fusion so that the fused images retain object information. In the cooperative training of the detection and fusion networks, we propose an evolutionary learning strategy to search for the coefficients of the optimization objectives, shown in the bottom part.
  • Figure 2: Illustration of different network workflows and learning strategies of IVIF toward downstream high-level tasks. Strategy (a) trains the fusion network with low-level constraints only, while (b) cascades a detection network to guide the fusion network with additional high-level constraints. Our training approach (c) deploys an evolutionary algorithm to optimize the two tasks cooperatively and effectively.
  • Figure 3: Qualitative comparisons of DCEvo and existing image fusion methods. From top to bottom: a low-light scene from TNO, a high-brightness scene from RoadScene, and a low-quality scene from M³FD.
  • Figure 4: Qualitative comparison of our method and existing infrared and visible image fusion methods on downstream object detection on the M³FD dataset. All objects in our fused images are detected.
  • Figure 5: Qualitative comparison of segmentation results on fused images generated by DCEvo and other fusion methods on the FMB dataset. Our approach achieves the best segmentation results in both the smoke scene and the daytime scene.
  • ...and 3 more figures