
Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System

Jing Li, Lu Bai, Bin Yang, Chang Li, Lingfei Ma, Lixin Cui, Edwin R. Hancock

TL;DR

This work tackles infrared-visible image fusion (IVF) for intelligent transportation systems (ITS) by injecting dual-modal semantic guidance into the fusion process. It introduces two parallel semantic segmentation branches with a refined feature adaptive-modulation (RFaM) mechanism, together with a multi-level representation-adaptive fusion (MRaF) module, to balance high-level semantics with low-frequency structure and high-frequency details. Pilot experiments identify significant semantic features that guide the fusion pathway, yielding improved semantic segmentation and fusion metrics on the MSRS and FMB datasets, along with strong generalization and competitive efficiency. The approach advances ITS applications by producing fused images that are both visually convincing and semantically informative for downstream tasks such as object detection and scene understanding.

Abstract

Infrared and visible image fusion (IVF) plays an important role in intelligent transportation systems (ITS). Early works predominantly focus on boosting the visual appeal of the fused result, and only a few recent approaches have tried to combine high-level vision tasks with IVF. However, these approaches prioritize the design of cascaded structures that seek unified features suited to different tasks. As a result, they tend to be biased toward reconstructing raw pixels without considering the significance of semantic features. We therefore propose a novel prior semantic guided image fusion method based on a dual-modality strategy to improve the performance of IVF in ITS. Specifically, to explore the independent significant semantics of each modality, we first design two parallel semantic segmentation branches with a refined feature adaptive-modulation (RFaM) mechanism. RFaM can perceive the features that are sufficiently semantically distinct in each semantic segmentation branch. Then, two pilot experiments based on the two branches are conducted to capture the significant prior semantics of the two images, which are then applied to guide the fusion task through the integration of the semantic segmentation branches and the fusion branches. In addition, to combine high-level semantics with impressive visual effects, we further investigate the frequency response of the prior semantics and propose a multi-level representation-adaptive fusion (MRaF) module that explicitly integrates the low-frequency prior semantics with the high-frequency details. Extensive experiments on two public datasets demonstrate the superiority of our method over state-of-the-art image fusion approaches in terms of both visual appeal and high-level semantics.
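The abstract describes MRaF only at a high level: low-frequency prior semantics are combined with high-frequency details. As a minimal sketch of that general idea, and not the authors' MRaF module (which works on learned multi-level features), the code splits 2-D maps with an ideal Fourier low-pass mask and recombines them. The function names, the cutoff `radius`, and the per-pixel max-absolute rule for merging infrared and visible detail are all assumptions made for illustration.

```python
import numpy as np


def split_frequencies(x: np.ndarray, radius: float) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D map into low- and high-frequency parts.

    Uses an ideal circular low-pass mask of the given radius (in frequency
    bins) around the centred zero-frequency component. Because high = x - low,
    the two parts always sum back to x.
    """
    h, w = x.shape
    spectrum = np.fft.fftshift(np.fft.fft2(x))       # move DC to the centre
    yy, xx = np.ogrid[:h, :w]
    cy, cx = h // 2, w // 2                          # DC index after fftshift
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    low = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum * mask)))
    high = x - low
    return low, high


def frequency_split_fuse(semantic: np.ndarray, ir: np.ndarray, vis: np.ndarray,
                         radius: float = 8.0) -> np.ndarray:
    """Hypothetical fusion rule: low frequencies from a semantic map,
    high frequencies from whichever source image has the stronger detail."""
    low_sem, _ = split_frequencies(semantic, radius)
    _, high_ir = split_frequencies(ir, radius)
    _, high_vis = split_frequencies(vis, radius)
    # Per-pixel max-absolute selection of high-frequency detail (an assumption).
    high = np.where(np.abs(high_ir) >= np.abs(high_vis), high_ir, high_vis)
    return low_sem + high
```

A hard (ideal) mask keeps the sketch short but causes ringing near edges; a smooth Gaussian mask is a common alternative when that matters.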

Paper Structure

This paper contains 23 sections, 9 equations, 16 figures, and 3 tables.

Figures (16)

  • Figure 1: Comparison of our method with several task-driven methods on a high-level vision task and on the image fusion task. Our method performs better in both qualitative and quantitative experiments on the two tasks, and it also has fewer parameters than other semantic segmentation task-driven methods, such as PSFusion and SegMiF.
  • Figure 2: Overview of the development of infrared and visible image fusion in ITS. We also discuss the differences between our method and recent task-driven methods.
  • Figure 3: Overall framework of the proposed method, based on a dual-modality semantic guided image fusion strategy for high-level vision tasks. It includes two parallel semantic segmentation branches with the refined feature adaptive-modulation (RFaM) module, and a multi-level representation-adaptive fusion (MRaF) module. More details of the weight analysis are shown in Fig. 5.
  • Figure 4: Architecture of the refined feature adaptive-modulation (RFaM) module. More details of the weight analysis are shown in Fig. 5.
  • Figure 5: Analysis of significant semantic information in the pilot experiment.
  • ...and 11 more figures
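Figure 4 shows the RFaM architecture, but this summary does not describe its internals. As a hedged, generic illustration of what adaptive feature modulation can look like, the sketch reweights the channels of one branch's feature map with a squeeze-and-excitation-style gate. This is a swapped-in, well-known technique rather than RFaM itself; the function `channel_modulate` and its weight shapes are hypothetical.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def channel_modulate(features: np.ndarray,
                     w1: np.ndarray, b1: np.ndarray,
                     w2: np.ndarray, b2: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Squeeze-and-excitation-style channel gating (a stand-in, not RFaM).

    features: (C, H, W) feature map from one segmentation branch.
    w1: (C_r, C), b1: (C_r,) bottleneck layer; w2: (C, C_r), b2: (C,) expansion.
    Returns the reweighted features and the per-channel gates in (0, 1).
    """
    squeezed = features.mean(axis=(1, 2))             # global average pool -> (C,)
    hidden = np.maximum(w1 @ squeezed + b1, 0.0)      # ReLU bottleneck -> (C_r,)
    weights = sigmoid(w2 @ hidden + b2)               # channel gates -> (C,)
    return features * weights[:, None, None], weights
```

In a trained network the parameters `w1`, `b1`, `w2`, and `b2` would be learned; a gate near 1 passes a channel almost unchanged, while a gate near 0 suppresses it.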