Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion

Xilai Li, Xiaosong Li, Tao Ye, Xiaoqi Cheng, Wuyang Liu, Haishu Tan

TL;DR

The paper tackles multi-modal image fusion (MMIF) when the visible images are only partially in focus. It proposes a focused integration framework that first decomposes the inputs into structure and texture components with a semi-sparsity filter (SSF), then fuses the texture components with a multi-scale, focus-aware operator and the structure components with weights based on information entropy and multi-directional frequency variance. An MFIF-MMIF dataset is generated by applying Gaussian blur under complementary binary masks to simulate defocus across modalities, which allows the method to be evaluated under defocus. Experiments on the TNO, RoadScene, and PET-MRI datasets, along with the downstream tasks of object detection (YOLOv4) and depth estimation (MiDaS), are reported to show state-of-the-art performance and strong generalization. According to the authors, the method improves detail preservation and energy information retention and is robust to challenging conditions, offering a practical MMIF solution and a new benchmark for defocus-aware fusion.
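The dataset construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' generation script: the function names (`gaussian_blur`, `simulate_defocus_pair`), the value of `sigma`, and the way the mask is applied to the two inputs are all hypothetical. The only ideas taken from the summary are a binary mask, its complement, and Gaussian blur as the defocus model.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(img, sigma):
    """Separable Gaussian blur of a 2-D array with reflect padding."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    padded = np.pad(img.astype(np.float64), radius, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding again.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def simulate_defocus_pair(img_a, img_b, mask, sigma=2.0):
    """Blur img_a where mask == 1 and img_b where mask == 0 (assumed scheme).

    Each output is sharp exactly where the other one is blurred, so the
    two defocused regions are complementary.
    """
    mask = mask.astype(np.float64)
    out_a = mask * gaussian_blur(img_a, sigma) + (1.0 - mask) * img_a
    out_b = (1.0 - mask) * gaussian_blur(img_b, sigma) + mask * img_b
    return out_a, out_b
```

Calling `simulate_defocus_pair(a, b, mask)` with a mask whose left half is 1 yields a first image that is blurred on the left and a second image that is blurred on the right. Which inputs the paper actually blurs, and with what parameters, is not stated in this summary.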

Abstract

Multi-modal image fusion (MMIF) integrates valuable information from images of different modalities into a single fused image. However, fusing multiple visible images with different focal regions together with infrared images is an unprecedented challenge in real MMIF applications. The difficulty arises from the limited depth of field of visible-light optical lenses, which prevents all objects in a scene from being captured in focus at the same time. To address this issue, we propose an MMIF framework for joint focused integration and modality information extraction. Specifically, a semi-sparsity-based smoothing filter is introduced to decompose the images into structure and texture components. Subsequently, a novel multi-scale operator is proposed to fuse the texture components; it detects significant information by considering both the pixel focus attributes and the relevant data from the various modal images. Additionally, to capture scene luminance effectively and maintain reasonable contrast, we consider the distribution of energy information in the structural components in terms of multi-directional frequency variance and information entropy. Extensive experiments on existing MMIF datasets, as well as on object detection and depth estimation tasks, consistently demonstrate that the proposed algorithm can surpass state-of-the-art methods in visual perception and quantitative evaluation. The code is available at https://github.com/ixilai/MFIF-MMIF.
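The decompose-then-fuse pipeline outlined in the abstract can be illustrated with a deliberately simplified sketch. Everything in it is a stand-in, and the names are hypothetical: a box filter replaces the semi-sparsity smoothing filter, a per-pixel max-absolute rule replaces the multi-scale focus-aware operator, and a global entropy weight replaces the combination of multi-directional frequency variance and information entropy. It shows only the overall shape of a structure/texture fusion, assuming grayscale inputs scaled to [0, 1].

```python
import numpy as np


def box_smooth(img, radius=2):
    """Box filter used here as a simple stand-in for the paper's SSF."""
    size = 2 * radius + 1
    kernel = np.ones(size) / size
    padded = np.pad(img.astype(np.float64), radius, mode="edge")
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)


def decompose(img, radius=2):
    """Split an image into a smooth structure layer and a texture residual."""
    structure = box_smooth(img, radius)
    return structure, img - structure


def entropy(img, bins=256):
    """Shannon entropy (in bits) of the intensity histogram on [0, 1]."""
    hist, _ = np.histogram(np.clip(img, 0.0, 1.0), bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())


def fuse(img_a, img_b, radius=2):
    """Fuse two registered grayscale images in [0, 1] (illustrative rules)."""
    s_a, t_a = decompose(img_a, radius)
    s_b, t_b = decompose(img_b, radius)
    # Texture: keep the stronger detail response at each pixel.
    texture = np.where(np.abs(t_a) >= np.abs(t_b), t_a, t_b)
    # Structure: weight each layer by its share of the total entropy.
    e_a, e_b = entropy(s_a), entropy(s_b)
    total = e_a + e_b
    w_a = 0.5 if total == 0 else e_a / total
    structure = w_a * s_a + (1.0 - w_a) * s_b
    return structure + texture
```

Because the two layers sum back to the input, fusing an image with itself returns that image, which is a convenient sanity check. For the operators the authors actually use, refer to the linked repository.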

Paper Structure (21 sections, 15 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: Application of multi-modal images to object detection and depth estimation tasks.
  • Figure 2: The flowchart of the proposed method.
  • Figure 3: Examples of binary mask pairs.
  • Figure 4: The quantitative comparison results of the proposed algorithm for different values of the parameter $N$.
  • Figure 5: Ablation study of fusion rules for structural layers. The figure shows the salient target detection results corresponding to different source images and fusion results.
  • ...and 4 more figures