Spatial-Frequency Enhanced Mamba for Multi-Modal Image Fusion
Hui Sun, Long Lv, Pingping Zhang, Tongdan Tang, Feng Tian, Weibing Sun, Huchuan Lu
TL;DR
Multi-modal image fusion (MMIF) seeks to fuse complementary information from different modalities, but existing approaches suffer from the limited receptive fields of CNNs and the high computational cost of Transformers. The authors introduce SFMFusion, a three-branch MMIF framework that couples two image reconstruction (IR) branches with an MMIF branch. It strengthens feature extraction with Spatial-Frequency Enhanced Mamba Blocks (SFMB, with MMB, CEB, and FEB components) and uses Dynamic Fusion Mamba Blocks (DFMB) for adaptive fusion. Across six public MMIF datasets, SFMFusion achieves state-of-the-art or competitive performance, with better content preservation and texture/detail fidelity, which the authors attribute to IR guidance and spatial-frequency modeling. As limitations, the authors note ghosting artifacts when the source images are misaligned, and they point to future work on joint registration and on extending the approach to other fusion tasks.
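The TL;DR credits part of the gain to spatial-frequency modeling. The sketch below is a rough, hypothetical illustration of what "frequency perception" means in general. It is not the authors' FEB. It splits a 1-D feature row into a low-frequency band (smooth intensity) and a high-frequency band (edges, texture) using a plain discrete Fourier transform. The helper names `dft`, `idft`, `split_frequencies` and the `cutoff` parameter are assumptions made for illustration. Real models would apply a 2-D FFT to whole feature maps.

```python
import cmath
from typing import List, Tuple


def dft(x: List[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec: List[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def split_frequencies(row: List[float], cutoff: int) -> Tuple[List[float], List[float]]:
    """Hypothetical helper: split a real signal into low- and high-frequency bands.

    Bin k and its mirror bin n-k share the same distance min(k, n-k) from DC,
    so the mask is symmetric and both bands stay real (up to rounding).
    """
    n = len(row)
    spectrum = dft([complex(v) for v in row])
    low_mask = [min(k, n - k) <= cutoff for k in range(n)]
    low_spec = [c if keep else 0j for c, keep in zip(spectrum, low_mask)]
    high_spec = [0j if keep else c for c, keep in zip(spectrum, low_mask)]
    low = [v.real for v in idft(low_spec)]
    high = [v.real for v in idft(high_spec)]
    return low, high


row = [0.0, 1.0, 0.0, 1.0, 4.0, 5.0, 4.0, 5.0]
low, high = split_frequencies(row, cutoff=1)
# Because the DFT is linear, the two bands sum back to the original row.
print([round(a + b, 6) for a, b in zip(low, high)])
```

A fusion network can then process the two bands differently, for example by emphasizing high-frequency detail from one modality. How SFMFusion actually does this is described in the paper, not here.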
Abstract
Multi-Modal Image Fusion (MMIF) aims to integrate complementary image information from different modalities to produce informative images. Previous deep learning-based MMIF methods generally adopt Convolutional Neural Networks (CNNs) or Transformers for feature extraction. However, these methods deliver unsatisfactory performance due to the limited receptive field of CNNs and the high computational cost of Transformers. Recently, Mamba has demonstrated strong potential for modeling long-range dependencies with linear complexity, providing a promising solution to MMIF. Unfortunately, Mamba lacks full spatial and frequency perception, which is very important for MMIF. Moreover, employing Image Reconstruction (IR) as an auxiliary task has been proven beneficial for MMIF. However, a primary challenge is how to leverage IR efficiently and effectively. To address the above issues, we propose a novel framework named Spatial-Frequency Enhanced Mamba Fusion (SFMFusion) for MMIF. More specifically, we first propose a three-branch structure to couple MMIF and IR, which can retain complete contents from source images. Then, we propose the Spatial-Frequency Enhanced Mamba Block (SFMB), which can enhance Mamba in both spatial and frequency domains for comprehensive feature extraction. Finally, we propose the Dynamic Fusion Mamba Block (DFMB), which can be deployed across different branches for dynamic feature fusion. Extensive experiments show that our method achieves better results than most state-of-the-art methods on six MMIF datasets. The source code is available at https://github.com/SunHui1216/SFMFusion.
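The abstract's claim that Mamba models long-range dependencies with linear complexity rests on a state space recurrence that visits each token exactly once. The sketch below is a deliberately simplified, hypothetical illustration of that idea: a discretized diagonal linear state space model with fixed parameters, scanned in O(L) time over a sequence of length L. Real Mamba makes the step size and the B and C projections input-dependent (a "selective" scan) and uses a hardware-aware parallel kernel. Neither that nor the SFMB/DFMB designs is reproduced here. The function `ssm_scan` and its parameters `a`, `b`, `c`, `delta` are assumptions chosen for illustration.

```python
import math
from typing import List


def ssm_scan(x: List[float], a: List[float], b: List[float], c: List[float],
             delta: float) -> List[float]:
    """Hypothetical diagonal linear SSM scan over a 1-D sequence.

    h_t = exp(delta * a) * h_{t-1} + (delta * b) * x_t   (elementwise over N states)
    y_t = sum_i c_i * h_t[i]
    Each step costs O(N), so the full scan costs O(L * N): linear in length L.
    """
    n = len(a)
    a_bar = [math.exp(delta * ai) for ai in a]  # zero-order-hold discretization of A
    b_bar = [delta * bi for bi in b]            # simplified (Euler-style) discretization of B
    h = [0.0] * n
    ys: List[float] = []
    for xt in x:  # single left-to-right pass over the sequence
        h = [a_bar[i] * h[i] + b_bar[i] * xt for i in range(n)]
        ys.append(sum(c[i] * h[i] for i in range(n)))
    return ys


# An impulse at t = 0 keeps influencing every later output: the hidden state
# carries long-range context, decaying at rates set by a (negative = stable).
x = [1.0] + [0.0] * 9
y = ssm_scan(x, a=[-0.1, -1.0], b=[1.0, 1.0], c=[1.0, 1.0], delta=1.0)
print([round(v, 4) for v in y])
```

The slowly decaying state (a = -0.1) retains the impulse over many steps, while the fast one (a = -1.0) forgets it quickly. Mixing such states is what lets a scan capture both short- and long-range dependencies at linear cost. Extending this to 2-D images requires choosing scan orders over pixels, a design question the paper addresses with its own blocks.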
