Shuffle Mamba: State Space Models with Random Shuffle for Multi-Modal Image Fusion

Ke Cao, Xuanhua He, Tao Hu, Chengjun Xie, Jie Zhang, Man Zhou, Danfeng Hong

TL;DR

A novel Bayesian-inspired scanning strategy called Random Shuffle is proposed, supplemented by a theoretically feasible inverse shuffle to maintain information coordination invariance, aiming to eliminate biases associated with fixed sequence scanning.

Abstract

Multi-modal image fusion integrates complementary information from different modalities to produce enhanced and informative images. Although State-Space Models, such as Mamba, are proficient in long-range modeling with linear complexity, most Mamba-based approaches use fixed scanning strategies, which can introduce biased prior information. To mitigate this issue, we propose a novel Bayesian-inspired scanning strategy called Random Shuffle, supplemented by a theoretically feasible inverse shuffle to maintain information coordination invariance, aiming to eliminate biases associated with fixed sequence scanning. Based on this transformation pair, we customize the Shuffle Mamba Framework, applying it to modality-aware information representation and cross-modality information interaction across spatial and channel axes to ensure robust interaction and an unbiased global receptive field for multi-modal image fusion. Furthermore, we develop a testing methodology based on Monte-Carlo averaging to ensure the model's output aligns more closely with expected results. Extensive experiments across multiple multi-modal image fusion tasks demonstrate the effectiveness of our proposed method, yielding superior fusion quality compared with state-of-the-art alternatives. Code will be available upon acceptance.
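
The shuffle and inverse-shuffle pair described in the abstract can be pictured with a small sketch. It is not the authors' implementation: the function names `random_shuffle` and `inverse_shuffle`, the list-of-tokens representation, and the fixed seed are all assumptions made for illustration. It only shows that a random permutation of a flattened token sequence can be undone exactly when the permutation is kept, so each output returns to its original position.

```python
import random


def random_shuffle(tokens, rng):
    """Return the tokens in a random order plus the permutation that was used.

    Hypothetical helper: perm[new_pos] is the original index of the token
    now sitting at new_pos.
    """
    perm = list(range(len(tokens)))
    rng.shuffle(perm)
    return [tokens[i] for i in perm], perm


def inverse_shuffle(shuffled, perm):
    """Put every token back at the index it had before random_shuffle."""
    restored = [None] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = shuffled[new_pos]
    return restored


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
tokens = ["p0", "p1", "p2", "p3", "p4", "p5"]  # stand-ins for flattened pixels
shuffled, perm = random_shuffle(tokens, rng)
restored = inverse_shuffle(shuffled, perm)
```

In a real model, a sequence layer would sit between the two calls; the inverse step then maps that layer's outputs back to the original spatial order.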

Paper Structure (21 sections, 10 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: Visualization of Effective Receptive Fields (ERFs) for Conv, Self-Attention and our Method. A larger ERF is indicated by a more extensively distributed dark area.
  • Figure 2: The architecture of the proposed Shuffle Mamba framework.
  • Figure 3: The Random Shuffle Scanning for training.
  • Figure 4: The Monte-Carlo averaging for testing.
  • Figure 5: Visual comparison of several methods on the WV3 dataset.
  • ...and 2 more figures
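
The Monte-Carlo averaging named in the abstract and in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: `toy_scan` is a hypothetical order-dependent stand-in for a sequence model, and the number of samples, the seed, and the plain arithmetic mean are choices made here. The idea shown is to run several randomly shuffled forward passes, restore each output to the original order, and average the results.

```python
import random


def toy_scan(seq):
    """Hypothetical stand-in for a sequence model: a causal running mean,
    so each output depends on the order in which the inputs are scanned."""
    out, total = [], 0.0
    for count, value in enumerate(seq, start=1):
        total += value
        out.append(total / count)
    return out


def shuffled_forward(seq, rng):
    """One pass: shuffle, scan, then restore outputs to the original order."""
    perm = list(range(len(seq)))
    rng.shuffle(perm)
    scanned = toy_scan([seq[i] for i in perm])
    restored = [0.0] * len(seq)
    for new_pos, old_pos in enumerate(perm):
        restored[old_pos] = scanned[new_pos]
    return restored


def monte_carlo_average(seq, num_samples, seed=0):
    """Average the restored outputs of num_samples random-shuffle passes."""
    rng = random.Random(seed)
    acc = [0.0] * len(seq)
    for _ in range(num_samples):
        for i, value in enumerate(shuffled_forward(seq, rng)):
            acc[i] += value
    return [a / num_samples for a in acc]


signal = [1.0, 2.0, 3.0, 4.0]
averaged = monte_carlo_average(signal, num_samples=8)
```

Averaging over more samples reduces the dependence on any single random scan order, at the cost of one extra forward pass per sample.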