DGMamba: Domain Generalization via Generalized State Space Model

Shaocong Long, Qianyu Zhou, Xiangtai Li, Xuequan Lu, Chenhao Ying, Yuan Luo, Lizhuang Ma, Shuicheng Yan

TL;DR

This paper tackles domain generalization by adapting a vision State Space Model (Mamba) to DG, addressing hidden-state leakage and nonideal image scanning. It introduces Hidden State Suppressing (HSS) to curb domain-specific information in hidden states and Semantic-aware Patch Refining (SPR), comprising Prior-Free Scanning (PFS) and Domain Context Interchange (DCI), to emphasize object cues and diversify context. Across five DG benchmarks, DGMamba achieves state-of-the-art generalization with competitive efficiency, demonstrating the viability of SSM-based models for robust cross-domain vision. The work provides a solid baseline for applying State Space Models to DG and suggests future directions like domain prompts to further enhance generalization.
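The summary does not give the HSS formula, so the following is only a toy illustration of damping hidden states during propagation. It is not the authors' method. It assumes a scalar linear state space recurrence, h_t = a*h_{t-1} + b*x_t with readout y_t = c*h_t, plus a hypothetical per-step flag `suppress` that marks states treated as domain-specific. Flagged states are scaled by a factor `alpha` in [0, 1] before they are read out and carried forward. The names `scan_with_suppression`, `suppress`, and `alpha` are illustrative assumptions. The paper does tune an α for HSS (Figure 4), but its exact role is not stated here.

```python
from typing import List, Sequence


def scan_with_suppression(
    xs: Sequence[float],
    a: float,
    b: float,
    c: float,
    suppress: Sequence[bool],
    alpha: float,
) -> List[float]:
    """Toy scalar SSM scan with a hypothetical hidden-state suppression step.

    Recurrence: h_t = a * h_{t-1} + b * x_t, readout y_t = c * h_t.
    Where suppress[t] is True (an assumed "domain-specific" flag), the state
    is multiplied by alpha before it is read out and propagated onward.
    """
    if len(xs) != len(suppress):
        raise ValueError("xs and suppress must have the same length")
    h = 0.0
    ys: List[float] = []
    for x_t, flagged in zip(xs, suppress):
        h = a * h + b * x_t          # standard linear state update
        if flagged:
            h = alpha * h            # damp the flagged state (assumption)
        ys.append(c * h)             # linear readout
    return ys
```

With `a=0.5`, `b=c=1.0`, and inputs `[1.0, 1.0, 1.0]`, the unsuppressed outputs are `[1.0, 1.5, 1.75]`. Flagging only the first step with `alpha=0.5` gives `[0.5, 1.25, 1.625]`, which shows that a damped state keeps contributing less at every later step.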

Abstract

Domain generalization (DG) aims to solve distribution shift problems across diverse scenes. Existing approaches are based on Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs), which suffer from limited receptive fields or quadratic complexity, respectively. Mamba, an emerging state space model (SSM), offers linear complexity and a global receptive field. Despite these advantages, it is difficult to apply Mamba directly to DG to address distribution shifts, because of issues with its hidden states and inappropriate scanning mechanisms. In this paper, we propose a novel framework for DG, named DGMamba, that generalizes strongly to unseen domains while retaining a global receptive field and efficient linear complexity. DGMamba comprises two core components: Hidden State Suppressing (HSS) and Semantic-aware Patch Refining (SPR). In particular, HSS mitigates the influence of hidden states associated with domain-specific features during output prediction. SPR encourages the model to concentrate on objects rather than context, and consists of two designs: Prior-Free Scanning (PFS) and Domain Context Interchange (DCI). Concretely, PFS shuffles the non-semantic patches within images, creating more flexible and effective sequences from images, and DCI regularizes Mamba with combinations of mismatched non-semantic and semantic information by fusing patches across domains. Extensive experiments on five commonly used DG benchmarks demonstrate that DGMamba achieves clearly superior results to state-of-the-art models. The code will be made publicly available at https://github.com/longshaocong/DGMamba.
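The following minimal sketch contrasts fixed flattening with Prior-Free Scanning under stated assumptions. `raster_flatten` is a plain row-major scan of a 2D patch grid. `prior_free_scan` treats the `bg_ratio` fraction of patches with the lowest saliency scores as non-semantic and randomly permutes them among their own positions, leaving the higher-scoring (semantic) patches in place. Several details are assumptions for illustration: the source of the scores (Figure 3 mentions Grad-CAM), the value of `bg_ratio`, and all function names.

```python
import random
from typing import List, Sequence


def raster_flatten(grid: Sequence[Sequence[int]]) -> List[int]:
    """Fixed row-major flattening of a 2D patch grid into a 1D sequence."""
    return [patch for row in grid for patch in row]


def prior_free_scan(
    seq: Sequence[int],
    scores: Sequence[float],
    bg_ratio: float,
    rng: random.Random,
) -> List[int]:
    """Shuffle the lowest-scoring (assumed background) patches in place.

    The bg_ratio fraction of patches with the lowest scores are randomly
    permuted among their own positions; all other patches keep their slots.
    """
    if len(seq) != len(scores):
        raise ValueError("seq and scores must have the same length")
    n_bg = int(len(seq) * bg_ratio)
    # Positions of the n_bg lowest-scoring patches, in ascending position order.
    bg_idx = sorted(sorted(range(len(seq)), key=lambda i: scores[i])[:n_bg])
    bg_patches = [seq[i] for i in bg_idx]
    rng.shuffle(bg_patches)
    out = list(seq)
    for pos, patch in zip(bg_idx, bg_patches):
        out[pos] = patch
    return out
```

Keeping semantic patches fixed preserves the object's layout in the sequence, while the shuffle removes any ordering that a fixed scan would impose on background content.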

Paper Structure (15 sections, 4 equations, 7 figures, 12 tables)

Figures (7)

  • Figure 1: Comparison of current CNN-based methods, ViT-based methods, and our proposed DGMamba on the PACS and OfficeHome DG benchmarks. Compared with these state-of-the-art (SOTA) methods, our approach achieves the best trade-off between generalization performance (accuracy) and model complexity (number of parameters).
  • Figure 2: (a) When VMamba is directly adapted to DG, domain-specific information captured by hidden states may accumulate or even be amplified during hidden state propagation, which impedes generalization. (b) In contrast, DGMamba introduces the Hidden State Suppressing (HSS) strategy to alleviate the adverse effect of domain-specific information contained in hidden states. (c) VMamba's simple, fixed scanning strategies may introduce unexpected domain-specific information into the generated sequence when 2D images are scanned into 1D sequences, undermining Mamba's ability to address distribution shifts. (d) In contrast, the proposed Prior-Free Scanning in DGMamba breaks the prior bias introduced by fixed manual flattening, yielding more meaningful sequence data.
  • Figure 3: The framework of our proposed DGMamba. Semantic-aware Patch Refining (SPR) is applied before the patches are passed into the state space layer of Mamba. Concretely, for samples outside the top percentage by prediction confidence, we apply the Prior-Free Scanning strategy to randomly shuffle the background patches with low Grad-CAM scores, providing Mamba with a more flexible and effective 2D scanning mechanism. For the remaining samples, we substitute their background patches with context patches from diverse domains, introducing texture noise and context confusion to avoid overfitting. In addition, we employ Hidden State Suppressing (HSS) to reduce the importance of hidden states that contain domain-specific information.
  • Figure 4: Effect of $\alpha$ in the proposed Hidden State Suppressing on PACS.
  • Figure 5: t-SNE visualizations van2008visualizing of the class representations produced by (a) iDAG huang2023idag, (b) GMoE li2023sparse, (c) VMamba liu2024VMamba, and (d) DGMamba (ours). DGMamba yields superior class clustering. Zoom in for details.
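The routing described in the Figure 3 caption can be sketched as follows. The sketch assumes each image is already a 1D list of patch tokens with per-patch saliency scores, a domain label, and a prediction confidence. The `top_ratio` most confident samples get a DCI-style substitution: their lowest-scoring patches are overwritten by the patches at the same positions of a randomly chosen sample from a different domain. That same-position donor rule is hypothetical. All other samples have their background patches shuffled in place, as in PFS. The names `refine_batch` and `background_indices`, the donor rule, and both ratios are assumptions for illustration, not the released code.

```python
import random
from typing import List, Sequence


def background_indices(scores: Sequence[float], bg_ratio: float) -> List[int]:
    """Positions of the bg_ratio fraction of patches with the lowest scores."""
    n_bg = int(len(scores) * bg_ratio)
    return sorted(sorted(range(len(scores)), key=lambda i: scores[i])[:n_bg])


def refine_batch(
    seqs: Sequence[Sequence[str]],
    scores: Sequence[Sequence[float]],
    domains: Sequence[str],
    confidences: Sequence[float],
    top_ratio: float,
    bg_ratio: float,
    rng: random.Random,
) -> List[List[str]]:
    """Route the most confident samples to DCI and the rest to PFS.

    DCI (hypothetical donor rule): background positions take the patches at
    the same positions of a random sample from a different domain.
    PFS: background patches are shuffled among their own positions.
    """
    n_top = int(len(seqs) * top_ratio)
    ranked = sorted(range(len(seqs)), key=lambda k: confidences[k], reverse=True)
    top = set(ranked[:n_top])
    out: List[List[str]] = []
    for k, seq in enumerate(seqs):
        bg = background_indices(scores[k], bg_ratio)
        refined = list(seq)
        if k in top:
            donors = [j for j in range(len(seqs)) if domains[j] != domains[k]]
            if donors:  # without a cross-domain donor, leave the sample unchanged
                donor = seqs[rng.choice(donors)]
                for i in bg:
                    refined[i] = donor[i]
        else:
            moved = [seq[i] for i in bg]
            rng.shuffle(moved)
            for i, patch in zip(bg, moved):
                refined[i] = patch
        out.append(refined)
    return out
```

For example, with two four-patch images from domains `"A"` and `"B"` whose middle two patches score lowest, the more confident image keeps its outer patches and receives the other domain's middle patches, while the other image only has its middle patches reordered.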