DA-Mamba: Learning Domain-Aware State Space Model for Global-Local Alignment in Domain Adaptive Object Detection

Haochen Li, Rui Zhang, Hantao Yao, Xin Zhang, Yifan Hao, Shaohui Peng, Yongwei Zhao, Ling Li

Abstract

Domain Adaptive Object Detection (DAOD) aims to transfer detectors from a labeled source domain to an unlabeled target domain. Existing DAOD methods employ multi-granularity feature alignment to learn domain-invariant representations. However, the local connectivity of their CNN-based backbone and detection head restricts alignment to local regions, so these methods fail to extract global domain-invariant features. Although transformer-based DAOD methods capture global dependencies via attention mechanisms, their quadratic computational cost hinders practical deployment. To address this, we propose DA-Mamba, a hybrid CNN-State Space Model (SSM) architecture that combines the efficiency of CNNs with the linear-time long-range modeling capability of SSMs to capture both global and local domain-invariant features. Specifically, we introduce two novel modules: Image-Aware SSM (IA-SSM) and Object-Aware SSM (OA-SSM). IA-SSM is integrated into the backbone to enhance global domain awareness, enabling image-level global and local alignment. OA-SSM is inserted into the detection head to model spatial and semantic dependencies among objects, enhancing instance-level alignment. Comprehensive experiments demonstrate that the proposed method efficiently improves the cross-domain performance of the object detector.
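
To illustrate the "linear-time long-range modeling" property the abstract attributes to SSMs, the following is a minimal sketch of a generic discrete linear state space recurrence, h_t = a * h_{t-1} + b * x_t and y_t = c . h_t, with a diagonal state transition. It is not the IA-SSM or OA-SSM module and not the authors' implementation: the function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical, and the parameters are fixed here, whereas Mamba-style layers make them input-dependent.

```python
from typing import List, Sequence


def ssm_scan(
    x: Sequence[float],
    a: Sequence[float],
    b: Sequence[float],
    c: Sequence[float],
) -> List[float]:
    """Run a diagonal linear SSM over a 1-D sequence.

    Recurrence (per state dimension i):
        h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
        y_t    = sum_i c[i] * h_t[i]

    One pass over the sequence, so cost grows linearly with len(x).
    All names here are illustrative assumptions, not from the paper.
    """
    h = [0.0] * len(a)  # hidden state, initialised to zero
    ys: List[float] = []
    for x_t in x:
        # Update every state dimension from the previous state and the input.
        h = [a_i * h_i + b_i * x_t for a_i, h_i, b_i in zip(a, h, b)]
        # Read out a scalar output from the current state.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(c, h)))
    return ys


if __name__ == "__main__":
    # An impulse at t=0 keeps influencing later outputs through the state,
    # which is the sense in which the recurrence carries long-range context.
    print(ssm_scan([1.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[2.0]))
```

Each output depends on all earlier inputs through the hidden state, yet the scan touches every element once, in contrast to the pairwise comparisons that make self-attention quadratic in sequence length.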

Paper Structure (24 sections, 22 equations, 9 figures, 13 tables)

Figures (9)

  • Figure 1: (a) Existing DAOD methods focus on aligning features at the image and instance levels to extract domain-invariant features. However, they only achieve local alignment due to the locality of the CNN-based backbone and head, resulting in insufficient alignment across regions. (b) Our DA-Mamba utilizes the long-range perception capability of SSMs to extract global domain-invariant features, introducing the Image-Aware SSM module to the backbone to supplement global domain attributes, and the Object-Aware SSM module to the detection head to model spatial and semantic dependencies between objects, achieving fine-grained alignment at the image and instance levels.
  • Figure 2: Overview of the proposed DA-Mamba. The proposed IA-SSM and OA-SSM are integrated into the FPN of the backbone and detection head, respectively, providing fine-grained image and instance-level global-local alignment.
  • Figure 3: The architecture of (a) the Mamba layer, (b) the Image-Aware SSM module, and (c) the Object-Aware SSM module.
  • Figure 4: Different structures for IA-SSM and OA-SSM.
  • Figure 5: Visualizations of the extracted feature map.
  • ...and 4 more figures