DA-Mamba: Domain Adaptive Hybrid Mamba-Transformer Based One-Stage Object Detection

A. Enes Doruk, Hasan F. Ates

TL;DR

DA-Mamba introduces a domain-adaptive Mamba-Transformer framework for one-stage object detection that combines efficient state-space modeling with attention to capture long-range dependencies across domains. It employs domain-adaptive spatial and channel-wise scanning, cross-attention that produces source-dominant and target-dominant mixed features, and entropy-guided distillation and perturbation to align domain representations while suppressing noisy activations. The approach demonstrates strong cross-domain performance on Pascal VOC and synthetic cross-domain datasets, achieving state-of-the-art results with favorable computational efficiency. This work advances practical unsupervised domain adaptation for real-world detectors by balancing modeling power with deployment constraints.

Abstract

Recent 2D CNN-based domain adaptation approaches struggle with long-range dependencies due to limited receptive fields, making it difficult to adapt to target domains with significant spatial distribution changes. While transformer-based domain adaptation methods better capture distant relationships through self-attention mechanisms that facilitate more effective cross-domain feature alignment, their quadratic computational complexity makes practical deployment challenging for object detection tasks across diverse domains. Inspired by the global modeling and linear computational complexity of the Mamba architecture, we present the first domain-adaptive Mamba-based one-stage object detection model, termed DA-Mamba. Specifically, we combine Mamba's efficient state-space modeling with attention mechanisms to address domain-specific spatial and channel-wise variations. Our design leverages domain-adaptive spatial and channel-wise scanning within the Mamba block to extract highly transferable representations for efficient sequential processing, while cross-attention modules generate long-range, mixed-domain spatial features to enable robust soft alignment across domains. In addition, motivated by the observation that hybrid architectures introduce feature noise in domain adaptation tasks, we propose an entropy-based knowledge distillation framework with margin ReLU, which adaptively refines multi-level representations by suppressing irrelevant activations and aligning uncertainty across source and target domains. Finally, to prevent overfitting caused by the mixed-up features generated through cross-attention mechanisms, we propose entropy-driven gating attention with random perturbations that simultaneously refine target features and enhance model generalization.
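The abstract names entropy-based distillation with margin ReLU but does not give its formulation here. The following is only a minimal sketch of how such ingredients could fit together, not the authors' method. It assumes that margin ReLU clamps teacher activations from below at a non-positive margin, that feature uncertainty is the Shannon entropy of softmax-normalized activations, and that the loss is a squared gap down-weighted by normalized teacher entropy. All function names, the default margin of -1.0, and the weighting scheme are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def margin_relu(values, margin):
    """Clamp each activation from below at `margin` (assumed to be <= 0)."""
    return [max(v, margin) for v in values]

def entropy_weighted_distill_loss(teacher, student, margin=-1.0):
    """Hypothetical loss, not taken from the paper: mean squared gap between
    the student and the margin-ReLU'd teacher, scaled by
    (1 - normalized teacher entropy) so confident teacher features weigh more."""
    if len(teacher) != len(student) or len(teacher) < 2:
        raise ValueError("teacher and student need equal length >= 2")
    target = margin_relu(teacher, margin)
    mse = sum((t - s) ** 2 for t, s in zip(target, student)) / len(target)
    max_entropy = math.log(len(teacher))  # entropy of a uniform distribution
    confidence = 1.0 - shannon_entropy(softmax(teacher)) / max_entropy
    return confidence * mse
```

Under these assumptions, a teacher vector with uniform activations has maximal entropy and contributes almost nothing to the loss, while a sharply peaked teacher vector is matched closely. This is one way to read "suppressing irrelevant activations"; the paper itself may define the weighting differently.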

Paper Structure

This paper contains 18 sections, 7 equations, 5 figures, and 6 tables.

Figures (5)

  • Figure 1: Overall DA-Mamba architecture.
  • Figure 2: Overview of the Hybrid Domain-Adaptive Mamba-Transformer architecture, showing the flow of source, source-dominant, target-dominant, and target features across the stages.
  • Figure 3: Structure of the Mamba block, showing the Spatial SSM and the Channel SSM.
  • Figure 4: Detection results on Clipart1K (a, b), Comic2K (c, d), and Watercolor2K (e, f) by Source Only [liu2016ssd], I3Net [chen2021i3net], and DA-Mamba-B (Ours).
  • Figure 5: Total entropy of shallow, mid, and deep feature levels measured on adaptation from Pascal VOC to Clipart1K (%) using the proposed DA-Mamba-B model.