InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model

Youjin Wang, Jiaqiao Zhao, Rong Fu, Run Zhou, Ruizhe Zhang, Jiani Liang, Suisuai Cao, Feng Zhou

Abstract

Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. We present a consistency boundary analysis that characterizes when diagonal short-memory SSMs can approximate causal attention and identifies structural gaps that remain. Motivated by this analysis, we propose InfoMamba, an attention-free hybrid architecture. InfoMamba replaces token-level self-attention with a concept bottleneck linear filtering layer that serves as a minimal-bandwidth global interface and integrates it with a selective recurrent stream through information-maximizing fusion (IMF). IMF dynamically injects global context into the SSM dynamics and encourages complementary information usage through a mutual-information-inspired objective. Extensive experiments on classification, dense prediction, and non-vision tasks show that InfoMamba consistently outperforms strong Transformer and SSM baselines, achieving competitive accuracy-efficiency trade-offs while maintaining near-linear scaling.
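To make the scaling argument in the abstract concrete, the following is a minimal NumPy sketch of what routing global mixing through a small set of concepts could look like, together with a rough operation count. It is an illustration under stated assumptions, not the InfoMamba layer: the concept matrix `C`, the soft-assignment rule, and the function names `bottleneck_mix` and `mixing_cost` are all hypothetical, and the sketch is non-causal for simplicity.

```python
import numpy as np

def softmax(z, axis):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bottleneck_mix(X, C):
    """Mix n tokens through k concepts instead of through n x n token pairs.

    X: (n, d) token features.
    C: (k, d) concept vectors (hypothetical; assumed learned in practice).
    """
    A = softmax(X @ C.T, axis=1)           # (n, k) soft token-to-concept assignment
    pooled = A.T @ X                       # (k, d) assignment-weighted token sums
    pooled /= A.sum(axis=0)[:, None]       # normalize each concept to a weighted mean
    return A @ pooled                      # (n, d) broadcast concept summaries back

def mixing_cost(n, d, k):
    """Rough multiply-accumulate counts for the mixing step only (assumed model)."""
    return {
        "pairwise": 2 * n * n * d,         # score matrix plus weighted sum over all pairs
        "bottleneck": 3 * n * k * d,       # assignment, pooling, and read-back
    }
```

With `k` held fixed, the bottleneck count grows linearly in the sequence length `n`, while the pairwise count grows quadratically. Under this assumed cost model the ratio between the two is `2n / (3k)`.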

Paper Structure (37 sections, 13 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: Overview of InfoMamba. The architecture couples a concept-bottleneck global filtering path with a selective recurrent SSM path via IMF, guided by a redundancy-reduction objective.
  • Figure 2: Router preference under different dependency ranges. The dynamic router $\rho_t$ assigns more weight to the Transformer path on short-range tasks and to the Mamba path as dependencies extend to mid- and long-range.
  • Figure 3: Overview of InfoMamba. The architecture couples a concept-bottleneck global filtering path with a selective recurrent SSM path via IMF, guided by a redundancy-reduction objective.
  • Figure 4: Effect of the upper-bound concept pool size $k_{\max}$ on accuracy, latency, and throughput. Unless otherwise noted, we set $k_{\max} = 100$; the effective concept count $k_{\mathrm{eff}}(X)$ is selected dynamically per input by the information-driven sparsification in §\ref{sec:soft-bucket}.
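
The per-token routing described in the Figure 2 caption can be pictured as a convex combination of two path outputs. The sketch below is one plausible form, assuming a sigmoid gate over the concatenated path features. The gate parameters `w` and `b` and the function name `route` are hypothetical and are not taken from the paper.

```python
import numpy as np

def route(g, s, w, b=0.0):
    """Blend two path outputs with a per-token gate rho_t in (0, 1).

    g: (n, d) output of the global path.
    s: (n, d) output of the recurrent (SSM) path.
    w: (2d,) gate weights; b: scalar bias (both hypothetical parameters).
    Returns the blended output (n, d) and the gate values (n,).
    """
    feats = np.concatenate([g, s], axis=-1)          # (n, 2d) joint features
    rho = 1.0 / (1.0 + np.exp(-(feats @ w + b)))     # (n,) sigmoid gate
    y = rho[:, None] * g + (1.0 - rho[:, None]) * s  # per-token convex blend
    return y, rho
```

In this form, a gate value near 1 passes the global path through and a value near 0 passes the recurrent path through. Averaging `rho` over the tokens of a task gives a scalar preference of the kind Figure 2 reports, under the assumptions above.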