InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model

Youjin Wang; Jiaqiao Zhao; Rong Fu; Run Zhou; Ruizhe Zhang; Jiani Liang; Suisuai Cao; Feng Zhou

Abstract

Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. Transformers provide strong token mixing but suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. We present a consistency-boundary analysis that characterizes when diagonal short-memory SSMs can approximate causal attention and identifies the structural gaps that remain. Motivated by this analysis, we propose InfoMamba, an attention-free hybrid architecture. InfoMamba replaces token-level self-attention with a concept-bottleneck linear filtering layer that serves as a minimal-bandwidth global interface, and integrates it with a selective recurrent stream through information-maximizing fusion (IMF). IMF dynamically injects global context into the SSM dynamics and encourages complementary information usage through a mutual-information-inspired objective. Extensive experiments on classification, dense prediction, and non-vision tasks show that InfoMamba consistently outperforms strong Transformer and SSM baselines, achieving competitive accuracy-efficiency trade-offs while maintaining near-linear scaling.
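As a rough illustration of the two streams named in the abstract, the following is a minimal Python sketch and not the authors' implementation. It assumes a fixed (non-selective) diagonal recurrence for the SSM stream, a mean-pooled linear projection onto k concept slots for the global path, and a constant scalar gate in place of IMF. The function names `diagonal_ssm`, `concept_bottleneck`, and `fuse`, the `write` and `read` matrices, and the `gate` parameter are all hypothetical. The pooled context here uses the whole sequence, so this simplified version is non-causal.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def diagonal_ssm(xs, decay, gain):
    """Local stream (assumed form): h_t = decay * h_{t-1} + gain * x_t per channel.

    xs is a list of n tokens, each a list of d floats. Cost is O(n * d).
    """
    d = len(decay)
    h = [0.0] * d
    out = []
    for x in xs:
        # A new list is built each step, so earlier states are not overwritten.
        h = [decay[i] * h[i] + gain[i] * x[i] for i in range(d)]
        out.append(h)
    return out


def concept_bottleneck(xs, write, read):
    """Global stream (assumed form): compress to k concepts, then expand.

    write is k x d and read is d x k. Cost is O(n * k * d), linear in n,
    versus O(n^2 * d) for token-to-token self-attention.
    """
    n = len(xs)
    # Compress: each concept is the mean of one linear projection over all tokens.
    concepts = [sum(dot(w_row, x) for x in xs) / n for w_row in write]
    # Expand: a single d-dimensional global context vector.
    return [dot(r_row, concepts) for r_row in read]


def fuse(local_states, global_context, gate):
    """Hypothetical stand-in for IMF: add gated global context to every local state."""
    return [[h_i + gate * g_i for h_i, g_i in zip(h, global_context)]
            for h in local_states]


# Usage: three 2-dimensional tokens and a single concept slot (k = 1).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
local = diagonal_ssm(tokens, decay=[0.5, 0.5], gain=[1.0, 1.0])
context = concept_bottleneck(tokens, write=[[1.0, 1.0]], read=[[1.0], [0.0]])
mixed = fuse(local, context, gate=0.5)
```

Every token exchanges information with the rest of the sequence only through the k concept values, which is one way to read the phrase "minimal-bandwidth global interface". The selective, input-dependent dynamics and the mutual-information-inspired objective described in the abstract are not modeled in this sketch.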

Paper Structure (37 sections, 13 equations, 4 figures, 6 tables)

Introduction
Related Work
Selective State-Space Models
Transformers and Efficient Token Mixing
Hybrid Methods
Preliminaries and Motivation
Problem Formulation
Transformer Mixing.
SSM Recurrence.
Aligned Regimes.
Theoretical Analysis: Consistency Boundary
Consistency Conditions.
Pole Invariance.
Boundary 1 (Consistency).
Boundaries 2--3 (Inconsistency).
...and 22 more sections

Figures (4)

Figure 1: Overview of InfoMamba. The architecture couples a concept-bottleneck global filtering path with a selective recurrent SSM path via IMF, guided by a redundancy-reduction objective.
Figure 2: Router preference under different dependency ranges. The dynamic router $\rho_t$ assigns more weight to the Transformer path on short-range tasks and to the Mamba path as dependencies extend to mid- and long-range.
Figure 3: Overview of InfoMamba. The architecture couples a concept-bottleneck global filtering path with a selective recurrent SSM path via IMF, guided by a redundancy-reduction objective.
Figure 4: Effect of the upper-bound concept pool size $k_{\max}$ on accuracy, latency, and throughput. Unless otherwise noted, we set $k_{\max}{=}100$; the effective concept count $k_{\mathrm{eff}}(X)$ is selected dynamically per input by the information-driven sparsification in §\ref{sec:soft-bucket}.
