The Computational Limits of State-Space Models and Mamba via the Lens of Circuit Complexity

Yifang Chen, Xiaoyu Li, Yingyu Liang, Zhenmei Shi, Zhao Song

TL;DR

The paper investigates whether Mamba and State-space Models offer real computational advantages over Transformers by analyzing their place in circuit complexity. It shows that, with $\mathrm{poly}(n)$-precision and constant-depth layers, Selective SSM and Mamba can be simulated in $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$, placing them on par with Transformers in expressive power. Consequently, unless $\mathsf{TC}^0 = \mathsf{NC}^1$, these architectures cannot solve $\mathsf{NC}^1$-hard problems such as arithmetic formula evaluation, Boolean formula value, or permutation composition, challenging claims of superior sequential reasoning. The authors provide constructive simulations of Selective SSM and Mamba in $\mathsf{TC}^0$ and establish hardness results, supported by detailed analyses of logarithm approximation, recurrent/convolution SSMs, and selective mechanisms. Collectively, this work clarifies the theoretical limits of stateful neural architectures and motivates future design to surpass the $\mathsf{TC}^0$ barrier for more complex, inherently sequential tasks.
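The recurrent and convolutional views of an SSM mentioned above produce the same outputs. The following is a minimal sketch, assuming a scalar, time-invariant (non-selective) discrete SSM $h_t = a h_{t-1} + b x_t$, $y_t = c h_t$ with zero initial state; the symbols `a`, `b`, `c` and both function names are illustrative and not taken from the paper.

```python
def ssm_recurrent(a, b, c, xs):
    """Run h_t = a*h_{t-1} + b*x_t, y_t = c*h_t step by step (initial state 0)."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def ssm_convolution(a, b, c, xs):
    """Same outputs via the kernel K_j = c * a**j * b: y_t = sum_j K_j * x_{t-j}."""
    n = len(xs)
    kernel = [c * (a ** j) * b for j in range(n)]
    return [sum(kernel[j] * xs[t - j] for j in range(t + 1)) for t in range(n)]
```

In the convolutional form each output is an independent sum of products, so no output has to wait for the previous state. This is only meant to illustrate why a seemingly sequential recurrence can admit a shallow, parallel evaluation.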

Abstract

In this paper, we analyze the computational limitations of Mamba and State-space Models (SSMs) using the circuit complexity framework. Despite Mamba's stateful design and recent attention as a strong candidate to outperform Transformers, we demonstrate that both Mamba and SSMs with $\mathrm{poly}(n)$-precision and constant-depth layers reside within the $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ complexity class. This result indicates that Mamba theoretically has the same computational capabilities as Transformers, and that it cannot solve problems such as arithmetic formula problems, Boolean formula value problems, and permutation composition problems if $\mathsf{TC}^0 \neq \mathsf{NC}^1$. It therefore challenges the assumption that Mamba is more computationally expressive than Transformers. Our contributions include rigorous proofs showing that Selective SSM and Mamba architectures can be simulated by $\mathsf{DLOGTIME}$-uniform $\mathsf{TC}^0$ circuits, and that they cannot solve problems outside $\mathsf{TC}^0$.
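To make one of the named problems concrete, permutation composition asks for the product of a sequence of permutations. The following is a minimal sketch, assuming permutations of five elements stored as tuples; the representation, the `size=5` default, and the function names are illustrative choices, not the paper's formalization.

```python
from functools import reduce


def compose(p, q):
    """Return the permutation 'apply p first, then q' as a tuple."""
    return tuple(q[p[i]] for i in range(len(p)))


def compose_all(perms, size=5):
    """Fold a sequence of permutations left to right, starting from the identity."""
    identity = tuple(range(size))
    return reduce(compose, perms, identity)
```

The left-to-right fold is the obvious sequential algorithm: each step needs the result of the previous one. The abstract's claim concerns whether a constant-depth model can shortcut this chain for inputs of arbitrary length.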

Paper Structure

This paper contains 44 sections, 33 theorems, 33 equations, 1 figure, 1 table.

Key Result

Lemma 3.12

Let $p \in \mathbb{Z}_+$. We have [...] We use $d_{\mathrm{std}}$, $d_{\otimes}$, and $d_{\oplus}$ to denote the constant depth of the above three situations, respectively.

Figures (1)

  • Figure 1: Mamba Block Architecture. The input is first processed through two input projections. One branch flows through an input projection, followed by a 1-D convolution, a SiLU activation, and a Selective SSM block before reaching the Hadamard product (or activation). The other branch passes through an input projection directly to a SiLU activation and then converges at the same Hadamard product (or activation). Finally, the output of the Hadamard product is passed through the output projection.
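The data flow in the Figure 1 caption can be sketched as follows. This is a toy, single-channel illustration under strong assumptions: every projection is a scalar weight, the state is a scalar, and the selective update rule, the weights, the convolution kernel, and all function names are hypothetical simplifications rather than the paper's or any library's definitions.

```python
import math


def silu(z):
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def softplus(z):
    """Smooth positive function, used here to keep the step size positive."""
    return math.log1p(math.exp(z))


def causal_conv1d(xs, kernel):
    """Causal 1-D convolution: output t depends only on inputs at positions <= t."""
    return [sum(w * xs[t - j] for j, w in enumerate(kernel) if t - j >= 0)
            for t in range(len(xs))]


def selective_ssm(us, a=-1.0, w_delta=1.0, w_b=1.0, w_c=1.0):
    """Toy scalar selective SSM: step size, B, and C all depend on the input u_t."""
    h, ys = 0.0, []
    for u in us:
        delta = softplus(w_delta * u)   # input-dependent step size (assumed form)
        b_t, c_t = w_b * u, w_c * u     # input-dependent B and C (assumed form)
        h = math.exp(delta * a) * h + delta * b_t * u
        ys.append(c_t * h)
    return ys


def mamba_block(xs, w_in=1.0, w_gate=1.0, w_out=1.0, kernel=(0.5, 0.5)):
    """Follow the two branches of the figure for a single-channel sequence."""
    main = [w_in * x for x in xs]                          # branch 1: input projection
    main = [silu(v) for v in causal_conv1d(main, kernel)]  # 1-D convolution, then SiLU
    main = selective_ssm(main)                             # Selective SSM block
    gate = [silu(w_gate * x) for x in xs]                  # branch 2: projection, SiLU
    return [w_out * m * g for m, g in zip(main, gate)]     # Hadamard product, output projection
```

Calling `mamba_block([1.0, -0.5, 2.0])` returns one output per input position, and each output depends only on the inputs at or before that position.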

Theorems & Definitions (94)

  • Definition 3.1: Boolean circuit, from Definition 6.1 on page 102 in ab09
  • Definition 3.2: Circuit family recognizes languages, from Definition 6.2 on page 103 in ab09
  • Definition 3.3: $\mathsf{NC}^i$ ab09
  • Definition 3.4: $\mathsf{AC}^i$ ab09
  • Definition 3.5: $\mathsf{TC}^i$ vol99
  • Remark 3.6
  • Definition 3.7: $\mathsf{P}$ ab09
  • Remark 3.9
  • Definition 3.10: $\mathsf{L}$-uniformity ab09
  • Definition 3.11: $\mathsf{DLOGTIME}$-uniformity in bi94
  • ...and 84 more
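
What distinguishes $\mathsf{TC}^i$ from the other circuit classes listed above is the threshold (majority) gate. The following is a minimal sketch, assuming unweighted threshold gates with unbounded fan-in; the function names are illustrative. It evaluates PARITY with a fixed number of gate layers regardless of input length, a standard example of a function that lies in $\mathsf{TC}^0$ but outside $\mathsf{AC}^0$.

```python
def threshold(bits, k):
    """Threshold gate: outputs 1 iff at least k of the input bits are 1."""
    return int(sum(bits) >= k)


def parity_constant_depth(bits):
    """PARITY via three layers: thresholds, 'exactly k' tests, then one OR."""
    n = len(bits)
    # Layer 1: one threshold gate per k = 0..n+1, all evaluated independently.
    at_least = [threshold(bits, k) for k in range(n + 2)]
    # Layer 2: exactly k ones  <=>  (at least k) AND NOT (at least k+1).
    exactly = [at_least[k] & (1 - at_least[k + 1]) for k in range(n + 1)]
    # Layer 3: OR over the odd counts.
    return int(any(exactly[k] for k in range(1, n + 1, 2)))
```

The number of layers stays at three while the number of gates grows linearly with the input length, matching the constant-depth, polynomial-size shape that the $\mathsf{TC}^0$ definition requires.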