Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubić, Federico Soldá, Aurelio Sulser, Davide Scaramuzza

TL;DR

This paper analyzes the fundamental limits of Structured State Space Models (SSMs) and Transformers in sequence modeling, focusing on function composition and multi-step reasoning. Framing the problem in complexity-theoretic terms, the authors prove that one-layer SSMs require impractically large state sizes to perform function composition over large domains, and that even with Chain-of-Thought prompting the number of required steps still grows polynomially. Multi-layer SSMs are bounded by logarithmic-space constraints, placing them in the class $\mathbf{L}$ and aligning with limitations known for Transformers. The authors further show that finite-precision SSMs can only recognize regular languages, reinforcing inherent restrictions on memory and expressivity. Empirically, the paper demonstrates significant degradation on composition tasks across multiple models and prompting regimes, including GPT-4o, and reveals systematic error propagation and shortcut behaviors that hinder reliable multi-step reasoning. Together, the theoretical and empirical results argue for novel architectures or hybrid approaches (e.g., neuro-symbolic or external-memory mechanisms) to transcend current computational barriers in pursuit of more general artificial intelligence.

Abstract

Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. We prove that one-layer SSMs cannot efficiently perform function composition over large domains without impractically large state sizes, and even with Chain-of-Thought prompting, they require a number of steps that scales unfavorably with the complexity of the function composition. Moreover, the language of a finite-precision SSM is within the class of regular languages. Our experiments corroborate these theoretical findings. Evaluating models on tasks including various function composition settings, multi-digit multiplication, dynamic programming, and Einstein's puzzle, we find significant performance degradation even with advanced prompting techniques. Models often resort to shortcuts, leading to compounding errors. These findings highlight fundamental barriers within current deep learning architectures rooted in their computational capacities. We underscore the need for innovative solutions to transcend these constraints and achieve reliable multi-step reasoning and compositional task-solving, which is critical for advancing toward general artificial intelligence.
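
The regular-language claim in the abstract can be illustrated with a counting argument: a state of $d$ numbers stored at $p$ bits each takes at most $2^{dp}$ distinct values, so the recurrence behaves like a finite automaton. The sketch below is a toy and not the paper's construction. The scalar integer recurrence `h <- (a*h + b[x]) mod 2**bits`, the coefficients, and the helper names `reachable_states` and `run` are all assumptions chosen for illustration.

```python
def reachable_states(a, b, bits, h0=0):
    """Enumerate every state reachable under h <- (a*h + b[x]) mod 2**bits.

    `b` maps each input symbol to an integer offset. The modulus stands in
    for finite precision (a hypothetical fixed-point scheme with wraparound).
    """
    modulus = 2 ** bits
    seen = {h0}
    frontier = [h0]
    transitions = {}
    while frontier:
        h = frontier.pop()
        for symbol, offset in b.items():
            nxt = (a * h + offset) % modulus
            transitions[(h, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen, transitions


def run(transitions, word, h0=0):
    """Feed `word` through the transition table and return the final state."""
    h = h0
    for symbol in word:
        h = transitions[(h, symbol)]
    return h


BITS = 3
# With a=1 the state counts the number of "1" symbols, modulo 2**BITS.
states, delta = reachable_states(a=1, b={"0": 0, "1": 1}, bits=BITS)

print(len(states))            # never more than 2**BITS
print(run(delta, "1" * 8))    # same final state as the empty word
print(run(delta, "1" * 16))   # and the same again
```

In this toy, the state cannot tell eight ones from sixteen ones or from none, so any task that needs unbounded counting (for example matching $a^n b^n$) is out of reach once the precision is fixed. The transition table `delta` is exactly a deterministic finite automaton.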

Paper Structure

This paper contains 44 sections, 6 theorems, 11 equations, 18 figures, 4 tables, 1 algorithm.

Key Result

Theorem 1

Consider a function composition problem with input domain size $|A| = |B| = n$ and an SSM layer $\mathcal{L}$ with embedding dimension $d$ and computation precision $p$. Let $R = n\log n - (d^2 + d)p \geq 0$, then the probability that $\mathcal{L}$ answers the query incorrectly is at least $R/(3n\lo
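
To get a feel for the quantity $R = n\log n - (d^2 + d)p$ in Theorem 1, the following minimal sketch evaluates it for sample values. It assumes base-2 logarithms, which the excerpt does not specify, and the function names and the sample values of $n$, $d$, and $p$ are hypothetical. The first term is roughly the number of bits needed to describe an arbitrary function on $n$ elements. Reading the second term as the bit budget available to the layer is an interpretation, not a statement from the source.

```python
import math


def capacity_gap(n, d, p):
    """R = n*log2(n) - (d**2 + d)*p, with base-2 logs (an assumption)."""
    return n * math.log2(n) - (d * d + d) * p


def smallest_dimension_closing_gap(n, p):
    """Smallest embedding dimension d with R <= 0, found by linear search."""
    d = 1
    while capacity_gap(n, d, p) > 0:
        d += 1
    return d


# Hypothetical sample values, chosen only for illustration.
n, d, p = 1024, 16, 16
print(capacity_gap(n, d, p))                 # positive: the theorem's condition R >= 0 holds
print(smallest_dimension_closing_gap(n, p))  # dimension at which R first drops to zero or below
```

For fixed precision $p$, closing the gap requires $d^2 + d$ to grow like $n\log n / p$, so the embedding dimension must scale roughly with $\sqrt{n\log n / p}$.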

Figures (18)

  • Figure 1: Qualitative example of zero-shot inference on prominent SSM and Attention-based models. None of the models successfully resolved the problems across any of the composition axes.
  • Figure 2: Jamba (Lieber et al., 2024) performance on multiplication, DP, and puzzle tasks. For DP, various models are shown. All struggle with compositional tasks, especially for larger input sizes.
  • Figure 3: Multiply two numbers
  • Figure 3: Model Accuracy for PEN task
  • Figure 4: Error Propagation. The carry operation outputs 3 instead of 2 from node '27', and that error propagates further, yielding an incorrect middle digit even though all other steps were done correctly.
  • ...and 13 more figures

Theorems & Definitions (12)

  • Definition 1: SSM layer
  • Theorem 1
  • Lemma 1: Lemma 1 from Peng et al. (2024)
  • Proof: Proof of Theorem 1
  • Definition 2: SSM with CoT
  • Theorem 2
  • Lemma 2: Theorem 1.1 of Yehudayoff (2020)
  • Proof: Proof of Theorem 2
  • Theorem 3
  • Definition 3
  • ...and 2 more