Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory

Nikola Zubić; Federico Soldá; Aurelio Sulser; Davide Scaramuzza

TL;DR

This paper analyzes the fundamental limits of Structured State Space Models (SSMs) and Transformers in sequence modeling, focusing on function composition and multi-step reasoning. Framing the problem in complexity-theoretic terms, the authors prove that one-layer SSMs need impractically large state sizes to perform function composition over large domains, and that even with Chain-of-Thought prompting the number of required steps still grows polynomially. Multi-layer SSMs are bounded by logarithmic-space constraints, which places them in the class $\mathbf{L}$ and matches limitations known for Transformers. The authors further show that finite-precision SSMs can only recognize regular languages, reinforcing inherent restrictions on memory and expressivity. Empirically, the paper demonstrates significant degradation on composition tasks across multiple models and prompting regimes, including GPT-4o, and reveals systematic error propagation and shortcut behaviors that hinder reliable multi-step reasoning. Together, the theoretical and empirical results argue for novel architectures or hybrid approaches (e.g., neuro-symbolic or external-memory mechanisms) to transcend current computational barriers in pursuit of more general artificial intelligence.

Abstract

Despite their successes, deep learning models struggle with tasks requiring complex reasoning and function composition. We present a theoretical and empirical investigation into the limitations of Structured State Space Models (SSMs) and Transformers in such tasks. We prove that one-layer SSMs cannot efficiently perform function composition over large domains without impractically large state sizes, and even with Chain-of-Thought prompting, they require a number of steps that scale unfavorably with the complexity of the function composition. Also, the language of a finite-precision SSM is within the class of regular languages. Our experiments corroborate these theoretical findings. Evaluating models on tasks including various function composition settings, multi-digit multiplication, dynamic programming, and Einstein's puzzle, we find significant performance degradation even with advanced prompting techniques. Models often resort to shortcuts, leading to compounding errors. These findings highlight fundamental barriers within current deep learning architectures rooted in their computational capacities. We underscore the need for innovative solutions to transcend these constraints and achieve reliable multi-step reasoning and compositional task-solving, which is critical for advancing toward general artificial intelligence.
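As a rough illustration of what a function composition query looks like, the sketch below builds two random look-up tables and answers one composed query. The table representation and the names `make_instance` and `compose` are assumptions made for this example; they are not taken from the paper's experimental setup.

```python
import random


def make_instance(n, seed=0):
    """Build a hypothetical composition instance over a domain of size n.

    Returns two look-up tables f and g (lists of length n with values in
    range(n)) and a query point x. The model would see f, g and x in its
    context and be asked for g(f(x)).
    """
    rng = random.Random(seed)
    f = [rng.randrange(n) for _ in range(n)]
    g = [rng.randrange(n) for _ in range(n)]
    x = rng.randrange(n)
    return f, g, x


def compose(f, g, x):
    """Ground-truth answer to the query: apply f first, then g."""
    return g[f[x]]


if __name__ == "__main__":
    f, g, x = make_instance(n=8, seed=1)
    print("query:", x, "answer:", compose(f, g, x))
```

The ground truth is a two-step look-up, so any single-pass model has to retain enough information about both tables to resolve the query. That is the quantity the width and state-size bounds are concerned with.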

Paper Structure (44 sections, 6 theorems, 11 equations, 18 figures, 4 tables, 1 algorithm)

Introduction
Equivalence of SSMs with Other Deep Learning Models
Background
Function Composition Requires Wide One-Layer Models
Many Thought Steps are Needed
SSMs Are Limited to Regular Languages
Experiments
Related Work
Limitations in Function Composition and Reasoning
Chain-of-Thought Prompting
Expressive Power and Complexity of Neural Networks
Alternative Approaches to Complex Reasoning
Conclusion
Acknowledgment
Background on Communication Complexity and Computational Classes
...and 29 more sections

Key Result

Theorem 1

Consider a function composition problem with input domain size $|A| = |B| = n$ and an SSM layer $\mathcal{L}$ with embedding dimension $d$ and computation precision $p$. Let $R = n\log n - (d^2 + d)p \geq 0$, then the probability that $\mathcal{L}$ answers the query incorrectly is at least $R/(3n\lo
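To get a feel for the quantity $R = n\log n - (d^2 + d)p$ in the theorem statement, the sketch below evaluates it for concrete values and finds the smallest embedding dimension at which $R$ is no longer positive. The logarithm base (2 here, so that quantities read as bits) and the function names are assumptions made for this example. The probability bound itself is truncated in the excerpt above, so it is deliberately not computed.

```python
import math


def gap_R(n, d, p):
    """R = n*log(n) - (d^2 + d)*p for domain size n, embedding
    dimension d and precision p. Base-2 logarithm is an assumption."""
    return n * math.log2(n) - (d * d + d) * p


def min_dim_for_nonpositive_R(n, p):
    """Smallest d such that R <= 0, i.e. (d^2 + d)*p >= n*log2(n).

    Below this d, R is positive and the theorem's lower bound on the
    error probability is non-trivial.
    """
    d = 0
    while gap_R(n, d, p) > 0:
        d += 1
    return d


if __name__ == "__main__":
    for n in (2**10, 2**14, 2**18):
        print(n, min_dim_for_nonpositive_R(n, p=16))
```

Because $(d^2 + d)p$ has to keep up with $n\log n$, the required $d$ grows roughly like $\sqrt{n\log n / p}$ under these assumptions.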

Figures (18)

Figure 1: Qualitative example of zero-shot inference on prominent SSM and Attention-based models. None of the models successfully resolved the problems across any of the composition axes.
Figure 2: Jamba (Lieber et al., 2024) performance on multiplication, DP, and puzzle tasks. For DP, various models are shown. All struggle with compositional tasks, especially for larger input sizes.
Figure 3: Multiply two numbers
Figure 3: Model Accuracy for PEN task
Figure 4: Error propagation. The carry operation outputs the number 3 instead of 2 from node '27', and that error propagates further, yielding an incorrect solution in the middle digit, although all other steps were performed correctly.
...and 13 more figures

Theorems & Definitions (12)

Definition 1: SSM layer
Theorem 1
Lemma 1: Lemma 1 from Peng et al. (2024)
Proof of Theorem 1
Definition 2: SSM with CoT
Theorem 2
Lemma 2: Theorem 1.1 of Yehudayoff (2020)
Proof of Theorem 2
Theorem 3
Definition 3
...and 2 more
