Barriers to Discrete Reasoning with Transformers: A Survey Across Depth, Exactness, and Bandwidth
Michelle Yuan, Weiyi Sun, Amir H. Rezaeian, Jyotika Singh, Sandip Ghoshal, Yao-Ting Wang, Miguel Ballesteros, Yassine Benajiba
TL;DR
Transformers excel at interpolation but struggle with discrete, exact reasoning because of fundamental architectural constraints. By integrating circuit complexity, approximation theory, and communication complexity, the survey gives a unified account of why fixed-depth, finite-precision self-attention limits exact sequential computation, long-range coordination, and sharp decision boundaries, even though chain-of-thought prompting offers partial relief. It highlights interdependent barriers tied to depth, precision, and bandwidth, and discusses promising directions that aim to overcome these limits, such as neuro-symbolic hybrids and memory-augmented designs. Together, these insights lay a principled foundation for designing reasoning-capable models that generalize robustly to multi-step, rule-based tasks.
Abstract
Transformers have become the foundational architecture for a broad spectrum of sequence modeling applications, underpinning state-of-the-art systems in natural language processing, vision, and beyond. However, understanding their theoretical limitations on discrete reasoning tasks, such as arithmetic, logical inference, and algorithmic composition, remains a critical open problem. In this survey, we synthesize recent studies from three theoretical perspectives (circuit complexity, approximation theory, and communication complexity) to clarify the structural and computational barriers that transformers face when performing symbolic computation. By connecting these established frameworks, we provide an accessible and unified account of why current transformer architectures struggle to implement exact discrete algorithms even as they excel at pattern matching and interpolation. We review key definitions, seminal results, and illustrative examples, highlighting challenges such as depth constraints, difficulty approximating discontinuities, and bottlenecks in inter-token communication. Finally, we discuss implications for model design and suggest promising directions for overcoming these foundational limitations.
