Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning
Zheng Zhang
TL;DR
The paper investigates why large language models can explain symbolic procedures well yet fail to execute them reliably. It formalizes this gap as the computational split-brain syndrome, driven by three architectural constraints: contextual averaging that impedes domain binding, feed-forward networks (FFNs) that resort to pattern storage rather than exact symbolic computation, and a decoupling of instruction from execution caused by next-token prediction. Through embedding analyses, layer-wise computation tracking, and cross-domain experiments in arithmetic and relational reasoning, it shows that these constraints persist across model families and scales, and that compensatory strategies (self-scaffolding, tool delegation, and hybrid architectures) only shift the bottleneck without resolving it. The work argues for architectural innovations (metacognitive control, lifted representations, and grounded execution) to achieve robust symbolic reasoning, and provides testable predictions and implications for interpretability research. Overall, it reframes LLM capabilities as pattern-completion strengths that do not generalize to principled computation, and calls for fundamental design changes to reach true generalizable intelligence.
Abstract
Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structural diagnosis of such failures, revealing a persistent gap between \textit{comprehension} and \textit{competence}. Through controlled experiments and architectural analysis, we demonstrate that LLMs often articulate correct principles without reliably applying them, a failure rooted not in knowledge access but in computational execution. We term this phenomenon the computational \textit{split-brain syndrome}, in which instruction and action pathways are geometrically and functionally dissociated. This core limitation recurs across domains, from mathematical operations to relational inferences, and explains why model behavior remains brittle even under idealized prompting. We argue that LLMs function as powerful pattern-completion engines but lack the architectural scaffolding for principled, compositional reasoning. Our findings delineate the boundary of current LLM capabilities and motivate future models with metacognitive control, principle lifting, and structurally grounded execution. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles. The geometric separation between instruction and execution pathways further suggests limits on neural introspection and mechanistic analysis.
