Reasoning: From Reflection to Solution
Zixi Li
TL;DR
The paper asks whether current AI systems truly reason or merely pattern-match, and proposes that reasoning is iterative operator application in state spaces that converges to fixed points. It introduces OpenXOR to isolate systematic search, OpenOperator to unify dynamic programming, graph algorithms, and search under a fixed-point framework, and OpenLM to learn operator policies with explicit state representations. OpenXOR exposes exponential search difficulty, and autoregressive LLMs complete 0% of its tasks, whereas OpenLM attains 76% exact accuracy, demonstrating that neural networks can learn systematic reasoning when architectural inductive biases align with problem structure. The results argue for neural-symbolic hybrids and architecturally diverse AI, with implications for benchmark design, algorithm design, and real-world constraint-satisfaction tasks. Overall, the work shifts the focus from scaling autoregressive models to matching computational structures to the cognitive tasks at hand, suggesting a practical path toward robust reasoning systems.
Abstract
What is reasoning? This question has driven centuries of philosophical inquiry, from Aristotle's syllogisms to modern computational complexity theory. Now that large language models achieve superhuman performance on benchmarks such as GSM8K (95\% accuracy) and HumanEval (90\% pass@1), we must ask: have these systems learned to \emph{reason}, or have they learned to \emph{pattern-match over reasoning traces}? This paper argues for a specific answer: \textbf{reasoning is iterative operator application in state spaces, converging to fixed points}. This definition is not merely philosophical; it has concrete architectural implications that explain both the failures of current systems and the path to genuine reasoning capabilities. Our investigation begins with a puzzle (OpenXOR), progresses through theory (OpenOperator), and culminates in a working solution (OpenLM) that achieves 76\% accuracy where state-of-the-art LLMs achieve 0\%. The aim is not to criticize existing systems but to \emph{understand what reasoning requires} and to \emph{build architectures that provide it}.
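The central definition, reasoning as repeated operator application until the state stops changing, can be illustrated with a small sketch. This is not the paper's OpenOperator or OpenLM implementation; the names `iterate_to_fixed_point`, `relax`, and `EDGES`, and the toy three-node graph, are assumptions introduced purely for illustration. The example treats shortest-path relaxation (a graph algorithm the abstract places under the fixed-point view) as an operator on a distance table.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, float]

def iterate_to_fixed_point(
    operator: Callable[[State], State],
    state: State,
    max_steps: int = 1000,
) -> State:
    """Apply `operator` repeatedly until the state no longer changes."""
    for _ in range(max_steps):
        next_state = operator(state)
        if next_state == state:  # fixed point: operator(state) == state
            return state
        state = next_state
    raise RuntimeError("no fixed point reached within max_steps")

# Hypothetical toy graph (an assumption for this sketch): directed edge -> weight.
EDGES: Dict[Tuple[str, str], float] = {
    ("a", "b"): 1.0,
    ("b", "c"): 2.0,
    ("a", "c"): 5.0,
}

def relax(dist: State) -> State:
    """One relaxation pass over all edges, reading only the previous state."""
    new_dist = dict(dist)
    for (u, v), weight in EDGES.items():
        new_dist[v] = min(new_dist[v], dist[u] + weight)
    return new_dist

# Start with the source at distance 0 and every other node unreachable.
initial: State = {"a": 0.0, "b": float("inf"), "c": float("inf")}
distances = iterate_to_fixed_point(relax, initial)
```

Under these assumptions the loop terminates once a relaxation pass leaves every distance unchanged, which is exactly the condition `operator(state) == state`. The same driver would work with any operator whose iterates stabilize; only `relax` is specific to shortest paths.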
