Reasoning: From Reflection to Solution

Zixi Li

TL;DR

The paper investigates whether current AI systems truly reason or merely pattern-match, proposing that reasoning is iterative operator application in state spaces that converges to fixed points. It introduces OpenXOR, a puzzle that isolates systematic search; OpenOperator, a framework that unifies dynamic programming (DP), graph algorithms, and search as fixed-point computations; and OpenLM, an architecture that learns operator policies over explicit state representations. OpenXOR exhibits exponential search difficulty, and autoregressive LLMs achieve 0% task completion on it, while OpenLM attains 76% exact accuracy. This demonstrates that neural networks can learn systematic reasoning when their architectural inductive biases align with the problem structure. The results argue for neural-symbolic hybrids and architecturally diverse AI, with implications for benchmark design, algorithm design, and real-world constraint-satisfaction tasks. Overall, the work shifts the focus from scaling autoregressive models to matching computational structures to the cognitive tasks at hand, suggesting a practical path toward robust reasoning systems.
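
To make the fixed-point framing concrete, here is a minimal sketch of one graph algorithm read as a fixed-point computation. It is not the paper's OpenOperator implementation: the names `iterate_to_fixed_point` and `make_relax_operator` are hypothetical, and single-source shortest paths via synchronous (Bellman-Ford-style) edge relaxation is chosen here only as an illustrative instance. The state is a map from node to tentative distance, the operator performs one round of relaxation, and iteration stops once the operator no longer changes the state.

```python
from typing import Callable, Dict, List, Tuple, TypeVar

S = TypeVar("S")
INF = float("inf")


def iterate_to_fixed_point(op: Callable[[S], S], state: S, max_steps: int = 1000) -> Tuple[S, int]:
    """Apply `op` until op(state) == state; return the fixed point and the number of changing steps."""
    for step in range(max_steps):
        nxt = op(state)
        if nxt == state:  # operator no longer changes the state: fixed point reached
            return state, step
        state = nxt
    raise RuntimeError(f"no fixed point within {max_steps} steps")


def make_relax_operator(edges: List[Tuple[str, str, float]]) -> Callable[[Dict[str, float]], Dict[str, float]]:
    """One synchronous round of edge relaxation, viewed as an operator on distance maps."""

    def relax(dist: Dict[str, float]) -> Dict[str, float]:
        new = dict(dist)  # read from the previous state, write into a copy
        for u, v, w in edges:
            new[v] = min(new[v], dist[u] + w)
        return new

    return relax


edges = [("a", "b", 4), ("a", "c", 1), ("c", "b", 2), ("b", "d", 1), ("c", "d", 5)]
start = {"a": 0, "b": INF, "c": INF, "d": INF}
dist, steps = iterate_to_fixed_point(make_relax_operator(edges), start)
print(dist, steps)  # {'a': 0, 'b': 3, 'c': 1, 'd': 4} 3
```

Because each round reads only the previous state, the operator is a pure function of the state, so "converged" simply means `relax(dist) == dist`. If a negative cycle is reachable from the source, distances keep decreasing, no fixed point exists, and the loop raises after `max_steps`.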

Abstract

What is reasoning? This question has driven centuries of philosophical inquiry, from Aristotle's syllogisms to modern computational complexity theory. In the age of large language models achieving superhuman performance on benchmarks like GSM8K (95% accuracy) and HumanEval (90% pass@1), we must ask: have these systems learned to reason, or have they learned to pattern-match over reasoning traces? This paper argues for a specific answer: reasoning is iterative operator application in state spaces, converging to fixed points. This definition is not merely philosophical; it has concrete architectural implications that explain both the failures of current systems and the path to genuine reasoning capabilities. Our investigation begins with a puzzle (OpenXOR), progresses through theory (OpenOperator), and culminates in a working solution (OpenLM) that achieves 76% accuracy where state-of-the-art LLMs achieve 0%. This is not about criticizing existing systems, but about understanding what reasoning requires and building architectures that provide it.
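
Read symbolically, the definition of reasoning above describes an iteration. The notation below ($S$, $T$, $\pi$, $s_k$, $K$) is chosen here for illustration and is not taken from the paper. The second line reflects the fact that OpenLM learns a policy that selects which operator to apply at each step.

```latex
% S: state space;  T : S \to S: an operator;  s_0 \in S: initial state
s_{k+1} = T(s_k), \qquad k = 0, 1, 2, \ldots
% With a library of operators T_1, \dots, T_m and a policy \pi : S \to \{1, \dots, m\}:
s_{k+1} = T_{\pi(s_k)}(s_k)
% The iteration stops at the first K with s_{K+1} = s_K, so that
s^{\ast} := s_K \quad \text{satisfies} \quad T_{\pi(s^{\ast})}(s^{\ast}) = s^{\ast}.
```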

Paper Structure

This paper contains 88 sections, 10 theorems, 17 equations, 5 figures, 5 tables, 3 algorithms.

Key Result

Theorem 1

The decision problem "Does there exist a valid solution $\mathbf{o}$ for instance $(\mathbf{b}, t, \mathcal{C})$?" is NP-hard.
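
The sketch below makes the decision problem concrete under one hypothetical reading of an instance $(\mathbf{b}, t, \mathcal{C})$, since Definitions 1 and 2 are not reproduced in this summary. It assumes an accumulator with $\text{acc}_0 = 0$ and $\text{acc}_i = \text{acc}_{i-1} \oplus (o_i \wedge b_i)$, where each checkpoint in $\mathcal{C}$ fixes the value of $\text{acc}_k$ at a 1-indexed position $k$. The update rule, the checkpoint encoding, and all function names are assumptions. The example instance is invented, and it only mirrors the shape described in Figure 1: 7 bits, one checkpoint at position 4, and target 1. The sketch shows the two sides of the question: checking a candidate $\mathbf{o}$ takes $O(n)$ time, while naive search enumerates all $2^n$ candidates.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence

# ASSUMED semantics (hypothetical, for illustration only):
#   acc_0 = 0 and acc_i = acc_{i-1} XOR (o_i AND b_i) for i = 1..n
#   checkpoints map a 1-indexed position k (1 <= k <= n) to the required value of acc_k


def accumulator_trace(bits: Sequence[int], out: Sequence[int]) -> List[int]:
    """Return [acc_1, ..., acc_n] under the assumed update rule."""
    acc, trace = 0, []
    for b, o in zip(bits, out):
        acc ^= b & o
        trace.append(acc)
    return trace


def is_valid(bits: Sequence[int], target: int, checkpoints: Dict[int, int], out: Sequence[int]) -> bool:
    """Certificate check in O(n): every checkpoint and the final target must hold."""
    trace = accumulator_trace(bits, out)
    final = trace[-1] if trace else 0
    return final == target and all(trace[k - 1] == v for k, v in checkpoints.items())


def exhaustive_search(bits: Sequence[int], target: int, checkpoints: Dict[int, int]) -> Optional[List[int]]:
    """Try all 2**n candidate outputs in lexicographic order; return None if none is valid."""
    for cand in product((0, 1), repeat=len(bits)):
        if is_valid(bits, target, checkpoints, cand):
            return list(cand)
    return None


bits = [1, 0, 1, 1, 0, 1, 1]  # invented 7-bit instance
sol = exhaustive_search(bits, target=1, checkpoints={4: 1})
print(sol, accumulator_trace(bits, sol))  # [0, 0, 0, 1, 0, 0, 0] [0, 0, 0, 1, 1, 1, 1]
```

Under this simplified rule, a left-to-right dynamic program over the two possible accumulator values would also decide the instance in linear time. The toy therefore does not reproduce the hardness construction behind Theorem 1; it only illustrates the verify-versus-search structure of the question.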

Figures (5)

  • Figure 1: OpenXOR execution trace for a 7-bit sequence. Checkpoint at position 4 requires $\text{acc}_4 = 1$. The solution satisfies this constraint and reaches target $t=1$.
  • Figure 2: Three perspectives explaining why LLMs cannot solve OpenXOR.
  • Figure 3: OpenLM architecture: Iterative operator application until convergence.
  • Figure 4: The 76% vs 0% result: OpenLM bridges the gap between categorical LLM failure (0%) and perfect symbolic solutions (100%). This proves that neural networks can learn systematic reasoning when provided with operator-based architectures aligned to problem structure.
  • Figure 5: Task completion rate comparison. LLMs achieve 0% completion (cannot produce valid outputs), while random/greedy baselines and backtracking all complete 100% of instances. This is worse than random guessing: models refuse or crash rather than attempt solutions.
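
The loop in Figure 3 (iterative operator application until convergence) can be sketched with a hand-written rule standing in for the learned policy. Everything here is an assumption for illustration. The names `run_policy`, `make_swap`, and `first_inversion_policy` are hypothetical, and so are the toy operator library (adjacent swaps on a tuple) and the halting rule. The paper's OpenLM uses a neural policy whose interface is not given in this summary. The sketch shows the three ingredients the caption implies: an explicit state, a library of operators, and a policy that picks one operator per step until the state stops changing.

```python
from typing import Callable, Dict, List, Optional, Tuple

State = Tuple[int, ...]
Operator = Callable[[State], State]
Policy = Callable[[State], Optional[str]]


def make_swap(i: int) -> Operator:
    """Hypothetical operator: swap positions i and i+1 of the state."""
    def swap(s: State) -> State:
        t = list(s)
        t[i], t[i + 1] = t[i + 1], t[i]
        return tuple(t)
    return swap


def first_inversion_policy(s: State) -> Optional[str]:
    """Hand-written stand-in for a learned policy: name the first out-of-order adjacent pair."""
    for i in range(len(s) - 1):
        if s[i] > s[i + 1]:
            return f"swap_{i}"
    return None  # no operator would improve the state: signal convergence


def run_policy(state: State, operators: Dict[str, Operator], policy: Policy,
               max_steps: int = 100) -> Tuple[State, List[State]]:
    """Apply policy-selected operators to an explicit state until a fixed point (or max_steps)."""
    history = [state]
    for _ in range(max_steps):
        name = policy(state)
        if name is None:
            break
        nxt = operators[name](state)
        if nxt == state:  # operator left the state unchanged: fixed point
            break
        state = nxt
        history.append(state)
    return state, history


start = (3, 1, 2)
ops = {f"swap_{i}": make_swap(i) for i in range(len(start) - 1)}
final, history = run_policy(start, ops, first_inversion_policy)
print(final, history)  # (1, 2, 3) [(3, 1, 2), (1, 3, 2), (1, 2, 3)]
```

Keeping the state explicit and outside the policy means every intermediate step can be inspected in `history`, and the halting condition is a property of the state rather than of a generated token stream.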

Theorems & Definitions (21)

  • Definition 1: OpenXOR Instance
  • Definition 2: Valid Solution
  • Theorem 1: OpenXOR is NP-Hard
  • proof : Proof Sketch
  • Proposition 2: Exponential Search Space
  • Proposition 3: Solution Density
  • proof : Intuition
  • Theorem 4: Random Strategy Lower Bound
  • proof
  • Corollary 5
  • ...and 11 more