
Algorithmic Thinking Theory

MohammadHossein Bateni, Vincent Cohen-Addad, Yuzhou Gu, Silvio Lattanzi, Simon Meierhans, Christopher Mohri

TL;DR

This work develops a formal theory of algorithmic thinking for reasoning with large language models by modeling a reasoning oracle and a context-dependent transfer function. It analyzes three core strategies (Branching, Genetic, and Random Sampling) under decaying and uniform models, establishing optimality results, monotonicity implications, and convergence rates. The framework captures how adding correct solutions to the context can boost performance, but may also exhibit diminishing returns or detrimental correlations, guiding the efficient design of iterative reasoning pipelines. The results provide rigorous benchmarks for designing and analyzing next-generation reasoning methods that synthesize information across multiple intermediate outputs rather than relying on single-shot accuracy.
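The trade-off between helpful context and diminishing returns can be illustrated with a toy transfer function. This is a hypothetical sketch, not the paper's definition: `toy_transfer`, `base`, and `gain` are illustrative names, and the model assumes each correct solution in the context removes a fixed fraction of the remaining error. That assumption yields a transfer function that is monotone with diminishing returns, and the helpers below check both properties on a finite range.

```python
from typing import Callable


def toy_transfer(correct: int, base: float = 0.3, gain: float = 0.5) -> float:
    """HYPOTHETICAL transfer function (not from the paper): probability of a
    correct answer when `correct` correct solutions are in the context.
    Each correct solution removes a fraction `gain` of the remaining error."""
    return 1 - (1 - base) * (1 - gain) ** correct


def marginal_gains(f: Callable[[int], float], c_max: int) -> list[float]:
    """Improvement f(c + 1) - f(c) for c = 0, ..., c_max - 1."""
    return [f(c + 1) - f(c) for c in range(c_max)]


def is_monotone(f: Callable[[int], float], c_max: int, tol: float = 1e-12) -> bool:
    """True if adding a correct solution never lowers accuracy on 0..c_max."""
    return all(g >= -tol for g in marginal_gains(f, c_max))


def has_diminishing_returns(
    f: Callable[[int], float], c_max: int, tol: float = 1e-12
) -> bool:
    """True if each extra correct solution helps at most as much as the previous one."""
    gains = marginal_gains(f, c_max)
    return all(later <= earlier + tol for earlier, later in zip(gains, gains[1:]))


if __name__ == "__main__":
    # Gains shrink geometrically under this toy model.
    print(marginal_gains(toy_transfer, 4))
```

Swapping in a transfer function that saturates early, or that drops when more solutions are added, is a quick way to see when extra context stops paying off.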

Abstract

Large language models (LLMs) have proven to be highly effective for solving complex reasoning tasks. Surprisingly, their capabilities can often be improved by iterating on previously generated solutions. In this context, a reasoning plan for generating and combining a set of solutions can be thought of as an algorithm for reasoning using a probabilistic oracle. We introduce a theoretical framework for analyzing such reasoning algorithms. This framework formalizes the principles underlying popular techniques for iterative improvement and answer aggregation, providing a foundation for designing a new generation of more powerful reasoning methods. Unlike approaches for understanding models that rely on architectural specifics, our model is grounded in experimental evidence. As a result, it offers a general perspective that may extend to a wide range of current and future reasoning oracles.
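To make the oracle view concrete, the sketch below abstracts each solution to a boolean (correct or not) and runs a two-round plan: $k$ independent calls with an empty context, then one final call with those $k$ solutions as context, the procedure behind the green line of Figure 3. The oracle `uniform_pick_oracle` is a hypothetical stand-in, not an LLM: it answers correctly with probability `base` on an empty context and otherwise returns one of the context solutions uniformly at random. All names are illustrative.

```python
import random


def uniform_pick_oracle(context: list[bool], base: float, rng: random.Random) -> bool:
    """HYPOTHETICAL oracle. Solutions are booleans (True = correct).
    Empty context: correct with probability `base`.
    Non-empty context: return a context solution uniformly at random."""
    if not context:
        return rng.random() < base
    return rng.choice(context)


def sample_then_aggregate(k: int, base: float, trials: int, seed: int = 0) -> float:
    """Two-round plan: k independent calls with empty context, then one final
    call with those k solutions as context. Returns the empirical accuracy."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        context = [uniform_pick_oracle([], base, rng) for _ in range(k)]
        wins += uniform_pick_oracle(context, base, rng)  # True counts as 1
    return wins / trials


if __name__ == "__main__":
    print(sample_then_aggregate(k=5, base=0.4, trials=10_000))
```

Under this particular oracle the plan's expected accuracy is exactly `base`, because the returned solution is a uniformly chosen independent sample; aggregation helps only when the oracle does better than picking from its context, which is the behaviour the $1/(k+1)$ reference line in Figure 2 represents.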

Paper Structure

This paper contains 43 sections, 21 theorems, 32 equations, 3 figures, 3 algorithms.

Key Result

Theorem 3.2

Suppose that $X_1, \ldots, X_n$ are i.i.d. random variables taking values in $\{0, 1\}$. Let $X = \sum_{i = 1}^n X_i$ and $\mu = \mathop{{}\mathbb{E}}\left[X\right]$. Then
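For reference, one textbook form of the multiplicative Chernoff bound (Mitzenmacher and Upfal) states that for $0 < \delta \le 1$, $\Pr[X \ge (1+\delta)\mu] \le e^{-\mu\delta^2/3}$, and for $0 < \delta < 1$, $\Pr[X \le (1-\delta)\mu] \le e^{-\mu\delta^2/2}$; the exact statement of Theorem 3.2 may differ. The sketch below compares these bounds with exact binomial tails, assuming each $X_i$ equals 1 with probability $p$. Function names are illustrative.

```python
import math


def binom_pmf(n: int, p: float, k: int) -> float:
    """Pr[X = k] for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_upper_tail(n: int, p: float, t: float) -> float:
    """Exact Pr[X >= t]; X is integer-valued, so start at ceil(t)."""
    return sum(binom_pmf(n, p, k) for k in range(max(math.ceil(t), 0), n + 1))


def binom_lower_tail(n: int, p: float, t: float) -> float:
    """Exact Pr[X <= t]; X is integer-valued, so stop at floor(t)."""
    return sum(binom_pmf(n, p, k) for k in range(0, min(math.floor(t), n) + 1))


def chernoff_upper(mu: float, delta: float) -> float:
    """Bound on Pr[X >= (1 + delta) mu], valid for 0 < delta <= 1."""
    return math.exp(-mu * delta**2 / 3)


def chernoff_lower(mu: float, delta: float) -> float:
    """Bound on Pr[X <= (1 - delta) mu], valid for 0 < delta < 1."""
    return math.exp(-mu * delta**2 / 2)


if __name__ == "__main__":
    n, p, delta = 100, 0.3, 0.5
    mu = n * p
    print(binom_upper_tail(n, p, (1 + delta) * mu), chernoff_upper(mu, delta))
    print(binom_lower_tail(n, p, (1 - delta) * mu), chernoff_lower(mu, delta))
```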

Figures (3)

  • Figure 1: Average AIME 2025 accuracy per question for Gemini 2.5 Pro. We use 780 model calls per question. The error bars represent standard error.
  • Figure 2: Gemini 2.5 Pro accuracy when the context contains $1$ correct solution and $k=0$ to $12$ incorrect solutions. The red line represents the base accuracy of Gemini 2.5 Pro on the question without any solutions. The green lines represent $1/(k+1)$, the accuracy obtained by returning one of the $k+1$ solutions in the context uniformly at random. The orange bar marks the most likely configuration according to the base accuracy (the $k$ for which $1/(k+1)$ is closest to the base accuracy).
  • Figure 3: Gemini 2.5 Pro accuracy when providing $5$ solutions in total and varying how many are correct versus incorrect. The red line represents the base accuracy of Gemini 2.5 Pro on the question without any solutions. The number above each bar is the probability of that configuration according to the average correctness. The orange bar highlights the most likely configuration. The green line uses these probabilities as weights to give the accuracy when sampling $5$ solutions and using them as context for a final model call.
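The quantities in these captions reduce to simple arithmetic. The sketch below assumes (i) the Figure 1 error bars are the standard error of a Bernoulli mean over the 780 calls for a question, and (ii) the Figure 3 configuration probabilities are binomial, i.e. the $5$ context solutions are independent samples, each correct with probability equal to the average correctness. All function names are illustrative, and the accuracies passed in must come from measurements; none are reproduced here.

```python
import math


def standard_error(accuracy: float, n_calls: int) -> float:
    """Standard error of a Bernoulli mean estimated from n_calls independent calls
    (ASSUMED interpretation of the Figure 1 error bars)."""
    return math.sqrt(accuracy * (1 - accuracy) / n_calls)


def random_pick_accuracy(k: int) -> float:
    """Accuracy of returning one of 1 correct + k incorrect solutions uniformly at random."""
    return 1 / (k + 1)


def closest_k(base_accuracy: float, k_max: int = 12) -> int:
    """The k in 0..k_max whose 1/(k+1) is closest to the base accuracy (Figure 2's orange bar)."""
    return min(range(k_max + 1), key=lambda k: abs(random_pick_accuracy(k) - base_accuracy))


def configuration_weights(p: float, total: int = 5) -> list[float]:
    """ASSUMED binomial probability of c correct solutions among `total` samples, c = 0..total."""
    return [math.comb(total, c) * p**c * (1 - p) ** (total - c) for c in range(total + 1)]


def sampled_context_accuracy(p: float, acc_by_correct: list[float]) -> float:
    """Weighted accuracy in the style of Figure 3's green line.
    acc_by_correct[c] is the measured accuracy with c correct solutions in context."""
    weights = configuration_weights(p, len(acc_by_correct) - 1)
    return sum(w * a for w, a in zip(weights, acc_by_correct))
```

For example, `closest_k(base)` returns the orange-bar $k$ of Figure 2 for a question with base accuracy `base`, and `sampled_context_accuracy(p, measured)` combines six measured accuracies (for 0 to 5 correct solutions) into a Figure 3 green-line value.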

Theorems & Definitions (59)

  • Definition 2.0: Decaying Model
  • Definition 2.0: Uniform Model
  • Definition 2.0: Exponential Decay
  • Definition 2.0: Polynomial Decay
  • Theorem 3.2: Chernoff bound, see, e.g., Mitzenmacher and Upfal (2005)
  • Definition 3.3
  • Lemma 3.4
  • proof
  • Definition 4.1: Monotonicity
  • Lemma 4.2
  • ...and 49 more