Table of Contents
Fetching ...

Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models

Sho Sonoda, Shunta Akiyama, Yuya Uezato

TL;DR

This work treats modern agentic theorem provers as time-bounded decision processes that must reach a verified proof within a compute budget, reframing provability as a finite-horizon reachability problem. By leveraging Bellman structure, it derives provability certificates and analyzes how components such as verifier feedback, library retrieval, test-time search, and representation geometry influence success under biased problem distributions. The authors establish existence and structure of optimal value functions, bound deviations of score-guided planning via approximation and margin parameters, and propose a certificate-gap objective to guide representation learning. Together, the framework clarifies when biased, real-world distributions yield tractable provability while delineating limitations in worst-case or adversarial regimes, and it provides practical guidelines for improving agentic provers through better embeddings, tighter certificates, and more informative verifier feedback.

Abstract

Agentic theorem provers -- pipelines that couple a mathematical reasoning model with library retrieval, subgoal-decomposition/search planner, and a proof assistant verifier -- have recently achieved striking empirical success, yet it remains unclear which components drive performance and why such systems work at all despite classical hardness of proof search. We propose a distributional viewpoint and introduce **statistical provability**, defined as the finite-horizon success probability of reaching a verified proof, averaged over an instance distribution, and formalize modern theorem-proving pipelines as time-bounded MDPs. Exploiting Bellman structure, we prove existence of optimal policies under mild regularity, derive provability certificates via sub-/super-solution inequalities, and bound the performance gap of score-guided planning (greedy/top-\(k\)/beam/rollouts) in terms of approximation error, sequential statistical complexity, representation geometry (metric entropy/doubling structure), and action-gap margin tails. Together, our theory provides a principled, component-sensitive explanation of when and why agentic theorem provers succeed on biased real-world problem distributions, while clarifying limitations in worst-case or adversarial regimes.

Why Agentic Theorem Prover Works: A Statistical Provability Theory of Mathematical Reasoning Models

TL;DR

This work treats modern agentic theorem provers as time-bounded decision processes that must reach a verified proof within a compute budget, reframing provability as a finite-horizon reachability problem. By leveraging Bellman structure, it derives provability certificates and analyzes how components such as verifier feedback, library retrieval, test-time search, and representation geometry influence success under biased problem distributions. The authors establish existence and structure of optimal value functions, bound deviations of score-guided planning via approximation and margin parameters, and propose a certificate-gap objective to guide representation learning. Together, the framework clarifies when biased, real-world distributions yield tractable provability while delineating limitations in worst-case or adversarial regimes, and it provides practical guidelines for improving agentic provers through better embeddings, tighter certificates, and more informative verifier feedback.

Abstract

Agentic theorem provers -- pipelines that couple a mathematical reasoning model with library retrieval, subgoal-decomposition/search planner, and a proof assistant verifier -- have recently achieved striking empirical success, yet it remains unclear which components drive performance and why such systems work at all despite classical hardness of proof search. We propose a distributional viewpoint and introduce **statistical provability**, defined as the finite-horizon success probability of reaching a verified proof, averaged over an instance distribution, and formalize modern theorem-proving pipelines as time-bounded MDPs. Exploiting Bellman structure, we prove existence of optimal policies under mild regularity, derive provability certificates via sub-/super-solution inequalities, and bound the performance gap of score-guided planning (greedy/top-/beam/rollouts) in terms of approximation error, sequential statistical complexity, representation geometry (metric entropy/doubling structure), and action-gap margin tails. Together, our theory provides a principled, component-sensitive explanation of when and why agentic theorem provers succeed on biased real-world problem distributions, while clarifying limitations in worst-case or adversarial regimes.
Paper Structure (55 sections, 19 theorems, 63 equations)

This paper contains 55 sections, 19 theorems, 63 equations.

Key Result

Theorem 1

If $(K,d_K)$ is compact, then $(M_{\le W}(K),d_{\mathrm{BL}})$ is a compact metric space.

Theorems & Definitions (45)

  • Theorem 1: Compactness
  • Definition 1: Time-bounded success / provability
  • Definition 2: Finite-horizon value functions
  • Definition 3: One-step optimal action-value
  • Lemma 1: Bounds and monotonicity
  • proof
  • Theorem 2: Existence of maximizers and optimal Markov policies
  • Definition 4: Bellman operator
  • Lemma 2: Monotonicity
  • proof
  • ...and 35 more