Table of Contents
Fetching ...

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

Sho Sonoda, Shunta Akiyama, Yuya Uezato

TL;DR

The paper develops a theoretical framework for LLM-guided theorem proving as a finite-horizon deterministic MDP over compact metric spaces, with Lipschitz policy classes. It introduces two data-generation regimes: a cut-free baseline and a latent proof-DAG model that captures reusable cuts/lemmas, enabling analysis of hierarchical search via EM/VAE-style learning. Under Tsybakov-type margin assumptions and sequential generalization bounds, the authors derive lower bounds on finite-horizon success probability that separate search and imitation errors, and show an exponential separation: when cut-elimination inflates a DAG into a large cut-free tree, a cut-aware hierarchical learner requires exponentially less data to achieve a target success level. This provides a principled explanation for the effectiveness of subgoal decomposition in modern agentic theorem provers and highlights how structured latent representations can dramatically improve sample efficiency in rigorous automated reasoning.

Abstract

We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $Ω(Λ^D)$ while the cut-aware hierarchical process has size $O(λ^D)$ with $λ\llΛ$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

TL;DR

The paper develops a theoretical framework for LLM-guided theorem proving as a finite-horizon deterministic MDP over compact metric spaces, with Lipschitz policy classes. It introduces two data-generation regimes: a cut-free baseline and a latent proof-DAG model that captures reusable cuts/lemmas, enabling analysis of hierarchical search via EM/VAE-style learning. Under Tsybakov-type margin assumptions and sequential generalization bounds, the authors derive lower bounds on finite-horizon success probability that separate search and imitation errors, and show an exponential separation: when cut-elimination inflates a DAG into a large cut-free tree, a cut-aware hierarchical learner requires exponentially less data to achieve a target success level. This provides a principled explanation for the effectiveness of subgoal decomposition in modern agentic theorem provers and highlights how structured latent representations can dramatically improve sample efficiency in rigorous automated reasoning.

Abstract

We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy , including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top- search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth into a cut-free tree of size while the cut-aware hierarchical process has size with , a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.
Paper Structure (120 sections, 32 theorems, 189 equations)

This paper contains 120 sections, 32 theorems, 189 equations.

Key Result

Theorem 1

Under ass:v3-A1ass:v3-A2ass:v3-A3ass:v3-A4ass:v3-A5, there exists $\tilde{C}=\tilde{C}(C,\beta)$ such that where $K_{\mathrm{flat}}$ is the expected number of decision points along a successful proof under $Q_{\mathrm{tree}}$, $m_k(S)$ is the generator top-$k$ mass at state $S$, and $p=\beta/(\beta+2)$. Moreover, with probability at least $1-\delta$, where $\varepsilon_{\mathrm{seq}}$ is control

Theorems & Definitions (73)

  • Theorem 1: informal; Statistical provability lowerbound of flat (cut-free) learning
  • Theorem 2: informal; Statistical provability lowerbound of Hierarchical (cut-aware) learning
  • Theorem 3: informal; Exponential separation (cut-aware vs cut-free on cut-structured problems)
  • proof : Proof of $Y$
  • proof : Proof of $X$ using $Y$
  • Theorem 4: Compactness
  • Lemma 1: Arzelà--Ascoli compactness of the BL unit ball
  • proof
  • Lemma 2: $d_{\mathrm{BL}}$ metrizes weak convergence on $M_{\le W}(K)$
  • proof
  • ...and 63 more