Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

Sho Sonoda; Shunta Akiyama; Yuya Uezato

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

Sho Sonoda, Shunta Akiyama, Yuya Uezato

TL;DR

The paper develops a theoretical framework for LLM-guided theorem proving as a finite-horizon deterministic MDP over compact metric spaces, with Lipschitz policy classes. It introduces two data-generation regimes: a cut-free baseline and a latent proof-DAG model that captures reusable cuts/lemmas, enabling analysis of hierarchical search via EM/VAE-style learning. Under Tsybakov-type margin assumptions and sequential generalization bounds, the authors derive lower bounds on finite-horizon success probability that separate search and imitation errors, and show an exponential separation: when cut-elimination inflates a DAG into a large cut-free tree, a cut-aware hierarchical learner requires exponentially less data to achieve a target success level. This provides a principled explanation for the effectiveness of subgoal decomposition in modern agentic theorem provers and highlights how structured latent representations can dramatically improve sample efficiency in rigorous automated reasoning.

Abstract

We develop a theoretical analysis of LLM-guided formal theorem proving in interactive proof assistants (e.g., Lean) by modeling tactic proposal as a stochastic policy in a finite-horizon deterministic MDP. To capture modern representation learning, we treat the state and action spaces as general compact metric spaces and assume Lipschitz policies. To explain the gap between worst-case hardness and empirical success, we introduce problem distributions generated by a reference policy $q$, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-$k$ search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth $D$ into a cut-free tree of size $Ω(Λ^D)$ while the cut-aware hierarchical process has size $O(λ^D)$ with $λ\llΛ$, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

TL;DR

Abstract

, including a latent-variable model in which proofs exhibit reusable cut/lemma/sketch structure represented by a proof DAG. Under a top-

search protocol and Tsybakov-type margin conditions, we derive lower bounds on finite-horizon success probability that decompose into search and learning terms, with learning controlled by sequential Rademacher/covering complexity. Our main separation result shows that when cut elimination expands a DAG of depth

into a cut-free tree of size

while the cut-aware hierarchical process has size

with

, a flat (cut-free) learner provably requires exponentially more data than a cut-aware hierarchical learner. This provides a principled justification for subgoal decomposition in recent agentic theorem provers.

Paper Structure (120 sections, 32 theorems, 189 equations)

This paper contains 120 sections, 32 theorems, 189 equations.

Introduction
Related work (cut/lemma sharing and modern agentic provers)
Related Work (hierarchical policies and exponential separations)
Hierarchical policies via options and goal-conditioned HRL.
Exponential gains from exploiting structure: factored MDPs.
Interactive theorem proving in Lean and an MDP formulation
Lean-style interaction: goals, tactics, and "no goals"
Proof trees and AND--OR structure.
Proof complexity.
Finite-horizon deterministic MDP for theorem proving
State and action spaces.
Deterministic kernel.
Policy class.
Finite horizon and success probability.
Optimal policies exist but are not informative.
...and 105 more sections

Key Result

Theorem 1

Under ass:v3-A1ass:v3-A2ass:v3-A3ass:v3-A4ass:v3-A5, there exists $\tilde{C}=\tilde{C}(C,\beta)$ such that where $K_{\mathrm{flat}}$ is the expected number of decision points along a successful proof under $Q_{\mathrm{tree}}$, $m_k(S)$ is the generator top-$k$ mass at state $S$, and $p=\beta/(\beta+2)$. Moreover, with probability at least $1-\delta$, where $\varepsilon_{\mathrm{seq}}$ is control

Theorems & Definitions (73)

Theorem 1: informal; Statistical provability lowerbound of flat (cut-free) learning
Theorem 2: informal; Statistical provability lowerbound of Hierarchical (cut-aware) learning
Theorem 3: informal; Exponential separation (cut-aware vs cut-free on cut-structured problems)
proof : Proof of $Y$
proof : Proof of $X$ using $Y$
Theorem 4: Compactness
Lemma 1: Arzelà--Ascoli compactness of the BL unit ball
proof
Lemma 2: $d_{\mathrm{BL}}$ metrizes weak convergence on $M_{\le W}(K)$
proof
...and 63 more

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

TL;DR

Abstract

Don't Eliminate Cut: Exponential Separations in LLM-Based Theorem Proving

Authors

TL;DR

Abstract

Table of Contents

Key Result

Theorems & Definitions (73)