Table of Contents
Fetching ...

Making Written Theorems Explorable by Grounding Them in Formal Representations

Hita Kambhamettu, Will Crichton, Sean Welleck, Harrison Goldstein, Andrew Head

Abstract

LLM-generated explanations can make technical content more accessible, but there is a ceiling on what they can support interactively. Because LLM outputs are static text, they cannot be executed or stepped through. We argue that grounding explanations in a formalized representation enables interactive affordances beyond what static text supports. We instantiate this idea for mathematical proof comprehension with explorable theorems, a system that uses LLMs to translate a theorem and its written proof into Lean, a programming language for machine-checked proofs, and links the written proof with the Lean code. Readers can work through the proof at a step-level granularity, test custom examples or counterexamples, and trace the logical dependencies bridging each step. Each worked-out step is produced by executing the Lean proof on that example and extracting its intermediate state. A user study ($n = 16$) shows potential advantages of this approach: in a proof-reading task, participants who had access to the provided explorability features gave better, more correct, and more detailed answers to comprehension questions, demonstrating a stronger overall understanding of the underlying mathematics.

Making Written Theorems Explorable by Grounding Them in Formal Representations

Abstract

LLM-generated explanations can make technical content more accessible, but there is a ceiling on what they can support interactively. Because LLM outputs are static text, they cannot be executed or stepped through. We argue that grounding explanations in a formalized representation enables interactive affordances beyond what static text supports. We instantiate this idea for mathematical proof comprehension with explorable theorems, a system that uses LLMs to translate a theorem and its written proof into Lean, a programming language for machine-checked proofs, and links the written proof with the Lean code. Readers can work through the proof at a step-level granularity, test custom examples or counterexamples, and trace the logical dependencies bridging each step. Each worked-out step is produced by executing the Lean proof on that example and extracting its intermediate state. A user study () shows potential advantages of this approach: in a proof-reading task, participants who had access to the provided explorability features gave better, more correct, and more detailed answers to comprehension questions, demonstrating a stronger overall understanding of the underlying mathematics.

Paper Structure

This paper contains 56 sections, 4 theorems, 1 equation, 6 figures.

Key Result

theorem 1

For all integers $x$, if $x > 2$, then $x^2 - 1$ is not prime. $\blacktriangleleft$$\blacktriangleleft$

Figures (6)

  • Figure 1: The explorable theorems interface for the theorem "For all integers $x$, if $x > 2$, then $x^2 - 1$ is not prime". The input slider 1 lets readers set a concrete value of $x$; color coding indicates where the theorem's assumption $x > 2$ holds. Each proof step is instantiated with Lean-verified intermediate values 2: for $x = 2$, the expression $n = x^2 - 1$ evaluates to $3$. Clicking a variable such as $n$ reveals the downstream proof steps that depend on it 3. When the concrete value violates the theorem's assumptions, the interface flags exactly where the proof breaks 4, making the role of each assumption legible to the reader.
  • Figure 2: The pipeline for grounding interaction affordances in a formal representation. First, the Lean proof is generated ( 1). The system executes this proof on concrete inputs and extract the Lean proof state ( 2). A language model then generates a template for the proof step worked out on an example ( 4). Lean-computed values are programmatically substituted into this template ( 6) to produce the verified worked example shown to the reader. The consecutive states of the proof are diffed to create a dependency graph ( 3). A mapping is created that links Lean statements to natural-language propositions ( 5).
  • Figure 3: Instructor preferences by condition. Box plot shows the distribution of TrueSkill ratings assigned to participant responses through pairwise comparisons by a mathematics instructor, combined across both study tasks. The instructor preferred responses written with explorable theorems in 94.9% of pairwise comparisons.
  • Figure 4: Response quality by condition. Box plots show the distribution of graded correctness scores and percentage of responses referencing explicit proof steps, combined across both theorem tasks. Vertical lines within each box indicate the median. Responses written after using explorable theorems were graded as significantly more correct and referenced specific proof steps more frequently.
  • Figure 5: Timeline of proof-reading behavior for each participant during session. The y-axis indicates which proof step (1–8) the participant is engaging with. Blue circles and orange squares denote reading in the abstract and concrete (instantiation) views, respectively. Purple diamonds mark dependency link clicks.
  • ...and 1 more figures

Theorems & Definitions (8)

  • theorem 1
  • proof
  • theorem 2
  • proof
  • theorem 3
  • proof
  • theorem 4
  • proof