Table of Contents
Fetching ...

ImProver: Agent-Based Automated Proof Optimization

Riyaz Ahuja, Jeremy Avigad, Prasad Tetali, Sean Welleck

TL;DR

ImProver introduces an agent-based framework for automated proof optimization in Lean, rewriting existing proofs to satisfy user-defined criteria such as shorter length or improved readability. It combines Chain-of-States prompting, error correction, retrieval-augmented generation, and multi-stage sampling (Best-of-n, Refinement, and their combinations) to improve proof quality while preserving correctness. The approach is validated across undergraduate MIL, competition (Compfiles), and research-level Mathlib proofs, showing substantial gains in length and readability compared to baseline GPT-4o prompts, with Ablation studies clarifying the impact of CoS, retrieval, and sampling configurations. The results suggest that, despite higher cost and slower runtime, ImProver significantly advances automated proof optimization and supports future improvements via fine-tuning and more capable models. The work highlights practical implications for producing more concise, declarative, and modular proofs that remain formally verified in Lean.

Abstract

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proofs assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.

ImProver: Agent-Based Automated Proof Optimization

TL;DR

ImProver introduces an agent-based framework for automated proof optimization in Lean, rewriting existing proofs to satisfy user-defined criteria such as shorter length or improved readability. It combines Chain-of-States prompting, error correction, retrieval-augmented generation, and multi-stage sampling (Best-of-n, Refinement, and their combinations) to improve proof quality while preserving correctness. The approach is validated across undergraduate MIL, competition (Compfiles), and research-level Mathlib proofs, showing substantial gains in length and readability compared to baseline GPT-4o prompts, with Ablation studies clarifying the impact of CoS, retrieval, and sampling configurations. The results suggest that, despite higher cost and slower runtime, ImProver significantly advances automated proof optimization and supports future improvements via fine-tuning and more capable models. The work highlights practical implications for producing more concise, declarative, and modular proofs that remain formally verified in Lean.

Abstract

Large language models (LLMs) have been used to generate formal proofs of mathematical theorems in proofs assistants such as Lean. However, we often want to optimize a formal proof with respect to various criteria, depending on its downstream use. For example, we may want a proof to adhere to a certain style, or to be readable, concise, or modularly structured. Having suitably optimized proofs is also important for learning tasks, especially since human-written proofs may not optimal for that purpose. To this end, we study a new problem of automated proof optimization: rewriting a proof so that it is correct and optimizes for an arbitrary criterion, such as length or readability. As a first method for automated proof optimization, we present ImProver, a large-language-model agent that rewrites proofs to optimize arbitrary user-defined metrics in Lean. We find that naively applying LLMs to proof optimization falls short, and we incorporate various improvements into ImProver, such as the use of symbolic Lean context in a novel Chain-of-States technique, as well as error-correction and retrieval. We test ImProver on rewriting real-world undergraduate, competition, and research-level mathematics theorems, finding that ImProver is capable of rewriting proofs so that they are substantially shorter, more modular, and more readable.
Paper Structure (46 sections, 1 equation, 9 figures, 12 tables)

This paper contains 46 sections, 1 equation, 9 figures, 12 tables.

Figures (9)

  • Figure 1: $\mathrm{ImProver}$ automatically rewrites formal proofs to optimize a criterion such as length or readability while remaining correct. In this example, $\mathrm{ImProver}$ optimizes a human-written lemma (right) from the 2022 International Math Olympiad (Question 2, solution from Compfiles compfiles) for length. $\mathrm{ImProver}$'s optimized proof is correct and more concise.
  • Figure 2: A Lean proof (left) with Chain-of-States prompting annotations (right).
  • Figure 3: Optimizing a group-theoretic result from MIL Chapter 8 Section 1 for readability. We define a proof to be readable if it is written in a declarative style, which consists of intermediate conjectures (have$\cdots$ statements). $\mathrm{ImProver}$ introduces two intermediate conjectures into the proof.
  • Figure 4: Solving a group theorem exercise from MIL Chapter 8 Section 1 for readability.
  • Figure 5: Optimizing a lemma from IMO 2019 P1 for readability
  • ...and 4 more figures