Table of Contents
Fetching ...

LeanAgent: Lifelong Learning for Formal Theorem Proving

Adarsh Kumarappan, Mo Tiwari, Peiyang Song, Robert Joseph George, Chaowei Xiao, Anima Anandkumar

TL;DR

LeanAgent addresses the challenge of generalizing formal theorem proving across expanding mathematical knowledge by coupling curriculum-driven data selection, a dynamic knowledge base, and progressive retriever training. It automatically mines Lean repositories, ranks theorems by difficulty using an exponential measure, and continually updates its premises and proofs through a best-first proving search. Across multiple repositories and domains, LeanAgent achieves strong stability and backward transfer while progressively proving many previously incomplete theorems, including research-level results, outperforming static baselines. The work demonstrates a viable path toward continuous generalizability and improvement in formal mathematics, with implications for automated theorem proving and collaboration with human mathematicians.

Abstract

Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully generates formal proofs for 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.

LeanAgent: Lifelong Learning for Formal Theorem Proving

TL;DR

LeanAgent addresses the challenge of generalizing formal theorem proving across expanding mathematical knowledge by coupling curriculum-driven data selection, a dynamic knowledge base, and progressive retriever training. It automatically mines Lean repositories, ranks theorems by difficulty using an exponential measure, and continually updates its premises and proofs through a best-first proving search. Across multiple repositories and domains, LeanAgent achieves strong stability and backward transfer while progressively proving many previously incomplete theorems, including research-level results, outperforming static baselines. The work demonstrates a viable path toward continuous generalizability and improvement in formal mathematics, with implications for automated theorem proving and collaboration with human mathematicians.

Abstract

Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully generates formal proofs for 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
Paper Structure (30 sections, 1 equation, 6 figures, 2 tables, 1 algorithm)

This paper contains 30 sections, 1 equation, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: High-level overview of LeanAgent. LeanAgent automatically searches for and clones Lean repositories from GitHub. Then, it uses LeanDojo's yangLeanDojoTheoremProving2023 tracing functionality to extract fine-grained information about the theorems and proofs from these repositories. Then, it computes the difficulty of each theorem using the formula $\text{difficulty} = e^{(\text{number of proof steps})}$. Then, we compute the 33rd and 67th percentiles of difficulty across all theorems in all repositories. Using these percentiles, we sorted the repositories based on the number of easy theorems they contain, forming our curriculum. We add these repositories to LeanAgent's dynamic database to track its growing knowledge base. For each repository in the curriculum, LeanAgent uses the dynamic database to generate a dataset. After this, it progressively trains the LeanAgent's retriever on the merged dataset, focusing on limited exposure to account for the stability-plasticity tradeoff. After this, it uses premise retrieval, tactic generation, and best-first tree search to prove previously unproven results, known as sorry theorems, and adds generated proofs to the database.
  • Figure 2: LeanAgent Repositories
  • Figure 3: Overview of repository scanning and data extraction as well as dynamic database management in our lifelong learning framework
  • Figure 4: Overview of progressive training of the retriever in our lifelong learning framework
  • Figure 5: Overview of sorry theorem proving in our lifelong learning framework
  • ...and 1 more figures