Table of Contents
Fetching ...

LeanTutor: Towards a Verified AI Mathematical Proof Tutor

Manooshree Patel, Rayna Bhattacharyya, Thomas Lu, Arnav Mehta, Niels Voss, Narges Norouzi, Gireeja Ranade

TL;DR

This paper considers the development of an AI-based provably-correct mathematical proof tutor by combining the complementary strengths of LLMs and theorem provers, and presents a proof-of-concept system (LeanTutor), composed of three modules.

Abstract

This paper considers the development of an AI-based provably-correct mathematical proof tutor. While Large Language Models (LLMs) allow seamless communication in natural language, they are error prone. Theorem provers such as Lean allow for provable-correctness, but these are hard for students to learn. We present a proof-of-concept system (LeanTutor) by combining the complementary strengths of LLMs and theorem provers. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. To evaluate the system, we introduce PeanoBench, a dataset of 371 Peano Arithmetic proofs in human-written natural language and formal language, derived from the Natural Numbers Game.

LeanTutor: Towards a Verified AI Mathematical Proof Tutor

TL;DR

This paper considers the development of an AI-based provably-correct mathematical proof tutor by combining the complementary strengths of LLMs and theorem provers, and presents a proof-of-concept system (LeanTutor), composed of three modules.

Abstract

This paper considers the development of an AI-based provably-correct mathematical proof tutor. While Large Language Models (LLMs) allow seamless communication in natural language, they are error prone. Theorem provers such as Lean allow for provable-correctness, but these are hard for students to learn. We present a proof-of-concept system (LeanTutor) by combining the complementary strengths of LLMs and theorem provers. LeanTutor is composed of three modules: (i) an autoformalizer/proof-checker, (ii) a next-step generator, and (iii) a natural language feedback generator. To evaluate the system, we introduce PeanoBench, a dataset of 371 Peano Arithmetic proofs in human-written natural language and formal language, derived from the Natural Numbers Game.

Paper Structure

This paper contains 40 sections, 8 figures, 6 tables, 3 algorithms.

Figures (8)

  • Figure 1: LeanTutor is comprised of three modules: an autoformalizer that automatically formalizes an NL student proof into Lean step-by-step; a next step generator that generates a next feasible tactic for the student proof; and a natural language feedback generator that generates guiding feedback to help the student progress towards a correct proof.
  • Figure 2: Autoformalizer architecture. The natural langauge student step is provided to the autoformalizer, and the output is checked by the Lean compiler. The formalization of each step is appended to formalizations of previous steps to check for correctness.
  • Figure 3: Examples of annotated Peano Arithmetic proofs from PeanoBench for the theorem proving commutativity of addition, that is, for all $a, b \in \mathbb{N}$, $a + b = b + a$. The first proof,add_comm_staff_solution follows the exact Lean code from NNG4. The second and third proofs, add_comm_equation_based and add_comm_justification_based, are written in two different personas.
  • Figure 4: Example of an incorrect proof for the theorem proving commutativity of addition, that is, for all $a, b \in \mathbb{N}$, $a + b = b + a$. This proof, originally the justification-based persona, has the rw [hd] step, which applies the inductive hypothesis, skipped.
  • Figure 5: Architecture of the Next Step Generation module. An LLM generates tactic candidates which are appended to the pre-existing proof. Tactics that compile correctly are then passed through a "progress check" filter which ensures goal states are not being re-visited. This process of generating and checking tactics is repeated until the proof is completed.
  • ...and 3 more figures