Table of Contents
Fetching ...

Reinforced Large Language Model is a formal theorem prover

Zhiling Luo

TL;DR

The paper addresses improving theorem proving with large language models by bridging the informal-to-formal gap using reinforcement learning. It introduces a two-phase framework that combines thought-augmented data preparation, supervised adaptation, and GRPO-based reinforcement learning to optimize next tactics within the Lean proof environment. Key contributions include a data-augmentation pipeline with ground-truth tactics, a two-stage training protocol yielding the Lean-Qwen0.5B-rl model, and a Lean-Dojo inference loop with queue-based tree search that shows empirical gains on miniF2F over direct fine-tuning. The work's significance lies in advancing automatic formalization and verification, enabling more reliable and scalable theorem proving with public-domain LLMs.

Abstract

To take advantage of Large Language Model in theorem formalization and proof, we propose a reinforcement learning framework to iteratively optimize the pretrained LLM by rolling out next tactics and comparing them with the expected ones. The experiment results show that it helps to achieve a higher accuracy compared with directly fine-tuned LLM.

Reinforced Large Language Model is a formal theorem prover

TL;DR

The paper addresses improving theorem proving with large language models by bridging the informal-to-formal gap using reinforcement learning. It introduces a two-phase framework that combines thought-augmented data preparation, supervised adaptation, and GRPO-based reinforcement learning to optimize next tactics within the Lean proof environment. Key contributions include a data-augmentation pipeline with ground-truth tactics, a two-stage training protocol yielding the Lean-Qwen0.5B-rl model, and a Lean-Dojo inference loop with queue-based tree search that shows empirical gains on miniF2F over direct fine-tuning. The work's significance lies in advancing automatic formalization and verification, enabling more reliable and scalable theorem proving with public-domain LLMs.

Abstract

To take advantage of Large Language Model in theorem formalization and proof, we propose a reinforcement learning framework to iteratively optimize the pretrained LLM by rolling out next tactics and comparing them with the expected ones. The experiment results show that it helps to achieve a higher accuracy compared with directly fine-tuned LLM.

Paper Structure

This paper contains 7 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The framework includes offline data preparation, model training and online inference.
  • Figure 2: Train process records of adaption phase
  • Figure 3: Train process records of RL phase