Reinforced Large Language Model is a formal theorem prover
Zhiling Luo
TL;DR
The paper addresses improving theorem proving with large language models by bridging the informal-to-formal gap using reinforcement learning. It introduces a two-phase framework that combines thought-augmented data preparation, supervised adaptation, and GRPO-based reinforcement learning to optimize next tactics within the Lean proof environment. Key contributions include a data-augmentation pipeline with ground-truth tactics, a two-stage training protocol yielding the Lean-Qwen0.5B-rl model, and a Lean-Dojo inference loop with queue-based tree search that shows empirical gains on miniF2F over direct fine-tuning. The work's significance lies in advancing automatic formalization and verification, enabling more reliable and scalable theorem proving with public-domain LLMs.
Abstract
To take advantage of Large Language Model in theorem formalization and proof, we propose a reinforcement learning framework to iteratively optimize the pretrained LLM by rolling out next tactics and comparing them with the expected ones. The experiment results show that it helps to achieve a higher accuracy compared with directly fine-tuned LLM.
