Reinforcement Learning of Theorem Proving
Cezary Kaliszyk, Josef Urban, Henryk Michalewski, Mirek Olšák
TL;DR
This work addresses automated theorem proving by replacing hand-crafted proof-search heuristics with Monte-Carlo planning guided by reinforcement learning in a bare connection-based prover. By training on a large corpus and evaluating on unseen problems, the authors show that policy and value learning can substantially improve proof-search efficiency: the trained system solves over 40% more problems than the baseline prover within the same inference budget. The approach integrates Monte-Carlo Tree Search with learned priors and state evaluations, using engineered features and fast linear/boosting learners to keep runtime practical. The Miz40 evaluation demonstrates strong generalization and highlights the potential of RL-driven proof search for mathematics and formal verification, including theorems that traditional systems did not solve.
Abstract
We introduce a theorem proving algorithm that uses practically no domain heuristics for guiding its connection-style proof search. Instead, it runs many Monte-Carlo simulations guided by reinforcement learning from previous proof attempts. We produce several versions of the prover, parameterized by different learning and guiding algorithms. The strongest version of the system is trained on a large corpus of mathematical problems and evaluated on previously unseen problems. The trained system solves over 40% more problems than a baseline prover within the same number of inferences, which is an unusually high improvement in this hard AI domain. To our knowledge, this is the first time reinforcement learning has been convincingly applied to solving general mathematical problems on a large scale.
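To make the idea of Monte-Carlo simulations guided by learned priors and state evaluations concrete, the following is a minimal sketch of a generic UCT-style selection and backup step. It is an illustration under stated assumptions, not the prover's actual implementation: the `Node` class, the `ucb_score` formula, the exploration constant `c`, and the smoothing terms are all hypothetical choices made for this example. In a prover, each child would correspond to one applicable inference step, `prior` would come from the learned policy, and the backed-up `value` from the learned state evaluation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One proof-search state in the tree (hypothetical structure)."""
    prior: float                 # assumed: policy estimate for the step leading here
    visits: int = 0              # number of simulations that passed through this node
    value_sum: float = 0.0       # sum of value estimates backed up through this node
    children: list = field(default_factory=list)


def ucb_score(parent: Node, child: Node, c: float = 2.0) -> float:
    """Mean value plus a prior-weighted exploration bonus (assumed formula)."""
    mean = child.value_sum / child.visits if child.visits else 0.0
    # +1 terms avoid log(0) and division by zero for unvisited nodes.
    explore = c * child.prior * math.sqrt(
        math.log(parent.visits + 1) / (child.visits + 1)
    )
    return mean + explore


def select_child(parent: Node, c: float = 2.0) -> Node:
    """Pick the child with the highest score."""
    return max(parent.children, key=lambda ch: ucb_score(parent, ch, c))


def backpropagate(path: list, value: float) -> None:
    """Update visit counts and value sums along the simulated path."""
    for node in path:
        node.visits += 1
        node.value_sum += value


# Usage: an unvisited child with a high prior is preferred over a
# well-explored child with a low prior and modest mean value.
root = Node(prior=1.0, visits=10)
likely = Node(prior=0.9)
unlikely = Node(prior=0.1, visits=5, value_sum=1.0)
root.children = [likely, unlikely]
chosen = select_child(root)
backpropagate([root, chosen], value=1.0)
```

The design point this sketch illustrates is that the search itself contains no domain heuristics: all guidance enters through the two learned quantities, the prior and the value.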
