Proof Strategy Extraction from LLMs for Enhancing Symbolic Provers
Jian Fang, Yican Sun, Yingfei Xiong
TL;DR
The paper proposes Strat2Rocq, a framework that extracts the internal proving strategies of LLMs: it prompts an LLM for natural-language proofs, summarizes those proofs into formal lemmas in Rocq, and makes the lemmas available to CoqHammer to strengthen symbolic proving. On open-source Rocq projects, adding these LLM-derived lemmas increases the average proof success rate by 13.41% and the success rate of automated tactics by 26.27%. The paper also highlights challenges such as NL-to-lemma conversion failures and lemma redundancy. The approach offers a secure, offline complement to LLM-driven proof agents, enabling cost-effective and confidential verification workflows that do not rely on external LLM services during proving. It also shows that the method applies across LLMs and that combining lemmas from multiple LLM backends can yield further gains, suggesting a promising direction for integrating learned reasoning with symbolic reasoning in software verification.
Abstract
One important approach to software verification is interactive theorem proving. However, writing formal proofs often requires substantial human effort, making proof automation highly important. Traditionally, proof automation has relied on symbolic provers. Recently, large language models (LLMs) have demonstrated strong capabilities in theorem proving, complementing symbolic provers. Nonetheless, prompting LLMs can be expensive and may pose security risks for confidential codebases. As a result, purely symbolic approaches remain important even in the LLM era, as they are cost-effective, secure, and complement the strengths of LLMs. Motivated by these considerations, we ask a new research question: can we extract the internal strategies of LLMs to enhance the capabilities of symbolic provers? As an initial attempt to answer this question, we propose Strat2Rocq, which extracts proof strategies from LLMs and formalizes them as lemmas in Rocq. These lemmas are accessible to symbolic provers such as CoqHammer. With the addition of these LLM-extracted lemmas, CoqHammer is able to prove more theorems. The knowledge extraction process involves analyzing the proof trajectories of LLMs on a training set of proved theorems. For each theorem, we prompt the LLM to generate a natural language proof, then ask it to summarize this proof into formalized lemmas with proofs. We also employ a standard agentic approach to mitigate errors during formalization. Our evaluation demonstrates that, on open-source Rocq projects for software verification, Strat2Rocq enhances the success rate of CoqHammer by 13.41%.
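The extraction loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the callables `ask_llm` and `check_rocq`, the prompt wording, and the `max_repairs` bound are all assumptions introduced here. The repair loop stands in for the "standard agentic approach" the abstract mentions, whose actual design is not specified in this text.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces, assumed for illustration only:
#   AskLLM    takes a prompt and returns the model's text reply.
#   CheckRocq takes Rocq source and returns (ok, error_message).
AskLLM = Callable[[str], str]
CheckRocq = Callable[[str], Tuple[bool, str]]


def extract_lemmas(
    theorems: List[str],
    ask_llm: AskLLM,
    check_rocq: CheckRocq,
    max_repairs: int = 3,
) -> List[str]:
    """Collect lemmas that the checker accepts, one candidate per theorem."""
    accepted: List[str] = []
    for theorem in theorems:
        # Step 1: ask for a natural-language proof of the training theorem.
        nl_proof = ask_llm(
            f"Prove this theorem in natural language:\n{theorem}"
        )
        # Step 2: ask for the proof to be summarized as formal lemmas.
        candidate = ask_llm(
            f"Summarize this proof as Rocq lemmas with proofs:\n{nl_proof}"
        )
        # Step 3: check the candidate and feed errors back for repair.
        for attempt in range(max_repairs + 1):
            ok, error = check_rocq(candidate)
            if ok:
                accepted.append(candidate)
                break
            if attempt == max_repairs:
                break  # give up on this theorem; keep nothing unchecked
            candidate = ask_llm(
                f"Fix these Rocq lemmas.\nError:\n{error}\nSource:\n{candidate}"
            )
    return accepted
```

Only candidates that pass the checker are kept, so everything handed to the symbolic prover has already been verified offline. No LLM call is needed once the lemma set has been built.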
