ProofWala: Multilingual Proof Data Synthesis and Theorem-Proving
Amitayush Thakur, George Tsoukalas, Greg Durrett, Swarat Chaudhuri
TL;DR
ProofWala presents a multilingual, language-agnostic framework for data collection, model training, and parallel search in interactive theorem proving, uniting Lean and Coq under a standardized pipeline. By building multilingual proof-step data and fine-tuning transformer-based predictors, it demonstrates cross-language transfer and improved prove-at-$k$ performance versus monolingual baselines, with strong evidence after domain adaptation to CategoryTheory. The approach relies on a three-part architecture (interface, training, and parallel search) and leverages parallel Ray-based proof search to scale end-to-end synthesis. All code and multilingual datasets are open-sourced, offering a foundation for cross-domain theorem-proving research and future improvements in multilingual autoformalization and cross-ITP generalization.
Abstract
Neural networks have shown substantial promise at automatic theorem-proving in interactive proof assistants (ITPs) like Lean and Coq. However, most neural theorem-proving models are restricted to specific ITPs, leaving out opportunities for cross-lingual $\textit{transfer}$ between ITPs. We address this weakness with a multilingual proof framework, ${\rm P{\small ROOF}W{\small ALA}}$, that allows a standardized form of interaction between neural theorem-provers and two established ITPs (Coq and Lean). It enables the collection of multilingual proof step data -- data recording the result of proof actions on ITP states -- for training neural provers. ${\rm P{\small ROOF}W{\small ALA}}$ allows the systematic evaluation of a model's performance across different ITPs and problem domains via efficient parallel proof search algorithms. We show that multilingual training enabled by ${\rm P{\small ROOF}W{\small ALA}}$ can lead to successful transfer across ITPs. Specifically, a model trained on a mix of ${\rm P{\small ROOF}W{\small ALA}}$-generated Coq and Lean data outperforms Lean-only and Coq-only models on the standard prove-at-$k$ metric. We open source all code including code for the ${\rm P{\small ROOF}W{\small ALA}}$ Framework (https://github.com/trishullab/proof-wala), and the Multilingual ITP interaction framework (https://github.com/trishullab/itp-interface).
