MiniF2F in Rocq: Automatic Translation Between Proof Assistants -- A Case Study
Jules Viennot, Guillaume Baudart, Emilio Jesùs Gallego Arias, Marc Lelarge
TL;DR
This study investigates translating MiniF2F theorems from Lean and Isabelle to Rocq using state-of-the-art LLMs, addressing cross-system compatibility and benchmarking challenges. A three-stage cascade—one-shot prompting, multi-turn with error feedback, and refined prompting with extended attempts—paired with automatic Petanque verification and expert audits, achieves a translation rate of $\frac{478}{488}$ ($98\%$) with only $2\%$ errors. The results show early-stage capabilities exist, but iterative interaction and targeted prompting yield the largest gains, and human verification remains essential for faithful semantics. Overall, the work demonstrates the viability of LLM-driven cross-proof-language translation and offers a scalable path for FAIR cross-system evaluation and dataset portability.
Abstract
In this work, we conduct an experiment using state-of-the-art LLMs to translate MiniF2F into Rocq. The translation task focuses on generating a Rocq theorem based on three sources: a natural language description, the Lean formalization, and the Isabelle formalization. We conducted our experiment in 3 stages of increasing complexity, from basic one-shot prompting to multi-turn conversations that incorporate feedback from unsuccessful attempts. At each stage, we perform multiple rounds of translation using increasingly advanced models: GPT-4o mini, Claude 3.5 Sonnet, o1 mini, and o1. We successfully translated 478 out of 488 theorems. The dataset is opensource: https://github.com/LLM4Rocq/miniF2F-rocq.
