Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)
Zoe Paraskevopoulou
TL;DR
This paper investigates whether an agentic LLM can generate a substantial, machine-checked proof for a complex compiler transformation. Using Claude Code as a guiding tool, the authors mechanize the ANF correctness proof in Rocq, building on a known CPS-proof template and achieving roughly 7,800 lines of Rocq code in about 96 hours, with human reviewers providing high-level direction. The work demonstrates a template-based workflow for extending verified compiler pipelines, while candidly discussing limitations such as divergence-preservation issues, proof-statement weakening, and context-window challenges. The findings suggest that LLM-assisted proving, guided by structured templates and expert oversight, can accelerate verified compiler development, though robust tooling and reproducibility considerations remain critical for broader adoption.
Abstract
We report on using an agentic coding assistant (Claude Code, powered by Claude Opus 4.6) to mechanize a substantial Rocq correctness proof from scratch, with human guidance but without human proof writing. The proof establishes semantic preservation for the administrative normal form (ANF) transformation in the CertiCoq verified compiler for Rocq. The closely related continuation-passing style (CPS) transformation in CertiCoq was previously proved correct by human experts over several months. We use this proof as a template and instruct the LLM to adapt the proof technique to the ANF setting, which differs in important technical ways. The resulting ANF proof comprises approximately 7,800 lines of Rocq (larger than the 5,300-line CPS proof) and was developed in approximately 96 hours. We describe the proof technique and report on the experience of developing it with an LLM, discussing both the strengths and limitations of the approach and its implications for verified compiler construction.
