Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)

Zoe Paraskevopoulou

Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)

Zoe Paraskevopoulou

TL;DR

This paper investigates whether an agentic LLM can generate a substantial, machine-checked proof for a complex compiler transformation. Using Claude Code as a guiding tool, the authors mechanize the ANF correctness proof in Rocq, building on a known CPS-proof template and achieving roughly 7,800 lines of Rocq code in about 96 hours, with human reviewers providing high-level direction. The work demonstrates a template-based workflow for extending verified compiler pipelines, while candidly discussing limitations such as divergence-preservation issues, proof-statement weakening, and context-window challenges. The findings suggest that LLM-assisted proving, guided by structured templates and expert oversight, can accelerate verified compiler development, though robust tooling and reproducibility considerations remain critical for broader adoption.

Abstract

We report on using an agentic coding assistant (Claude Code, powered by Claude Opus 4.6) to mechanize a substantial Rocq correctness proof from scratch, with human guidance but without human proof writing. The proof establishes semantic preservation for the administrative normal form (ANF) transformation in the CertiCoq verified compiler for Rocq. The closely related continuation-passing style (CPS) transformation in CertiCoq was previously proved correct by human experts over several months. We use this proof as a template and instruct the LLM to adapt the proof technique to the ANF setting, which differs in important technical ways. The resulting ANF proof comprises approximately 7,800 lines of Rocq (larger than the 5,300-line CPS proof) and was developed in approximately 96 hours. We describe the proof technique and report on the experience of developing it with an LLM, discussing both the strengths and limitations of the approach and its implications for verified compiler construction.

Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)

TL;DR

Abstract

Paper Structure (60 sections, 2 theorems, 12 equations, 3 figures, 4 tables)

This paper contains 60 sections, 2 theorems, 12 equations, 3 figures, 4 tables.

Introduction
The case study.
Contributions.
Paper overview.
The CPS Proof Technique
Technical Setup
Source Language
Target Language
1-hole context.
Logical Relation
The ANF Transformation and Its Correctness Proof
Proof Statements
Example: the Let case.
Top-level correctness.
Divergence Preservation
...and 45 more sections

Key Result

theorem 1

Let $\overline{v}$ be a well-formed environment, $e$ a well-formed source expression, and $\overline{v}; e \Downarrow_{\mathtt{src}} v$. Suppose $S; \overline{x}; e \vartriangleright C; x; S'$. Then for any target environment $\sigma$ such that $\texttt{ anf\_env\_rel}(\overline{x}, \overl where $e_k$ is any continuation expression such that ${\sf fv}(e_k) \cap ((S \setminus S') \setminu

Figures (3)

Figure 1: Language grammars.
Figure 2: Logical relation over $\lambda_{ \texttt{ ANF}}$ terms.
Figure 3: ANF transformation (relational definition, excerpt).

Theorems & Definitions (2)

theorem 1: ANF Correctness
corollary 1: Whole-Program ANF Correctness

Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)

TL;DR

Abstract

Machine-Generated, Machine-Checked Proofs for a Verified Compiler (Experience Report)

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (3)

Theorems & Definitions (2)