Table of Contents
Fetching ...

Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis

Anjiang Wei, Tarun Suresh, Tianran Sun, Haoze Wu, Ke Wang, Alex Aiken

TL;DR

The paper tackles the difficulty of automatic loop invariant discovery for formal program verification and investigates whether large language models can generate invariants that meaningfully accelerate verification. It introduces Quokka, a verifier-based invariant-synthesis framework that uses a simple, sound decision procedure and parallel verifier queries to evaluate LLM-generated invariants. On a large SV-COMP-derived benchmark of 866 instances, evaluated across nine LLMs, Quokka achieves state-of-the-art speedups, with at least 1.2x speedups on 81 instances and 2x on 51 instances, surpassing prior LLM-based verifiers. The work also demonstrates that supervised fine-tuning and Best-of-N sampling yield additional gains, and it provides a large synthetic dataset to support future research in LLM-assisted program verification.

Abstract

Program verification relies on loop invariants, yet automatically discovering strong invariants remains a long-standing challenge. We investigate whether large language models (LLMs) can accelerate program verification by generating useful loop invariants. We introduce Quokka, a first-order and effective framework for LLM-based invariant synthesis that provides sound evaluation while achieving state-of-the-art speedup results. Unlike prior work that designs complex, highly customized algorithms, Quokka employs a simple and principled verification procedure. We construct a benchmark of 866 instances and evaluate 9 state-of-the-art LLMs across multiple model families. Our results show that Quokka consistently outperforms all prior LLM-based verifiers: achieving speedups of at least 1.2x on 81 instances compared to 39 instances for the previous best approach. We further demonstrate that supervised fine-tuning and Best-of-N sampling can yield measurable improvements in accelerating verification.

Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis

TL;DR

The paper tackles the difficulty of automatic loop invariant discovery for formal program verification and investigates whether large language models can generate invariants that meaningfully accelerate verification. It introduces Quokka, a verifier-based invariant-synthesis framework that uses a simple, sound decision procedure and parallel verifier queries to evaluate LLM-generated invariants. On a large SV-COMP-derived benchmark of 866 instances, evaluated across nine LLMs, Quokka achieves state-of-the-art speedups, with at least 1.2x speedups on 81 instances and 2x on 51 instances, surpassing prior LLM-based verifiers. The work also demonstrates that supervised fine-tuning and Best-of-N sampling yield additional gains, and it provides a large synthetic dataset to support future research in LLM-assisted program verification.

Abstract

Program verification relies on loop invariants, yet automatically discovering strong invariants remains a long-standing challenge. We investigate whether large language models (LLMs) can accelerate program verification by generating useful loop invariants. We introduce Quokka, a first-order and effective framework for LLM-based invariant synthesis that provides sound evaluation while achieving state-of-the-art speedup results. Unlike prior work that designs complex, highly customized algorithms, Quokka employs a simple and principled verification procedure. We construct a benchmark of 866 instances and evaluate 9 state-of-the-art LLMs across multiple model families. Our results show that Quokka consistently outperforms all prior LLM-based verifiers: achieving speedups of at least 1.2x on 81 instances compared to 39 instances for the previous best approach. We further demonstrate that supervised fine-tuning and Best-of-N sampling can yield measurable improvements in accelerating verification.

Paper Structure

This paper contains 43 sections, 1 theorem, 12 equations, 4 figures, 10 tables.

Key Result

Theorem 1

If $P \Rightarrow \langle p^\star,q\rangle \Downarrow \mathbf{T}$ is derivable, then $P \models p^\star$. If $P \Rightarrow \langle p^\star,q\rangle \Downarrow \mathbf{F}$ is derivable, then $P \not\models p^\star$.

Figures (4)

  • Figure 1: Illustration of Quokka's evaluation pipeline. The LLM proposes an invariant by specifying a program location and predicate (e.g., location B with x % 7 == 3). The verification procedure then incorporates this invariant to prove the property x != 700 using two verifier queries, and we measure the resulting speedup relative to a baseline without LLM assistance.
  • Figure 2: Number of instances solved by different methods over varying timeout budgets.
  • Figure 3: Effect of Best-of-N sampling on the number of instances under different Fast$_{p}$ settings.
  • Figure A1: An example from the fine-tuning dataset: program and its loop invariant generated by UAutomizer.

Theorems & Definitions (2)

  • Theorem : Decision Soundness
  • proof