Quokka: Accelerating Program Verification with LLMs via Invariant Synthesis
Anjiang Wei, Tarun Suresh, Tianran Sun, Haoze Wu, Ke Wang, Alex Aiken
TL;DR
The paper tackles the difficulty of automatic loop invariant discovery for formal program verification and investigates whether large language models can generate invariants that meaningfully accelerate verification. It introduces Quokka, a verifier-based invariant-synthesis framework that uses a simple, sound decision procedure and parallel verifier queries to evaluate LLM-generated invariants. On a large SV-COMP-derived benchmark of 866 instances, evaluated across nine LLMs, Quokka achieves state-of-the-art speedups, with at least 1.2x speedups on 81 instances and 2x on 51 instances, surpassing prior LLM-based verifiers. The work also demonstrates that supervised fine-tuning and Best-of-N sampling yield additional gains, and it provides a large synthetic dataset to support future research in LLM-assisted program verification.
Abstract
Program verification relies on loop invariants, yet automatically discovering strong invariants remains a long-standing challenge. We investigate whether large language models (LLMs) can accelerate program verification by generating useful loop invariants. We introduce Quokka, a first-order and effective framework for LLM-based invariant synthesis that provides sound evaluation while achieving state-of-the-art speedup results. Unlike prior work that designs complex, highly customized algorithms, Quokka employs a simple and principled verification procedure. We construct a benchmark of 866 instances and evaluate 9 state-of-the-art LLMs across multiple model families. Our results show that Quokka consistently outperforms all prior LLM-based verifiers: achieving speedups of at least 1.2x on 81 instances compared to 39 instances for the previous best approach. We further demonstrate that supervised fine-tuning and Best-of-N sampling can yield measurable improvements in accelerating verification.
