Table of Contents
Fetching ...

Laurel: Unblocking Automated Verification with Large Language Models

Eric Mugnier, Emmanuel Anaya Gonzalez, Ranjit Jhala, Nadia Polikarpova, Yuanyuan Zhou

TL;DR

Laurel tackles the bottleneck of verification in large Dafny projects by automatically generating missing assertion hints using large language models. The method couples error-message–guided localization of the insertion point with in-context examples drawn from structurally similar proofs, selected by a novel hierarchical sequence similarity metric. On the DafnyGym dataset from three real-world codebases, Laurel achieves 56.6% success in generating necessary assertions within ten attempts, improving verification efficiency by up to a factor of 6–to–1 over naive prompts and leveraging context-specific demonstrations for additional gains. The work provides a practical, model-agnostic approach to unblocking automated verification and offers data and artifacts to extend evaluation to other SMT-based verifiers.

Abstract

Program verifiers such as Dafny automate proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires hints in the form of assertions, creating a burden for the proof engineer. In this paper, we propose Laurel, a tool that alleviates this burden by automatically generating assertions using large language models (LLMs). To improve the success rate of LLMs in this task, we design two domain-specific prompting techniques. First, we help the LLM determine the location of the missing assertion by analyzing the verifier's error message and inserting an assertion placeholder at that location. Second, we provide the LLM with example assertions from the same codebase, which we select based on a new proof similarity metric. We evaluate our techniques on our new benchmark DafnyGym, a dataset of complex lemmas we extracted from three real-world Dafny codebases. Our evaluation shows that Laurel is able to generate over 56.6\% of the required assertions given only a few attempts, making LLMs an affordable tool for unblocking program verifiers without human intervention.

Laurel: Unblocking Automated Verification with Large Language Models

TL;DR

Laurel tackles the bottleneck of verification in large Dafny projects by automatically generating missing assertion hints using large language models. The method couples error-message–guided localization of the insertion point with in-context examples drawn from structurally similar proofs, selected by a novel hierarchical sequence similarity metric. On the DafnyGym dataset from three real-world codebases, Laurel achieves 56.6% success in generating necessary assertions within ten attempts, improving verification efficiency by up to a factor of 6–to–1 over naive prompts and leveraging context-specific demonstrations for additional gains. The work provides a practical, model-agnostic approach to unblocking automated verification and offers data and artifacts to extend evaluation to other SMT-based verifiers.

Abstract

Program verifiers such as Dafny automate proofs by outsourcing them to an SMT solver. This automation is not perfect, however, and the solver often requires hints in the form of assertions, creating a burden for the proof engineer. In this paper, we propose Laurel, a tool that alleviates this burden by automatically generating assertions using large language models (LLMs). To improve the success rate of LLMs in this task, we design two domain-specific prompting techniques. First, we help the LLM determine the location of the missing assertion by analyzing the verifier's error message and inserting an assertion placeholder at that location. Second, we provide the LLM with example assertions from the same codebase, which we select based on a new proof similarity metric. We evaluate our techniques on our new benchmark DafnyGym, a dataset of complex lemmas we extracted from three real-world Dafny codebases. Our evaluation shows that Laurel is able to generate over 56.6\% of the required assertions given only a few attempts, making LLMs an affordable tool for unblocking program verifiers without human intervention.
Paper Structure (49 sections, 1 equation, 16 figures, 1 table, 1 algorithm)

This paper contains 49 sections, 1 equation, 16 figures, 1 table, 1 algorithm.

Figures (16)

  • Figure 1: Example of a Dafny lemma that fails to verify without an assertion and the corresponding error message.
  • Figure 2: Overview of Laurel: it takes two inputs, the lemma to fix, and the codebase used to find in-context examples. It identifies the assertion location in the lemma’s branch using the error message and selects relevant examples with a similar recursive structure. Finally, it prompts the LLM to generate an assertion.
  • Figure 3: A failing Dafny lemma with complex control flow.
  • Figure 4: Control flow graph (CFG) of the lemma in \ref{['fig:union-size']}. The highlighted corresponds to the first violation from the error message. Orange circles denote candidate placeholder locations, were the faulty branch absent from the error message.
  • Figure 5: Similarity metric examples
  • ...and 11 more figures