Table of Contents
Fetching ...

ExVerus: Verus Proof Repair via Counterexample Reasoning

Jun Yang, Yuechun Sun, Yi Wu, Rodrigo Caridad, Yongwei Yuan, Jianan Yao, Shan Lu, Kexin Pei

Abstract

Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.

ExVerus: Verus Proof Repair via Counterexample Reasoning

Abstract

Large Language Models (LLMs) have shown promising results in automating formal verification. However, existing approaches treat proof generation as a static, end-to-end prediction over source code, relying on limited verifier feedback and lacking access to concrete program behaviors. We present EXVERUS, a counterexample-guided framework that enables LLMs to reason about proofs using behavioral feedback via counterexamples. When a proof fails, EXVERUS automatically generates and validates counterexamples, and then guides the LLM to generalize them into inductive invariants to block these failures. Our evaluation shows that EXVERUS significantly improves proof accuracy, robustness, and token efficiency over the state-of-the-art prompting-based Verus proof generator.

Paper Structure

This paper contains 43 sections, 2 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Motivating example from VerusBench (Misc/findmax) showing advantages of source-level counterexample (the ExVerus counterexample generation on top right) v.s. Verus's counterexample (Verus counterexample generation on button right).
  • Figure 2: Workflow of ExVerus.
  • Figure 3: Repairing a wrong invariant that involves an invalid state by pinpointing and pruning it. Task Diffy/brs1 in VerusBench.
  • Figure 4: Identifying and removing a wrong invariant guided by counterexamples. Task CloverBench_two_sum_3 in ObfsBench.
  • Figure 5: Pseudo-code of . Algorithm \ref{['alg:pipeline']} illustrates the overall pipeline, Algorithm \ref{['alg:gen']} illustrates counterexample generation, and Algorithm \ref{['alg:genz']} illustrates mutation-based counterexample-guided repair.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Definition 2.1: Counterexample
  • Definition 2.2: Iterative Blocking