Table of Contents
Fetching ...

RAG-Verus: Repository-Level Program Verification with LLMs using Retrieval Augmented Generation

Sicheng Zhong, Jiading Zhu, Yifang Tian, Xujie Si

TL;DR

This work tackles the challenge of repository-level program verification by introducing RagVerus, a retrieval-augmented generation framework that leverages context-aware prompting to guide LLMs in synthesizing verifiable code across interdependent modules. It provides a configurable pipeline for mining repository properties, retrieving task-specific contexts, and generating verification proofs, supported by two retrieval strategies (few-shot examples and dependency-based premises). The authors introduce RepoVBench, the first repository-level Verus benchmark, derived from VeriSMo with 383 proof-completion tasks across 52 modules, and demonstrate that RagVerus achieves substantial gains on both function-level and repository-level benchmarks (e.g., 200+% improvement on VerusBench and 27% relative improvement on RepoVBench). The results indicate that context retrieval and cross-module reasoning significantly enhance proof synthesis, offering a scalable, sample-efficient path toward practical, large-scale formal verification.

Abstract

Scaling automated formal verification to real-world projects requires resolving cross-module dependencies and global contexts, which are challenges overlooked by existing function-centric methods. We introduce RagVerus, a framework that synergizes retrieval-augmented generation with context-aware prompting to automate proof synthesis for multi-module repositories, achieving a 27% relative improvement on our novel RepoVBench benchmark -- the first repository-level dataset for Verus with 383 proof completion tasks. RagVerus triples proof pass rates on existing benchmarks under constrained language model budgets, demonstrating a scalable and sample-efficient verification.

RAG-Verus: Repository-Level Program Verification with LLMs using Retrieval Augmented Generation

TL;DR

This work tackles the challenge of repository-level program verification by introducing RagVerus, a retrieval-augmented generation framework that leverages context-aware prompting to guide LLMs in synthesizing verifiable code across interdependent modules. It provides a configurable pipeline for mining repository properties, retrieving task-specific contexts, and generating verification proofs, supported by two retrieval strategies (few-shot examples and dependency-based premises). The authors introduce RepoVBench, the first repository-level Verus benchmark, derived from VeriSMo with 383 proof-completion tasks across 52 modules, and demonstrate that RagVerus achieves substantial gains on both function-level and repository-level benchmarks (e.g., 200+% improvement on VerusBench and 27% relative improvement on RepoVBench). The results indicate that context retrieval and cross-module reasoning significantly enhance proof synthesis, offering a scalable, sample-efficient path toward practical, large-scale formal verification.

Abstract

Scaling automated formal verification to real-world projects requires resolving cross-module dependencies and global contexts, which are challenges overlooked by existing function-centric methods. We introduce RagVerus, a framework that synergizes retrieval-augmented generation with context-aware prompting to automate proof synthesis for multi-module repositories, achieving a 27% relative improvement on our novel RepoVBench benchmark -- the first repository-level dataset for Verus with 383 proof completion tasks. RagVerus triples proof pass rates on existing benchmarks under constrained language model budgets, demonstrating a scalable and sample-efficient verification.

Paper Structure

This paper contains 19 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Illustration of the RagVerus Framework Pipeline
  • Figure 2: Illustration of code informalization utility.
  • Figure 3: Examples of code masking and metadata extraction using Verus-adapted tools. (a) an example of an original function and its masked counterpart, where proof annotations are replaced with MASKED_LINE (highlighted in orange); (b) an example of the extracted metadata