RAG-Verus: Repository-Level Program Verification with LLMs using Retrieval Augmented Generation
Sicheng Zhong, Jiading Zhu, Yifang Tian, Xujie Si
TL;DR
This work tackles the challenge of repository-level program verification by introducing RagVerus, a retrieval-augmented generation framework that leverages context-aware prompting to guide LLMs in synthesizing verifiable code across interdependent modules. It provides a configurable pipeline for mining repository properties, retrieving task-specific contexts, and generating verification proofs, supported by two retrieval strategies (few-shot examples and dependency-based premises). The authors introduce RepoVBench, the first repository-level Verus benchmark, derived from VeriSMo with 383 proof-completion tasks across 52 modules, and demonstrate that RagVerus achieves substantial gains on both function-level and repository-level benchmarks (e.g., 200+% improvement on VerusBench and 27% relative improvement on RepoVBench). The results indicate that context retrieval and cross-module reasoning significantly enhance proof synthesis, offering a scalable, sample-efficient path toward practical, large-scale formal verification.
Abstract
Scaling automated formal verification to real-world projects requires resolving cross-module dependencies and global contexts, which are challenges overlooked by existing function-centric methods. We introduce RagVerus, a framework that synergizes retrieval-augmented generation with context-aware prompting to automate proof synthesis for multi-module repositories, achieving a 27% relative improvement on our novel RepoVBench benchmark -- the first repository-level dataset for Verus with 383 proof completion tasks. RagVerus triples proof pass rates on existing benchmarks under constrained language model budgets, demonstrating a scalable and sample-efficient verification.
