PRO-V-R1: Reasoning Enhanced Programming Agent for RTL Verification
Yujie Zhao, Zhijing Wu, Boqin Yuan, Zhongming Yu, Hejia Zhang, Wentao Ni, Chia-Tung Ho, Haoxing Ren, Jishen Zhao
TL;DR
This paper introduces PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. It combines LLM-based reasoning with programmatic tools, a data-construction pipeline for expert trajectories, and reinforcement learning with verification-focused rewards to optimize the end-to-end verification workflow. Empirical results show substantial gains over the baseline in both functional correctness and robust fault detection, higher functional correctness than large proprietary models with comparable fault-detection robustness, and an 8.8x speedup per task. The work offers a concrete blueprint for building domain-specific, open-source verification agents that reduce cost and data-privacy risk while improving verification quality.
Abstract
Register-Transfer Level (RTL) verification is a primary bottleneck in hardware design, consuming 60-70% of development time. While Large Language Models (LLMs) show promise for RTL automation, their performance and research focus have overwhelmingly centered on RTL generation rather than verification. Current methods for RTL verification rely on large-scale proprietary models (e.g., GPT-4o) to generate Python-based functional references, incurring high cost and raising data-privacy risks. To date, an end-to-end open-source solution for autonomous verification remains absent. We introduce PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. Our contributions are threefold: (1) we design PRO-V sys, a modular agentic system that couples LLM-based reasoning with programmatic tool use for RTL verification; (2) we establish a data construction pipeline that leverages existing RTL datasets to build simulation-validated, expert-level trajectories tailored for supervised fine-tuning (SFT) of RTL verification agents; and (3) we implement an efficient reinforcement learning (RL) algorithm that uses verification-specific rewards derived from program-tool feedback to optimize the end-to-end verification workflow. Our empirical evaluation shows that PRO-V-R1 achieves a 57.7% functional correctness rate and a 34.0% robust fault detection rate, significantly outperforming the 25.7% and 21.8%, respectively, achieved by the base model with the state-of-the-art (SOTA) automatic verification system. PRO-V-R1 also outperforms large-scale proprietary LLMs in functional correctness and shows comparable robustness in fault detection.
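To make the notion of a Python-based functional reference concrete, the following is a minimal sketch, not the PRO-V-R1 implementation. It assumes a hypothetical 4-bit adder as the design under test, stands in plain Python functions (`golden_dut`, `mutant_dut`) for simulated RTL, and reads "functional correctness" as the testbench accepting a correct design and "fault detection" as the testbench rejecting a faulty one. All names and both metric readings are illustrative assumptions; the abstract does not define them.

```python
from typing import Callable, List, Tuple

Stimulus = Tuple[int, int]  # (a, b) operands, each assumed 4 bits wide


def reference_adder(a: int, b: int) -> int:
    """Hypothetical Python functional reference: 4-bit add, 5-bit result."""
    return (a + b) & 0x1F


def golden_dut(a: int, b: int) -> int:
    """Stand-in for a simulated, correct RTL adder (assumption)."""
    return (a & 0xF) + (b & 0xF)


def mutant_dut(a: int, b: int) -> int:
    """Stand-in for a faulty RTL adder that drops the carry-out (assumption)."""
    return ((a & 0xF) + (b & 0xF)) & 0xF


def run_testbench(dut: Callable[[int, int], int],
                  stimuli: List[Stimulus]) -> bool:
    """Return True if the DUT matches the reference on every stimulus."""
    return all(dut(a, b) == reference_adder(a, b) for a, b in stimuli)


# Exhaustive stimuli are feasible for a 4-bit example; real designs need sampling.
stimuli = [(a, b) for a in range(16) for b in range(16)]

passes_golden = run_testbench(golden_dut, stimuli)       # testbench accepts correct design
detects_mutant = not run_testbench(mutant_dut, stimuli)  # testbench rejects faulty design
```

In this sketch a testbench is useful only if both flags hold: a reference that rejects the correct design is functionally wrong, and one that accepts the mutant misses the fault.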
