Table of Contents
Fetching ...

PRO-V-R1: Reasoning Enhanced Programming Agent for RTL Verification

Yujie Zhao, Zhijing Wu, Boqin Yuan, Zhongming Yu, Hejia Zhang, Wentao Ni, Chia-Tung Ho, Haoxing Ren, Jishen Zhao

TL;DR

This paper introduces PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. It combines LLM-based reasoning with programmatic tools, a data-construction pipeline for expert trajectories, and reinforcement learning with verification-focused rewards to optimize end-to-end verification workflows. Empirical results show substantial gains in functional correctness and robust fault detection compared to both baselines and proprietary models, along with an 8.8x speedup per task, demonstrating practical viability and reproducibility for open RTL verification agents. The work provides a concrete blueprint for building domain-specific, open-source verification agents that reduce cost and privacy risks while improving verification quality.

Abstract

Register-Transfer Level (RTL) verification is a primary bottleneck, consuming 60-70% of development time. While Large Language Models (LLMs) show promise for RTL automation, their performance and research focus have overwhelmingly centered on RTL generation rather than verification. Current methods for RTL verification rely on large scale proprietary models (e.g., GPT-4o) to generate Python-based functional references, incurring a high cost and raising data-privacy risks. To date, an end-to-end open-source solution for autonomous verification remains absent. We introduce PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. Our contributions are threefold: (1) we design PRO-V sys, a modular agentic system that couples LLM-based reasoning with programmatic tool use for RTL verification; (2) we establish a data construction pipeline that leverages existing RTL datasets to build simulation-validated, expert-level trajectories tailored for supervised fine-tuning (SFT) RTL verification agents; and (3) we implement an efficient reinforcement learning (RL) algorithm that uses verification-specific rewards derived from program-tool feedback to optimize the end-to-end verification workflow. Our empirical evaluation demonstrates PRO-V-R1 achieves a 57.7% functional correctness rate and 34.0% in robust fault detection, significantly outperforming the base model's 25.7% and 21.8% (respectively) from the state-of-the-art (SOTA) automatic verification system. This configuration also outperforms large-scale proprietary LLMs in functional correctness and shows comparable robustness for fault detection.

PRO-V-R1: Reasoning Enhanced Programming Agent for RTL Verification

TL;DR

This paper introduces PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. It combines LLM-based reasoning with programmatic tools, a data-construction pipeline for expert trajectories, and reinforcement learning with verification-focused rewards to optimize end-to-end verification workflows. Empirical results show substantial gains in functional correctness and robust fault detection compared to both baselines and proprietary models, along with an 8.8x speedup per task, demonstrating practical viability and reproducibility for open RTL verification agents. The work provides a concrete blueprint for building domain-specific, open-source verification agents that reduce cost and privacy risks while improving verification quality.

Abstract

Register-Transfer Level (RTL) verification is a primary bottleneck, consuming 60-70% of development time. While Large Language Models (LLMs) show promise for RTL automation, their performance and research focus have overwhelmingly centered on RTL generation rather than verification. Current methods for RTL verification rely on large scale proprietary models (e.g., GPT-4o) to generate Python-based functional references, incurring a high cost and raising data-privacy risks. To date, an end-to-end open-source solution for autonomous verification remains absent. We introduce PRO-V-R1, the first trainable open-source agentic framework for autonomous RTL verification. Our contributions are threefold: (1) we design PRO-V sys, a modular agentic system that couples LLM-based reasoning with programmatic tool use for RTL verification; (2) we establish a data construction pipeline that leverages existing RTL datasets to build simulation-validated, expert-level trajectories tailored for supervised fine-tuning (SFT) RTL verification agents; and (3) we implement an efficient reinforcement learning (RL) algorithm that uses verification-specific rewards derived from program-tool feedback to optimize the end-to-end verification workflow. Our empirical evaluation demonstrates PRO-V-R1 achieves a 57.7% functional correctness rate and 34.0% in robust fault detection, significantly outperforming the base model's 25.7% and 21.8% (respectively) from the state-of-the-art (SOTA) automatic verification system. This configuration also outperforms large-scale proprietary LLMs in functional correctness and shows comparable robustness for fault detection.

Paper Structure

This paper contains 12 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Performance comparison of different LLMs on RTL (green line) and testbench (blue line) generation. Models marked with * are RTL-specific. TB pass@1 denotes that the golden reference RTL code passes the LLM-generated testbench.
  • Figure 2: Semantic gap in indexing and slicing: Verilog's significance-based q[3] (MSB) contrasts with Python's offset-based q[0]. Also, Verilog's inclusive-range slice q[3:2] ('10') maps to Python's half-open range q[0:2].
  • Figure 3: Overview of the autonomous agent-based framework for RTL design verification. (a) The Agentic RTL Verification Pipeline, detailing the end-to-end process from high-level prompt and RTL specification to a final simulation report. (b) Agent and Tool Composition, defining the set of agentic roles and their corresponding callable tools. (c) Iterative Refinement with Majority Vote and Self-Verification, illustrating the sampling, selection, and self-correction loop used to enhance the quality of generated testbenches.
  • Figure 4: Overview of Agentic training framework.
  • Figure 5: Post-training case study (task 2013_q2afsm). (a) Input prompt. (b) Base LLM fails by treating input as a Python string. (c) Our model correctly generates Verilog bitwise operations.
  • ...and 1 more figures