Table of Contents
Fetching ...

VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification

Patrick Yubeaton, Andre Nakkab, Weihua Xiao, Luca Collini, Ramesh Karri, Chinmay Hegde, Siddharth Garg

TL;DR

VeriThoughts tackles HDL data scarcity by introducing a large-scale, reasoning-enabled Verilog dataset with prompts, reasoning traces, and formal-verification labels. The authors build a four-way pipeline connecting ground-truth $V$, prompt $Q$, reasoning $R$, and generated $V^{*}$, with a self-consistency label $L_c$ obtained via Yosys-based equivalence checks. They demonstrate that reasoning traces and self-consistency can improve Verilog generation, achieving state-of-the-art results on VerilogEval with open-source models fine-tuned on VeriThoughts. This work illustrates the benefits of combining prompting, reasoning, and formal verification to produce verifiably correct HDL code and sets a path for expanding HDL data-generation tools.

Abstract

This paper introduces VeriThoughts, a novel dataset designed for reasoning-based Verilog code generation. We establish a new benchmark framework grounded in formal verification methods to evaluate the quality and correctness of generated hardware descriptions. Additionally, we present a suite of specialized small-scale models optimized specifically for Verilog generation. Our work addresses the growing need for automated hardware design tools that can produce verifiably correct implementations from high-level specifications, potentially accelerating the hardware development process while maintaining rigorous correctness guarantees. Our code and data are available at \href{https://github.com/wilyub/VeriThoughts}{this URL}.

VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification

TL;DR

VeriThoughts tackles HDL data scarcity by introducing a large-scale, reasoning-enabled Verilog dataset with prompts, reasoning traces, and formal-verification labels. The authors build a four-way pipeline connecting ground-truth , prompt , reasoning , and generated , with a self-consistency label obtained via Yosys-based equivalence checks. They demonstrate that reasoning traces and self-consistency can improve Verilog generation, achieving state-of-the-art results on VerilogEval with open-source models fine-tuned on VeriThoughts. This work illustrates the benefits of combining prompting, reasoning, and formal verification to produce verifiably correct HDL code and sets a path for expanding HDL data-generation tools.

Abstract

This paper introduces VeriThoughts, a novel dataset designed for reasoning-based Verilog code generation. We establish a new benchmark framework grounded in formal verification methods to evaluate the quality and correctness of generated hardware descriptions. Additionally, we present a suite of specialized small-scale models optimized specifically for Verilog generation. Our work addresses the growing need for automated hardware design tools that can produce verifiably correct implementations from high-level specifications, potentially accelerating the hardware development process while maintaining rigorous correctness guarantees. Our code and data are available at \href{https://github.com/wilyub/VeriThoughts}{this URL}.

Paper Structure

This paper contains 21 sections, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Generation of the VeriThoughts dataset involves four steps. Starting with a repository of ground-truth Verilog $V$, we ask a frontier model to pose a question $Q$ for which $V$ is a valid response. Then we pose $Q$ to a second frontier reasoning model, obtaining response $R$ and generated Verilog $V^{*}$. Finally, a formal equivalence checker $E$ returns a self-consistency label $L_{C} \leftarrow E(V=V^{*})$. A data point in the VeriThoughts data sample is a tuple $\{V, Q, R, V^{*}, L_{c} \}$. VeriThoughts questions are used for supervised fine-tuning of SoTA LLMs with reasoning and Verilog code as targets.
  • Figure 2: Prompt used to generate a question $Q$ consistent with ground-truth Verilog ($V$).
  • Figure 3: These figures explore various statistics when comparing the self-consistent and inconsistent subset of VeriThoughts. The left figure compares the number of lines of code in ground truth Verilog samples. The middle figure compares the number of characters in a reasoning trace. The right figure looks at the number of sequential modules present in each dataset.
  • Figure 4: Example of prompt error case.
  • Figure 5: Example of code-gen error case.
  • ...and 4 more figures