Large Language Model for Verilog Generation with Code-Structure-Guided Reinforcement Learning

Ning Wang; Bingkun Yao; Jie Zhou; Xi Wang; Zhe Jiang; Nan Guan

TL;DR

Verilog code generation with LLMs is hampered by scarce training data and by Verilog's parallel code structures. The authors introduce VeriSeek, which couples continual pre-training on Verilog and C/C++ code with code-structure-guided reinforcement learning that uses an AST-based similarity reward, supported by VeriCores, a high-quality dataset. On RTLLM2.0 and VerilogEval, VeriSeek achieves state-of-the-art performance among open-source models and surpasses GPT-4 on VerilogEval in functional measures, demonstrating the value of structure-aware RL for hardware description languages. The work points to a practical path toward robust Verilog generation with limited data, using AST-based guidance to capture the parallel design patterns common in HDL code.

Abstract

Recent advancements in large language models (LLMs) have sparked significant interest in the automatic generation of Register Transfer Level (RTL) designs, particularly using Verilog. Current research on this topic primarily focuses on pre-training and instruction tuning, but the effectiveness of these methods is constrained by the limited availability of training data, as public Verilog code is far less abundant than software code. In particular, these methods struggle to effectively capture Verilog parallel code structures, which fundamentally differ from the imperative, sequential control flow typical in most software programming languages. This paper introduces VeriSeek, an LLM enhanced by reinforcement learning using a limited amount of high-quality training data to achieve high Verilog code generation performance. Our reinforcement learning approach employs code structure information as feedback signals to refine the pre-trained model, enabling it to effectively learn important patterns from Verilog code with parallel structures. Experiments show that VeriSeek outperforms state-of-the-art methods across multiple benchmarks.
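To make the idea of structure-based feedback concrete, the following is a minimal sketch of a reward of the kind the Figure 2 caption describes: an AST similarity for parsable code and a negative reward (-10 or -5) otherwise. Everything specific here is an assumption for illustration, not the authors' implementation: the toy `(label, children)` tree format, the multiset-Jaccard definition of `sim_ast`, the `severe` flag, and the choice that the more severe violation maps to -10. A real pipeline would obtain ASTs from a Verilog parser.

```python
from collections import Counter
from typing import Optional, Tuple

# Toy AST node: (label, children). Hypothetical format for illustration only.
Node = Tuple[str, tuple]


def canonical(node: Node) -> str:
    """Serialize a subtree with children sorted, so sibling order is ignored."""
    label, children = node
    return label + "(" + ",".join(sorted(canonical(c) for c in children)) + ")"


def subtree_bag(node: Node) -> Counter:
    """Multiset of canonical forms of every subtree under (and including) node."""
    bag = Counter([canonical(node)])
    for child in node[1]:
        bag += subtree_bag(child)
    return bag


def sim_ast(ref: Node, gen: Node) -> float:
    """Toy similarity in [0, 1]: multiset Jaccard overlap of subtrees."""
    a, b = subtree_bag(ref), subtree_bag(gen)
    inter = sum((a & b).values())
    union = sum((a | b).values())
    return inter / union if union else 1.0


def reward(ref: Node, gen: Optional[Node], severe: bool = False) -> float:
    """Similarity for parsable code; a negative reward when parsing failed.

    gen is None stands for an unparsable generation. Mapping the more severe
    case to -10 and the milder one to -5 is an assumption of this sketch.
    """
    if gen is None:
        return -10.0 if severe else -5.0
    return sim_ast(ref, gen)


# Two modules with the same concurrent statements in a different order.
left = ("module", (("assign_parity", ()), ("always_flag", ()), ("assign_out", ())))
right = ("module", (("always_flag", ()), ("assign_out", ()), ("assign_parity", ())))
partial = ("module", (("assign_parity", ()),))
```

Because children are sorted before comparison, `left` and `right` receive the maximum similarity even though their token sequences differ, which is the property a token-level comparison lacks for concurrent Verilog statements.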

Paper Structure (23 sections, 6 equations, 7 figures, 2 tables, 2 algorithms)

Introduction
Related Work
LLM for Verilog Code Generation
Post-training LLMs for Coding
VeriSeek
Continual Pre-training
Code-Structure-Guided Reinforcement Learning
Code-Structure-Guided Reward
Proximal Policy Optimization
VeriCores Dataset
Experiments and Performance Evaluation
Training Details
Metric and Benchmark
Metric
Benchmark
...and 8 more sections

Figures (7)

Figure 1: Two functionally equivalent Verilog modules with different token sequences. The left implementation follows the sequence ($parity \rightarrow flag \rightarrow data_{out} \rightarrow data_{reg}$), whereas the right one follows ($data_{reg} \rightarrow flag \rightarrow parity \rightarrow data_{out}$). Matching colors in the left and right implementations mark identical code segments.
Figure 2: Overview of VeriSeek's training pipeline and reward mechanism. Starting from a base model $\pi_\phi$, the model is trained on Verilog and C/C++ code to get $\pi_{\psi}$. In the subsequent reinforcement learning stage, the model $\pi_\theta$ learns to generate Verilog code $\mathbf{\hat{y}}$ from natural language specifications $\mathbf{x}$ by optimizing a code-structure-guided reward function $r(\mathbf{y}, \mathbf{\hat{y}})$. This reward function evaluates the similarity between generated and reference code using AST-based similarity $sim_{\mathrm{AST}}$. For unparsable generations, negative rewards (-10 or -5) are assigned based on the severity of syntax violations, encouraging the model to maintain proper Verilog syntax and semantics.
Figure 3: Statistics of the VeriCores dataset, showing specification and code lengths, AST depth, node count, and branching factor metrics.
Figure 4: pass@5 performance comparison between VeriSeek and RTLCoder across different task categories in the RTLLM2.0 benchmark.
Figure 5: Temperature analysis of VeriSeek$_{PTwC+RL}$.
...and 2 more figures
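
The Figure 4 caption reports pass@5 without defining it here. As a hedged illustration, the sketch below implements the widely used unbiased pass@k estimator, $1 - \binom{n-c}{k} / \binom{n}{k}$, where $n$ samples are generated per task and $c$ of them pass the functional tests. Whether the paper computes pass@5 with exactly this estimator is an assumption; the function name and arguments are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of the probability that at least one of k samples passes.

    n: samples generated for the task, c: samples that passed, k: budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples exist, so any k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples, 1 passing, budget of 5.
example = pass_at_k(n=10, c=1, k=5)
```

A benchmark score is then the mean of this per-task value over all tasks.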
