VeriGen: A Large Language Model for Verilog Code Generation
Shailja Thakur, Baleegh Ahmad, Hammond Pearce, Benjamin Tan, Brendan Dolan-Gavitt, Ramesh Karri, Siddharth Garg
TL;DR
VeriGen demonstrates that fine-tuning open-source LLMs on a large Verilog corpus enables competitive, hardware-focused code generation. By combining GitHub Verilog code and Verilog textbooks, the authors build a robust training and evaluation pipeline, using hand-crafted and HDLBits-based problem sets with comprehensive test benches. The results show that CodeGen-16B-FT delivers strong performance across problem difficulties, often surpassing pre-trained baselines and approaching or matching larger commercial models in several scenarios, while offering faster inference and open-access checkpoints. The work highlights the practical potential of smaller, in-house LLMs for HDL design automation, while acknowledging remaining challenges in achieving full functional correctness without human refinement and stressing the value of richer training data and prompt engineering.
Abstract
In this study, we explore the capability of Large Language Models (LLMs) to automate hardware design by generating high-quality code in Verilog, a common language for designing and modeling digital systems. We fine-tune pre-existing LLMs on Verilog datasets compiled from GitHub and Verilog textbooks. We evaluate the functional correctness of the generated Verilog code using a specially designed test suite, featuring a custom problem set and test benches. Here, our fine-tuned open-source CodeGen-16B model outperforms the commercial state-of-the-art GPT-3.5-turbo model by 1.1% overall. On a more diverse and complex problem set, the fine-tuned model remains competitive with GPT-3.5-turbo and excels in certain scenarios. Notably, it demonstrates a 41% improvement in generating syntactically correct Verilog code across various problem categories compared to its pre-trained counterpart, highlighting the potential of smaller, in-house LLMs in hardware design automation.
