
AutoVCoder: A Systematic Framework for Automated Verilog Code Generation using LLMs

Mingzhe Gao, Jieru Zhao, Zhe Lin, Wenchao Ding, Xiaofeng Hou, Yu Feng, Chao Li, Minyi Guo

TL;DR

AutoVCoder tackles the challenge of generating correct Verilog RTL code with LLMs by combining a high-quality hardware dataset, a two-round fine-tuning scheme, and domain-specific retrieval-augmented generation (RAG). On RTL benchmarks, the framework outperforms both industrial and academic baselines, including models with fewer than 16B parameters, and even surpasses ChatGPT-4 on certain tasks. Its key contributions are automated RTL data curation with a lightweight code scorer, a two-round fine-tuning pipeline using LoRA and instruction tuning, and a dual-retriever RAG system, trained via contrastive learning, that supplies example demonstrations and domain knowledge. The results show improvements in both syntactic and functional correctness, highlighting AutoVCoder's potential for generating accurate hardware designs from natural-language prompts.
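The TL;DR states that the dual retrievers are trained via contrastive learning but does not specify the objective. As a minimal sketch, assuming an InfoNCE-style loss with in-batch negatives (a common choice for dense retrievers, not confirmed as the paper's exact formulation), each query embedding is scored against every document embedding in the batch, and the loss is the negative log-probability of its paired document. The function names and the temperature of 0.05 are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce_loss(queries: List[Vector], docs: List[Vector],
                  temperature: float = 0.05) -> float:
    """Average in-batch InfoNCE loss (assumed objective, not the paper's).

    queries[i] and docs[i] form a positive pair; every docs[j] with j != i
    acts as a negative for queries[i].
    """
    if not queries or len(queries) != len(docs):
        raise ValueError("need non-empty, equally sized batches")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, d) / temperature for d in docs]
        # Numerically stable log-sum-exp over all documents in the batch.
        peak = max(logits)
        log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Negative log softmax probability of the paired (positive) document.
        total += log_norm - logits[i]
    return total / len(queries)


# Usage: embeddings whose pairs line up should yield a lower loss.
q = [[1.0, 0.0], [0.0, 1.0]]
d_aligned = [[0.9, 0.1], [0.1, 0.9]]
d_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(info_nce_loss(q, d_aligned) < info_nce_loss(q, d_swapped))  # True
```

Minimizing this loss pulls each query toward its paired document and away from the other documents in the batch, which is what makes the learned similarity useful for retrieval.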

Abstract

Recently, large language models (LLMs) have proven highly successful at generating software code, e.g., C/C++ and Python. However, LLMs still achieve low syntactic and functional correctness when generating register-transfer level (RTL) code such as Verilog. To address this issue, we develop AutoVCoder, a systematic open-source framework that significantly improves the correctness of LLM-generated Verilog code while also enhancing the quality of its output. Our framework integrates three novel techniques: a high-quality hardware dataset generation approach, a two-round LLM fine-tuning method, and a domain-specific retrieval-augmented generation (RAG) mechanism. Experimental results demonstrate that AutoVCoder outperforms both industrial and academic LLMs in Verilog code generation. Specifically, compared with BetterV, AutoVCoder improves functional correctness by 0.5% on the EvalMachine benchmark and by 2.2% on the EvalHuman benchmark; compared with RTLCoder, it achieves a 3.4% increase in both syntax correctness and functional correctness on the RTLLM benchmark.
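The abstract lists a high-quality hardware dataset generation approach, and the TL;DR attributes the data curation to a lightweight code scorer. The scorer's internals are not described here, so the sketch below only illustrates the general score-then-filter pattern: each Verilog sample receives a numeric score, and only samples at or above a threshold are kept. The heuristic `toy_score`, its 0-10 scale, and the threshold of 8.0 are hypothetical placeholders, not the paper's scorer.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    name: str
    code: str


def toy_score(code: str) -> float:
    """Hypothetical 0-10 heuristic standing in for a learned code scorer."""
    n_module = len(re.findall(r"\bmodule\b", code))  # does not match "endmodule"
    n_end = len(re.findall(r"\bendmodule\b", code))
    if n_module == 0 or n_module != n_end:
        return 0.0  # missing or unbalanced module/endmodule pairs
    score = 6.0
    if "//" in code:
        score += 2.0  # reward inline documentation
    if re.search(r"\b(assign|always)\b", code):
        score += 2.0  # reward modules that contain actual logic
    return score


def filter_by_score(samples: List[Sample],
                    score_fn: Callable[[str], float],
                    threshold: float) -> List[Sample]:
    """Keep only samples whose score meets the threshold."""
    return [s for s in samples if score_fn(s.code) >= threshold]


samples = [
    Sample("and_gate", "// 2-input AND\nmodule and2(input a, input b, output y);\n"
                       "  assign y = a & b;\nendmodule"),
    Sample("stub", "module stub(input a);\nendmodule"),
    Sample("broken", "module broken(input a);\n  assign x = a;"),
]
kept = filter_by_score(samples, toy_score, threshold=8.0)
print([s.name for s in kept])  # ['and_gate']
```

Because the scorer is passed in as a callable, swapping `toy_score` for a trained model leaves `filter_by_score` unchanged.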

Paper Structure

This paper contains 14 sections, 3 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: An example of the RAG process.
  • Figure 2: Framework overview of AutoVCoder.
  • Figure 3: Prompt for marking input code with a score.
  • Figure 4: Code scoring mechanism with ChatGPT-3.5.
  • Figure 5: Prompt for generating problem-code pairs.
  • ...and 2 more figures
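Figure 1 shows an example of the RAG process, and the TL;DR describes two retrievers: one supplies example demonstrations and the other supplies domain knowledge. Below is a minimal sketch of how two such retrievers might feed a single prompt. It assumes a toy bag-of-words embedding in place of the paper's contrastively trained encoders; the helper names, prompt section headers, and `k` value are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words embedding; a placeholder for a trained retriever encoder."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, corpus: List[str], k: int) -> List[str]:
    """Return the k corpus entries most similar to the query."""
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


def build_prompt(task: str, examples: List[str], knowledge: List[str], k: int = 1) -> str:
    """Combine results from two separate retrievers into one prompt."""
    demos = top_k(task, examples, k)    # retriever 1: example demonstrations
    facts = top_k(task, knowledge, k)   # retriever 2: domain knowledge
    sections = ["// Domain knowledge:", *facts, "// Examples:", *demos, "// Task:", task]
    return "\n".join(sections)


examples = [
    "module counter(input clk, input rst, output reg [3:0] q); ... endmodule",
    "module mux2(input a, input b, input sel, output y); ... endmodule",
]
knowledge = [
    "A synchronous reset is sampled only on the active clock edge.",
    "A multiplexer selects one of several inputs based on a select signal.",
]
# Selects the counter example and the synchronous-reset fact.
print(build_prompt("Write a 4-bit counter with synchronous reset on clk",
                   examples, knowledge))
```

Keeping the two corpora behind separate retrievers lets each one be ranked and truncated independently before the results are merged into the prompt.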