Table of Contents
Fetching ...

AutoChip: Automating HDL Generation Using LLM Feedback

Shailja Thakur, Jason Blocklove, Hammond Pearce, Benjamin Tan, Siddharth Garg, Ramesh Karri

TL;DR

This work tackles the difficulty of HDL code generation by introducing AutoChip, a fully automated, feedback-driven flow that uses compiler and testbench outputs to iteratively refine Verilog designs generated by LLMs. The approach combines small and large LLM ensembles and supports two prompting modes to balance accuracy and cost, evaluated on 120 HDLBits problems. Key findings show that tool-informed feedback yields substantial improvements (up to 24.2% absolute gains, and up to 89.19% Pass@10 with an LLM ensemble), while succinct feedback generally outperforms full-context feedback in both effectiveness and efficiency. The authors also provide open-source code and a 120-prompt benchmark dataset, demonstrating a practical path toward automated hardware design assisted by AI, albeit with remaining challenges for certain problem classes and verification tasks.

Abstract

Traditionally, designs are written in Verilog hardware description language (HDL) and debugged by hardware engineers. While this approach is effective, it is time-consuming and error-prone for complex designs. Large language models (LLMs) are promising in automating HDL code generation. LLMs are trained on massive datasets of text and code, and they can learn to generate code that compiles and is functionally accurate. We aim to evaluate the ability of LLMs to generate functionally correct HDL models. We build AutoChip by combining the interactive capabilities of LLMs and the output from Verilog simulations to generate Verilog modules. We start with a design prompt for a module and the context from compilation errors and debugging messages, which highlight differences between the expected and actual outputs. This ensures that accurate Verilog code can be generated without human intervention. We evaluate AutoChip using problem sets from HDLBits. We conduct a comprehensive analysis of the AutoChip using several LLMs and problem categories. The results show that incorporating context from compiler tools, such as Icarus Verilog, improves the effectiveness, yielding 24.20% more accurate Verilog. We release our evaluation scripts and datasets as open-source contributions at the following link https://github.com/shailja-thakur/AutoChip.

AutoChip: Automating HDL Generation Using LLM Feedback

TL;DR

This work tackles the difficulty of HDL code generation by introducing AutoChip, a fully automated, feedback-driven flow that uses compiler and testbench outputs to iteratively refine Verilog designs generated by LLMs. The approach combines small and large LLM ensembles and supports two prompting modes to balance accuracy and cost, evaluated on 120 HDLBits problems. Key findings show that tool-informed feedback yields substantial improvements (up to 24.2% absolute gains, and up to 89.19% Pass@10 with an LLM ensemble), while succinct feedback generally outperforms full-context feedback in both effectiveness and efficiency. The authors also provide open-source code and a 120-prompt benchmark dataset, demonstrating a practical path toward automated hardware design assisted by AI, albeit with remaining challenges for certain problem classes and verification tasks.

Abstract

Traditionally, designs are written in Verilog hardware description language (HDL) and debugged by hardware engineers. While this approach is effective, it is time-consuming and error-prone for complex designs. Large language models (LLMs) are promising in automating HDL code generation. LLMs are trained on massive datasets of text and code, and they can learn to generate code that compiles and is functionally accurate. We aim to evaluate the ability of LLMs to generate functionally correct HDL models. We build AutoChip by combining the interactive capabilities of LLMs and the output from Verilog simulations to generate Verilog modules. We start with a design prompt for a module and the context from compilation errors and debugging messages, which highlight differences between the expected and actual outputs. This ensures that accurate Verilog code can be generated without human intervention. We evaluate AutoChip using problem sets from HDLBits. We conduct a comprehensive analysis of the AutoChip using several LLMs and problem categories. The results show that incorporating context from compiler tools, such as Icarus Verilog, improves the effectiveness, yielding 24.20% more accurate Verilog. We release our evaluation scripts and datasets as open-source contributions at the following link https://github.com/shailja-thakur/AutoChip.
Paper Structure (9 sections, 6 figures, 4 tables)

This paper contains 9 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: AutoChip HDL generator framework. Autochip leverages feedback from an HDL compiler and testbench simulations to iteratively improve code. An ensemble of a small (e.g. GPT-3.5) and big LLM (e.g. GPT-4) can be used to improve accuracy at low cost.
  • Figure 2: System prompt/context for LLM interactions
  • Figure 3: Testbench feedback in iteration 3 for vector concatenation problem, refer \ref{['fig:vector-iterations']}.
  • Figure 4: LLM: GPT-3.5-turbo, vector concat with feedback.
  • Figure 5: LLM: GPT-3.5-turbo, FSM serial rx (w) feedback.
  • ...and 1 more figures