Automatically Improving LLM-based Verilog Generation using EDA Tool Feedback

Jason Blocklove; Shailja Thakur; Benjamin Tan; Hammond Pearce; Siddharth Garg; Ramesh Karri

TL;DR

This work investigates using automated EDA-tool feedback to repair LLM-generated Verilog. It introduces AutoChip, an open-source framework that iteratively evaluates candidate designs with HDL compilers and testbenches and feeds the resulting errors back to the LLM over a tree search with parameters $k$ and $d$. Evaluations on the VerilogEval benchmark show that tool feedback markedly improves results for GPT-4o, achieving up to a 5.8% increase in passing designs and significant cost reductions; mixing smaller models with a final GPT-4o pass can reach similar success levels at substantially lower cost. The study finds that feedback effectiveness is model-dependent, that increasing $k$ and $d$ generally helps, and that succinct feedback can rival full-context feedback while reducing token usage. The open-source AutoChip platform enables broader evaluation across more models and benchmarks, paving the way for automated, tool-guided hardware design workflows.
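The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration and not AutoChip's actual code: it assumes $k$ is the number of candidates sampled per iteration and $d$ is the maximum number of iterations, and it assumes the best-ranked candidate of each iteration is the one whose tool output is fed back. The names `Evaluation`, `feedback_tree_search`, `toy_generate`, and `toy_evaluate` are hypothetical; the toy functions stand in for a real LLM call and a real compile-and-simulate step.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Evaluation:
    score: float   # assumed ranking: higher is better, 1.0 means every test passed
    feedback: str  # compiler or simulator messages to send back to the model


# generate(prompt, feedback) returns candidate Verilog source text.
Generator = Callable[[str, Optional[str]], str]
# evaluate(code) compiles and simulates the candidate and ranks it.
Evaluator = Callable[[str], Evaluation]


def feedback_tree_search(
    prompt: str, generate: Generator, evaluate: Evaluator, k: int, d: int
) -> Tuple[Optional[str], Optional[Evaluation]]:
    """Sample k candidates per level for up to d levels, feeding back tool output."""
    best_code, best_eval = None, None
    feedback = None  # the first level is plain zero-shot generation
    for _ in range(d):
        # Sample and evaluate k candidates at this level.
        ranked = []
        for _ in range(k):
            code = generate(prompt, feedback)
            ranked.append((code, evaluate(code)))
        code, result = max(ranked, key=lambda pair: pair[1].score)
        # Remember the best candidate seen across all levels.
        if best_eval is None or result.score > best_eval.score:
            best_code, best_eval = code, result
        if result.score >= 1.0:
            break  # a passing design was found, so stop early
        # Otherwise the level's best candidate and its errors seed the next level.
        feedback = f"Previous attempt:\n{code}\n\nTool output:\n{result.feedback}"
    return best_code, best_eval


def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM: it only gets the operator right after seeing feedback.
    op = "&" if feedback else "|"
    return f"module top(input a, input b, output y); assign y = a {op} b; endmodule"


def toy_evaluate(code: str) -> Evaluation:
    # Stand-in for a compiler plus testbench run.
    if "a & b" in code:
        return Evaluation(1.0, "all tests passed")
    return Evaluation(0.0, "testbench: mismatches on output y")
```

Calling `feedback_tree_search("AND gate", toy_generate, toy_evaluate, k=2, d=3)` with these stubs fails on the first level and passes on the second, once the toy generator receives the tool output. In a real setup the evaluator would invoke an HDL compiler and a testbench, and the score would reflect how far each candidate got.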

Abstract

Traditionally, digital hardware designs are written in the Verilog hardware description language (HDL) and debugged manually by engineers. This can be time-consuming and error-prone for complex designs. Large Language Models (LLMs) are emerging as a potential tool to help generate fully functioning HDL code, but most works have focused on single-shot generation, i.e., run once and evaluate, a process that does not leverage debugging and, as such, does not adequately reflect a realistic development process. In this work, we evaluate the ability of LLMs to leverage feedback from electronic design automation (EDA) tools to fix mistakes in their own generated Verilog. To accomplish this, we present an open-source, highly customizable framework, AutoChip, which combines conversational LLMs with the output from Verilog compilers and simulations to iteratively generate and repair Verilog. To determine the success of these LLMs, we leverage the VerilogEval benchmark set. We evaluate four state-of-the-art conversational LLMs, focusing on readily accessible commercial models. EDA tool feedback proved consistently more effective than zero-shot prompting only for GPT-4o, the most computationally complex model we evaluated. In the best case, we observed a 5.8% increase in the number of successful designs with a 34.2% decrease in cost over the best zero-shot results. Mixing smaller models with this larger model at the end of the feedback iterations matched the success rate of GPT-4o with feedback, but incurred 41.9% lower cost (corresponding to an overall decrease in cost over zero-shot by 89.6%).
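The mixed-model strategy mentioned at the end of the abstract can be pictured as a per-iteration model schedule. The sketch below is an assumption about how such a schedule might be expressed, not the framework's own interface: `model_schedule` and its `large_passes` parameter are hypothetical, and `"small-model"` is a placeholder name.

```python
from typing import List


def model_schedule(
    d: int, small_model: str, large_model: str, large_passes: int = 1
) -> List[str]:
    """Return which model to query at each of d feedback iterations.

    The cheaper model handles the early iterations and the larger model
    is reserved for the final `large_passes` iterations.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    # Clamp so the schedule always has exactly d entries.
    large_passes = max(0, min(large_passes, d))
    return [small_model] * (d - large_passes) + [large_model] * large_passes


# Example: four iterations, with only the last one using the larger model.
schedule = model_schedule(4, "small-model", "gpt-4o")
```

The intuition is that many errors are cheap to fix, so the expensive model is only paid for on designs that the smaller model has not already repaired.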
