PEFA-AI: Advancing Open-source LLMs for RTL generation using Progressive Error Feedback Agentic-AI

Athma Narayanan, Mahesh Subedar, Omesh Tickoo

TL;DR

The paper tackles autonomous RTL code generation by introducing PEFA-AI, a Progressive Error Feedback Agentic-AI framework that orchestrates multiple specialized LLMs and hardware simulators to iteratively self-correct RTL designs. It leverages a four-loop progressive feedback mechanism to validate compilation and functional correctness, while enabling synthesizable results and IP protection through black-box test benches. Benchmarking on VerilogEval and RTLLM1.1 across open- and closed-source models demonstrates state-of-the-art pass rates and improved token efficiency, significantly reducing total LLM calls compared to non-agentic baselines. The approach offers a scalable, modular path toward privacy-preserving, autonomous hardware design augmentation and highlights future work in broader RTL tasks and PPA-aware optimization integration.

Abstract

We present an agentic flow consisting of multiple agents that combine specialized LLMs and hardware simulation tools to collaboratively complete the complex task of Register Transfer Level (RTL) generation without human intervention. A key feature of the proposed flow is the progressive error feedback system of agents (PEFA), a self-correcting mechanism that leverages iterative error feedback to progressively increase the complexity of the approach. The generated RTL is checked for compilation, functional correctness, and synthesizable constructs. To validate this adaptive approach to code generation, benchmarking is performed using two open-source natural language-to-RTL datasets. We demonstrate the benefits of the proposed approach implemented on an open-source agentic framework, using both open- and closed-source LLMs, effectively bridging the performance gap between them. Compared to previously published methods, our approach sets a new benchmark, providing state-of-the-art pass rates while being efficient in token counts.
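
As a rough illustration of the self-correcting flow described in the abstract, the sketch below runs a candidate design through staged checks (compilation, then functional correctness) and feeds the first failure back to the generator. Every name here (`Check`, `first_failure`, `refine`, the toy generator and toy checks) is hypothetical and stands in for LLM agents and simulator calls. The default of four feedback loops follows the TL;DR, but counting them as retries after an initial attempt is an assumption. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A check takes RTL source and returns None on success or an error message.
CheckFn = Callable[[str], Optional[str]]


@dataclass
class Check:
    name: str    # stage label, e.g. "compile" or "functional"
    run: CheckFn  # hypothetical wrapper around a simulator or lint tool


def first_failure(rtl: str, checks: List[Check]) -> Optional[Tuple[str, str]]:
    """Run the checks in order; return (stage, error) for the first failure."""
    for check in checks:
        error = check.run(rtl)
        if error is not None:
            return check.name, error
    return None


def refine(generate: Callable[[str, Optional[str]], str],
           prompt: str, checks: List[Check], max_loops: int = 4):
    """Generate RTL, then regenerate with error feedback until all checks pass.

    Returns (rtl, passed, llm_calls). `generate` stands in for an LLM agent.
    Assumption: one initial attempt plus up to `max_loops` feedback retries.
    """
    feedback = None
    rtl = ""
    for calls in range(1, max_loops + 2):
        rtl = generate(prompt, feedback)
        failure = first_failure(rtl, checks)
        if failure is None:
            return rtl, True, calls
        stage, error = failure
        feedback = f"[{stage}] {error}"
    return rtl, False, max_loops + 1


# Toy stand-ins so the sketch runs without an LLM or a simulator.
def stub_generate(prompt: str, feedback: Optional[str]) -> str:
    if feedback is None:
        # First attempt: missing ';' and wrong logic.
        return "module inv(input a, output y); assign y = a endmodule"
    return "module inv(input a, output y); assign y = ~a; endmodule"


def toy_compile(rtl: str) -> Optional[str]:
    return None if rtl.count(";") >= 2 else "syntax error: missing ';'"


def toy_functional(rtl: str) -> Optional[str]:
    return None if "~a" in rtl else "mismatch: y should be the inverse of a"


CHECKS = [Check("compile", toy_compile), Check("functional", toy_functional)]
```

Calling `refine(stub_generate, "invert a", CHECKS)` stops as soon as both toy checks pass. In a real flow, the check functions would wrap a Verilog compiler and a test bench run, and a synthesizability check could be appended as a third stage.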

Paper Structure

This paper contains 20 sections, 1 equation, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Overall Agentic workflow that takes an input prompt and test bench to generate the correct RTL code over $N$ agentic feedback loops.
  • Figure 2: Illustrated here are the prompt and template setups. A context or system message is provided to the LLM, directing it to produce Verilog code, which is subsequently executed by the code executor agent.
  • Figure 3: State flow diagram illustrating the layout of the log summarizer. The system allows a maximum of four feedback loops. Once the compilation succeeds, the counter $N$ starts. The feedback process begins by sending a basic error message to the LLM, specifically addressing mismatches in the VCD dump. If $N > 1$, an additional summarized feedback message is generated by the smaller LLM.
  • Figure 4: The incorrect code generated by the code_creator is executed by the code_executor agent. The test bench is modified using a template to generate all input and output values. As illustrated: (1) The output trace consists of hundreds of lines, requiring summarization. (2) The log_summarizer agent condenses the log into a few lines to prevent token explosion and reduce hallucination, aiding the LLM. (3) This summary is fed back to the LLM in an iterative loop until the code compiles and the test passes. (4) When the test passes with no mismatches, the run is complete.
  • Figure 5: Bar plot showcasing the effect of progressive feedback on pass rate (%). In the simple feedback setting, the agent is exposed only to the surrounding mismatches from the log_summarizer, four times.
  • ...and 5 more figures
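
The feedback policy described in the captions of Figures 3 and 4 can be sketched roughly as follows: a basic mismatch message is always sent, and from the second loop onward a condensed summary is added. This is a minimal sketch under stated assumptions. The plain-text log format, the `mismatch` keyword, the 1-based loop counter, and the names `mismatch_lines`, `build_feedback`, and `stub_summarize` are all hypothetical. In the paper the summary comes from a smaller LLM, which a simple counting function replaces here.

```python
from typing import Callable, List

MAX_LOOPS = 4  # Figure 3: the system allows at most four feedback loops


def mismatch_lines(log: str, keyword: str = "mismatch") -> List[str]:
    """Keep only the log lines that mention a mismatch (keyword is assumed)."""
    return [ln.strip() for ln in log.splitlines() if keyword in ln.lower()]


def build_feedback(n: int, log: str,
                   summarize: Callable[[str], str],
                   max_lines: int = 3) -> List[str]:
    """Return the feedback messages for loop n (assumed to count from 1).

    An empty list means no mismatches were found, so the run is complete.
    """
    if n < 1 or n > MAX_LOOPS:
        raise ValueError(f"loop index {n} is outside 1..{MAX_LOOPS}")
    hits = mismatch_lines(log)
    if not hits:
        return []
    # Basic message: a few raw mismatch lines, so the prompt stays short.
    messages = ["Simulation mismatches: " + "; ".join(hits[:max_lines])]
    if n > 1:
        # From the second loop on, add a condensed summary of the whole log.
        messages.append(summarize(log))
    return messages


def stub_summarize(log: str) -> str:
    """Stand-in for the smaller summarizer LLM: just counts mismatches."""
    total = len(log.splitlines())
    return f"{len(mismatch_lines(log))} of {total} samples mismatch"


# Hypothetical simulator output; a real trace would run to hundreds of lines.
LOG = """\
t=0 a=0 y=1 expected=1 ok
t=10 a=1 y=1 expected=0 MISMATCH
t=20 a=0 y=0 expected=1 MISMATCH
t=30 a=1 y=0 expected=0 ok
"""
```

With this log, `build_feedback(1, LOG, stub_summarize)` yields only the basic message, while `build_feedback(2, LOG, stub_summarize)` also appends the summary. Capping the raw lines at `max_lines` reflects the stated goal of avoiding token explosion. The cap value itself is an arbitrary choice.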