EDA-Aware RTL Generation with Large Language Models

Mubashir ul Islam, Humza Sami, Pierre-Emmanuel Gaillardon, Valerio Tenace

TL;DR

This work tackles zero-shot RTL generation with LLMs, where standard prompting often yields syntax and functional errors that inflate the verification workload. It presents AIvril2, a self-verifying, LLM-agnostic, multi-agent framework built around two iterative loops (syntax optimization and functional verification) and a testbench-first strategy, both guided by error logs from EDA tools. The architecture deploys three specialized agents (Code, Review, and Verification) that respectively generate testbenches, fix syntax issues, and ensure functional correctness, in a language-agnostic manner. Empirical results on the VerilogEval-Human suite show up to a 3.4× improvement over prior methods, with best-case pass rates of 77% for Verilog and 66% for VHDL, pointing to a substantial reduction in manual verification effort and more reliable RTL across both languages.
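The two nested loops and the testbench-first ordering described above can be sketched as follows. This is a minimal, hypothetical Python sketch of the control flow only: the `Pipeline` class, its callable fields, and the iteration limits are illustrative assumptions rather than AIvril2's actual interfaces. The LLM agents and EDA tools are replaced by plain functions, and for brevity only the RTL (not the testbench) passes through the syntax loop.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# (passed, log) pair returned by each EDA tool stand-in.
Check = Tuple[bool, str]


@dataclass
class Pipeline:
    """Hypothetical two-loop, testbench-first RTL generation flow."""
    write_testbench: Callable[[str], str]      # Code agent: spec -> testbench
    write_rtl: Callable[[str, str], str]       # Code agent: (spec, testbench) -> RTL
    compile: Callable[[str], Check]            # EDA compiler/linter on the RTL
    fix_syntax: Callable[[str, str], str]      # Review agent: (RTL, compile log) -> RTL
    simulate: Callable[[str, str], Check]      # EDA simulator: (RTL, testbench)
    fix_function: Callable[[str, str], str]    # Verification agent: (RTL, sim log) -> RTL
    max_syntax_iters: int = 3
    max_func_iters: int = 3

    def syntax_loop(self, rtl: str) -> Tuple[str, bool]:
        """Inner loop: recompile until clean or the repair budget runs out."""
        ok, log = self.compile(rtl)
        repairs = 0
        while not ok and repairs < self.max_syntax_iters:
            rtl = self.fix_syntax(rtl, log)    # feed the compiler error log back
            repairs += 1
            ok, log = self.compile(rtl)
        return rtl, ok

    def run(self, spec: str) -> Tuple[str, bool, int]:
        """Outer loop: returns (rtl, passed, functional repairs applied)."""
        tb = self.write_testbench(spec)        # testbench first, then RTL
        rtl = self.write_rtl(spec, tb)
        for repairs in range(self.max_func_iters + 1):
            rtl, compiles = self.syntax_loop(rtl)
            if not compiles:
                return rtl, False, repairs
            passed, log = self.simulate(rtl, tb)
            if passed:
                return rtl, True, repairs
            if repairs < self.max_func_iters:
                rtl = self.fix_function(rtl, log)  # feed the simulation log back
        return rtl, False, self.max_func_iters


# Toy stand-ins (no LLM or EDA tool involved) so the sketch runs as-is.
demo = Pipeline(
    write_testbench=lambda spec: "tb for " + spec,
    write_rtl=lambda spec, tb: "module bug; typo",
    compile=lambda rtl: ("typo" not in rtl, "syntax error near 'typo'"),
    fix_syntax=lambda rtl, log: rtl.replace("typo", "endmodule"),
    simulate=lambda rtl, tb: ("bug" not in rtl, "output mismatch"),
    fix_function=lambda rtl, log: rtl.replace("bug", "good"),
)
print(demo.run("2-input AND gate"))  # → ('module good; endmodule', True, 1)
```

With these toy stand-ins, one syntax repair and one functional repair are needed before the design passes. Injecting the agents and tools as callables is one way to reflect the LLM-agnostic design: a different model or simulator can be swapped in without touching the loop logic.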

Abstract

Large Language Models (LLMs) have become increasingly popular for generating RTL code. However, producing error-free RTL code in a zero-shot setting remains highly challenging even for state-of-the-art LLMs, whose output often requires manual, iterative refinement. This additional debugging can dramatically increase the verification workload, underscoring the need for robust, automated correction mechanisms that ensure code correctness from the start. In this work, we introduce AIvril2, a self-verifying, LLM-agnostic agentic framework that enhances RTL code generation by iteratively correcting both syntax and functional errors. Our approach leverages a collaborative multi-agent system that incorporates feedback from error logs generated by EDA tools to automatically identify and resolve design flaws. Experiments on the VerilogEval-Human benchmark suite show that our framework significantly improves code quality, achieving a nearly 3.4× improvement over prior methods. In the best case, it reaches functional pass rates of 77% for Verilog and 66% for VHDL, substantially improving the reliability of LLM-driven RTL code generation.
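The idea of feeding EDA error logs back to an agent can be illustrated with a small parser that turns compiler diagnostics into a prompt fragment for a repair agent. This is a hypothetical Python sketch: it assumes a `<file>:<line>: <message>` diagnostic format, which many Verilog compilers use but which varies between tools, and the `parse_log` and `build_feedback` helpers are illustrative names, not part of AIvril2.

```python
import re
from typing import List, NamedTuple


class Diagnostic(NamedTuple):
    file: str
    line: int
    message: str


# Assumed diagnostic format: "<file>:<line>: <message>", one per log line.
_DIAG = re.compile(r"^(?P<file>[^:\s]+):(?P<line>\d+):\s*(?P<msg>.+)$")


def parse_log(log: str) -> List[Diagnostic]:
    """Keep only log lines that match the assumed diagnostic format."""
    diags = []
    for raw in log.splitlines():
        m = _DIAG.match(raw.strip())
        if m:
            diags.append(Diagnostic(m["file"], int(m["line"]), m["msg"]))
    return diags


def build_feedback(source: str, diags: List[Diagnostic]) -> str:
    """Pair each diagnostic with the offending source line for an LLM prompt."""
    lines = source.splitlines()
    chunks = []
    for d in diags:
        code = lines[d.line - 1].strip() if 0 < d.line <= len(lines) else "<no such line>"
        chunks.append(f"Line {d.line}: {d.message}\n  > {code}")
    return "\n".join(chunks)


# Toy example: RTL missing a semicolon; the log text below is invented.
src = "module top(input a, output y);\n  assign y = ~a\nendmodule"
log = "top.v:2: syntax error\ncompilation failed"
print(build_feedback(src, parse_log(log)))
# Line 2: syntax error
#   > assign y = ~a
```

Anchoring each message to the source line it refers to gives the repair agent local context instead of a raw log; a real integration would likely need one pattern per supported tool, covering both Verilog and VHDL flows.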

Paper Structure

This paper contains 13 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Architecture overview of the proposed AIvril2 framework.
  • Figure 2: Practical example of the proposed workflow and internal state representation of the agents in AIvril2.
  • Figure 3: Average latency breakdown across optimization loops for the proposed framework. Reported latencies include the execution time of the EDA tools.