AIvril: AI-Driven RTL Generation With Verification In-The-Loop

Mubashir ul Islam, Humza Sami, Pierre-Emmanuel Gaillardon, Valerio Tenace

TL;DR

AIvril tackles the reliability gap in LLM-driven RTL design by embedding automatic syntax correction and functional verification into a multi-agent GenEDA framework. It couples AutoReview and AutoDV to iteratively refine RTL code and testbenches, achieving up to a $2\times$ improvement in code quality over previous work and an average verification success rate of $88.46\%$ on VerilogEval-Human across multiple LLMs, without fine-tuning. The framework is tool- and model-agnostic, so it can be integrated with different EDA tools and LLMs while providing transparent verification feedback. Together, these advances offer a practical, dependable path toward automated RTL generation and robust hardware design workflows.

Abstract

Large Language Models (LLMs) are computational models capable of performing complex natural language processing tasks. Leveraging these capabilities, LLMs hold the potential to transform the entire hardware design stack, with predictions suggesting that front-end and back-end tasks could be fully automated in the near future. Currently, LLMs show great promise in streamlining Register Transfer Level (RTL) generation, enhancing efficiency, and accelerating innovation. However, their probabilistic nature makes them prone to inaccuracies, a significant drawback in RTL design, where reliability and precision are essential. To address these challenges, this paper introduces AIvril, a framework designed to enhance the accuracy and reliability of RTL-aware LLMs. AIvril employs a multi-agent, LLM-agnostic system for automatic syntax correction and functional verification, significantly reducing (and in many cases completely eliminating) instances of erroneous code generation. Experiments on the VerilogEval-Human dataset show that the framework improves code quality by nearly 2x compared with previous work, while achieving an 88.46% success rate in meeting verification objectives. This represents a critical step toward automating and optimizing hardware design workflows, offering a more dependable methodology for AI-driven RTL design.
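
The verification-in-the-loop idea described in the abstract can be illustrated with a minimal sketch. This is not the AIvril implementation: the function `refine`, the `Result` type, the callables `generate`, `compile_check`, and `simulate_check`, and the `max_iters` budget are all hypothetical names introduced here. The sketch assumes only what the abstract states, namely that generated RTL is first checked for syntax and then for functional correctness, with tool feedback fed back to the LLM on failure.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not AIvril's API):
#   Generate: (spec, feedback) -> candidate RTL source, e.g. an LLM call
#   Check:    (rtl) -> (passed, tool log), e.g. a compiler or simulator wrapper
Generate = Callable[[str, Optional[str]], str]
Check = Callable[[str], Tuple[bool, str]]


@dataclass
class Result:
    rtl: str
    syntax_ok: bool
    functional_ok: bool
    iterations: int


def refine(spec: str, generate: Generate, compile_check: Check,
           simulate_check: Check, max_iters: int = 3) -> Result:
    """Regenerate RTL until it compiles and passes simulation, or the budget runs out."""
    feedback: Optional[str] = None
    rtl = ""
    syntax_ok = functional_ok = False
    for i in range(1, max_iters + 1):
        rtl = generate(spec, feedback)

        # Stage 1: syntax check. On failure, return the tool log to the generator.
        syntax_ok, log = compile_check(rtl)
        if not syntax_ok:
            functional_ok = False
            feedback = "syntax: " + log
            continue

        # Stage 2: functional check, only reached once the code compiles.
        functional_ok, log = simulate_check(rtl)
        if functional_ok:
            return Result(rtl, True, True, i)
        feedback = "functional: " + log

    # Budget exhausted: report the last candidate and its status.
    return Result(rtl, syntax_ok, functional_ok, max_iters)
```

Because the generator and both checkers are passed in as plain callables, the loop does not depend on a particular LLM or EDA tool, which mirrors the tool- and model-agnostic claim in spirit. In practice `compile_check` and `simulate_check` would wrap whichever compiler and simulator are available.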

Paper Structure

This paper contains 13 sections and 3 figures.

Figures (3)

  • Figure 1: Overall architecture of the proposed AIvril framework.
  • Figure 2: Syntax and functional pass rates for AutoReview. Total number of syntax errors across the benchmark suite (a), and obtained pass@1 scores (b).
  • Figure 3: Functional pass rates and verification success rate for AutoDV.
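
Figure 2 reports pass@1 scores, but this summary does not define the metric. As a rough guide, and assuming the paper follows the unbiased pass@k estimator commonly used in code-generation benchmarks (an assumption, not confirmed by the text above), the score for a single problem with `n` generated samples, of which `c` pass, can be sketched as follows. The function name `pass_at_k` is illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n passes, given c passing.

    Assumed estimator: 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every draw of k contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For `k = 1` the expression reduces to `c / n`, the fraction of samples that pass. A benchmark-level score is then the mean of the per-problem values.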