Evaluating Large Language Models for Automatic Register Transfer Logic Generation via High-Level Synthesis

Sneha Swaroopa, Rijoy Mukherjee, Anushka Debnath, Rajat Subhra Chakraborty

TL;DR

The paper addresses the difficulty LLMs have in generating functionally correct RTL from natural-language prompts. It evaluates a two-stage pipeline in which an LLM first produces annotated C++ suitable for high-level synthesis (HLS) and an HLS tool then translates that code to Verilog RTL; the pipeline is evaluated on the HLSEval benchmark. The approach improves functional correctness, with pass@1 reaching up to 0.86, outperforming direct NL-to-Verilog generation across multiple LLMs and indicating that combining LLMs with HLS tools is practically viable for hardware design. The work also provides the open HLSEval benchmark to support reproducibility and further research.

Abstract

The ever-growing popularity of large language models (LLMs) has resulted in their increasing adoption for hardware design and verification. Prior research has attempted to assess the capability of LLMs to automate digital hardware design by producing high-quality Register Transfer Logic (RTL) descriptions, particularly in Verilog. However, these tests have revealed that Verilog code produced by current state-of-the-art LLMs lacks sufficient functional correctness to be practically viable, compared to automatically generated programs in general-purpose programming languages such as C, C++, and Python. With this as the key insight, in this paper we assess the performance of a two-stage software pipeline for automated Verilog RTL generation: LLM-based automatic generation of annotated C++ code suitable for high-level synthesis (HLS), followed by HLS to generate Verilog RTL. We have benchmarked the performance of our proposed scheme using the open-source VerilogEval dataset, for four different industry-scale LLMs, and the Vitis HLS tool. Our experimental results demonstrate that our two-step technique substantially outperforms previously proposed techniques of direct Verilog RTL generation by LLMs in terms of average functional correctness rates, reaching a score of 0.86 on the pass@1 metric.
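
The abstract reports results as pass@1. As a rough illustration of how such a score is typically computed, the sketch below assumes the unbiased pass@k estimator commonly used in code-generation benchmarks, 1 - C(n-c, k) / C(n, k), where n is the number of samples generated for a problem and c the number that pass its testbench. The paper's own equation is not reproduced in this summary, so the formula choice, the names `pass_at_k`, `mean_pass_at_k`, and `SampleCount`, and any counts passed in are assumptions for illustration only.

```cpp
#include <cassert>
#include <vector>

// Per-problem tallies (hypothetical struct, not from the paper):
// n = samples generated, c = samples that passed the functional check.
struct SampleCount {
    int n;
    int c;
};

// Assumed estimator: pass@k = 1 - C(n - c, k) / C(n, k).
// The ratio of binomials is evaluated as a running product to avoid
// overflowing factorials: prod_{i=0}^{k-1} (n - c - i) / (n - i).
double pass_at_k(int n, int c, int k) {
    assert(n > 0 && c >= 0 && c <= n && k >= 1 && k <= n);
    if (n - c < k) {
        return 1.0;  // every size-k subset contains at least one passing sample
    }
    double all_fail = 1.0;
    for (int i = 0; i < k; ++i) {
        all_fail *= static_cast<double>(n - c - i) / static_cast<double>(n - i);
    }
    return 1.0 - all_fail;
}

// Benchmark-level score: the mean of the per-problem estimates.
double mean_pass_at_k(const std::vector<SampleCount>& problems, int k) {
    if (problems.empty()) {
        return 0.0;
    }
    double total = 0.0;
    for (const SampleCount& p : problems) {
        total += pass_at_k(p.n, p.c, k);
    }
    return total / static_cast<double>(problems.size());
}
```

For k = 1 the estimator reduces to c / n, so under this assumption a pass@1 of 0.86 would mean that, averaged over problems, 86% of individual generations pass their functional check.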

Paper Structure (10 sections, 1 equation, 7 figures, 1 table)

Figures (7)

  • Figure 1: Comparison of existing RTL Generation via LLM with the proposed software pipeline with HLS.
  • Figure 2: Problem description of xnor in HLSEval.
  • Figure 3: gpt-3.5 turbo response for Finite State Machine (FSM) specifications.
  • Figure 4: gpt-3.5 turbo response for Karnaugh map (K-map) specification.
  • Figure 5: Example of prompting for fadd (full adder) in HLSEval. The logic description includes a description of the problem in natural language, a function description, and a sample one-shot input and output definition.
  • ...and 2 more figures
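
To make the first pipeline stage more concrete, here is a minimal sketch of the kind of annotated C++ an LLM might be asked to produce for the fadd (full adder) problem mentioned in Figure 5. It is not taken from the paper or from HLSEval: the function signature, the port names, and the choice of pragma are assumptions. The `#pragma HLS INTERFACE ap_ctrl_none port=return` line follows the Vitis HLS interface pragma for removing block-level handshake signals; an ordinary C++ compiler ignores it (possibly with a warning), so the same source can be checked in software before synthesis.

```cpp
#include <cassert>  // used only by the software self-check below

// Hypothetical HLS top function for a 1-bit full adder.
// Port names are illustrative and may differ from the benchmark's.
void fadd(bool a, bool b, bool carry_in, bool& sum, bool& carry_out) {
// Assumed annotation: request a purely combinational block with no
// start/done handshake. Exact spelling may vary across tool versions.
#pragma HLS INTERFACE ap_ctrl_none port=return
    sum = a ^ b ^ carry_in;
    carry_out = (a & b) | (carry_in & (a ^ b));
}

// Software-only check: compare fadd against integer addition
// for all eight input combinations.
void check_fadd_truth_table() {
    for (int v = 0; v < 8; ++v) {
        const bool a = (v & 1) != 0;
        const bool b = ((v >> 1) & 1) != 0;
        const bool carry_in = ((v >> 2) & 1) != 0;
        bool sum = false;
        bool carry_out = false;
        fadd(a, b, carry_in, sum, carry_out);
        const int expected = static_cast<int>(a) + static_cast<int>(b) +
                             static_cast<int>(carry_in);
        assert(static_cast<int>(sum) == (expected & 1));
        assert(static_cast<int>(carry_out) == ((expected >> 1) & 1));
    }
}
```

In the two-stage flow described above, a function like `fadd` would then be handed to the HLS tool to produce Verilog RTL. The self-check stands in for the functional testbench only at the C++ level and says nothing about the synthesized RTL.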