Exploring Code Language Models for Automated HLS-based Hardware Generation: Benchmark, Infrastructure and Analysis

Jiahao Gai, Hao Mark Chen, Zhican Wang, Hongyu Zhou, Wanru Zhao, Nicholas Lane, Hongxiang Fan

TL;DR

This work investigates using code language models to automate hardware generation via High-Level Synthesis (HLS), addressing HDL data scarcity and high token costs by focusing on HLS-based design. It presents a large, open dataset of over 40,000 HLS design entries and a two-stage framework that finetunes pre-trained LLMs on HLS data and then generates designs through iterative, feedback-driven processes that incorporate chain-of-thought prompting. Evaluation shows substantial gains from supervised finetuning and CoT prompting on syntax and functionality, with further improvements from two-step feedback loops, albeit with trade-offs in runtime and design complexity. The results suggest HLS-based generation is cost-effective and compatible with existing FPGA tooling, offering a practical pathway for scalable hardware automation while highlighting data diversity and prompt quality as critical future directions.

Abstract

Recent advances in code generation have illuminated the potential of employing large language models (LLMs) for general-purpose programming languages such as Python and C++, opening new opportunities for automating software development and enhancing programmer productivity. This potential has sparked significant interest in automated hardware generation. Although preliminary efforts have been made to adopt LLMs for generating hardware description languages (HDLs), several challenges persist. First, the volume of available HDL training data is substantially smaller than that for software programming languages. Second, pre-trained LLMs, mainly tailored for software code, tend to produce HDL designs that are more error-prone. Third, generating HDL requires significantly more tokens than software programming, leading to inefficiencies in cost and energy consumption. To tackle these challenges, this paper explores leveraging LLMs to generate High-Level Synthesis (HLS)-based hardware designs. Although code generation for domain-specific programming languages is not new in the literature, we aim to provide experimental results, insights, benchmarks, and evaluation infrastructure to investigate the suitability of HLS over low-level HDLs for LLM-assisted hardware design generation. To achieve this, we first finetune pre-trained models for HLS-based hardware generation, using a collected dataset of text prompts and corresponding reference HLS designs. An LLM-assisted framework is then proposed to automate end-to-end hardware code generation, which also investigates the impact of chain-of-thought and feedback-loop prompting techniques on HLS design generation. Limited by the timeframe of this research, we plan to evaluate more advanced reasoning models in the future.
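
As a rough illustration of the feedback-driven generation summarized above, the following Python sketch shows one way a two-step loop (syntax check first, functionality check second) could be organized. It is a minimal sketch under stated assumptions, not the authors' implementation: the function `generate_with_feedback`, its parameters, the `Checker` type, and the wording of the feedback prompt are all hypothetical, and chain-of-thought prompting and the actual HLS tool invocation are not modeled. The model and both checkers are passed in as plain callables so the loop can be exercised without any real toolchain.

```python
from typing import Callable, Optional, Tuple

# Hypothetical type: a checker returns an error message, or None on success.
Checker = Callable[[str], Optional[str]]


def generate_with_feedback(
    prompt: str,
    generate: Callable[[str], str],   # hypothetical wrapper around a code LLM
    check_syntax: Checker,            # hypothetical: e.g. compile the HLS source
    check_function: Checker,          # hypothetical: e.g. run a testbench
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Return (last design, whether it passed both checks, rounds used)."""
    current_prompt = prompt
    design = ""
    for round_idx in range(1, max_rounds + 1):
        design = generate(current_prompt)

        # Step 1: syntax feedback.
        stage = "syntax"
        error = check_syntax(design)

        # Step 2: functionality feedback, attempted only if syntax passed.
        if error is None:
            stage = "functionality"
            error = check_function(design)

        if error is None:
            return design, True, round_idx

        # Fold the failure report back into the next prompt.
        current_prompt = (
            f"{prompt}\n\nPrevious attempt:\n{design}\n\n"
            f"{stage} check failed: {error}\nPlease fix the design."
        )
    return design, False, max_rounds
```

Each extra round costs another model call plus another run of the checkers, which is consistent with the runtime trade-off noted in the TL;DR; `max_rounds` bounds that cost.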

Paper Structure

This paper contains 23 sections, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Comparison of data availability between HDLs and other software programming languages.
  • Figure 2: HLS-based and Verilog-based programs.
  • Figure 3: Template of design points.
  • Figure 4: An overview of our proposed framework.
  • Figure 5: Chain-of-thought prompts for HLS generation.
  • ...and 4 more figures