HLSPilot: LLM-based High-Level Synthesis

Chenwei Xiong, Cheng Liu, Huawei Li, Xiaowei Li

TL;DR

HLSPilot addresses the bottleneck of hardware accelerator development by leveraging LLMs to translate sequential C/C++ into optimized HLS code, enabling automated high-level synthesis on hybrid CPU-FPGA platforms. The framework integrates runtime profiling to locate bottlenecks, program-tree-based task pipelining for fine-grained decomposition, LLM-guided HLS optimization drawn from vendor documentation, and DSE for automatic pragma tuning, forming an end-to-end hardware acceleration workflow. Empirical results show HLSPilot can match or exceed manually optimized designs and delivers substantial speedups, including an 11.93x end-to-end improvement on L-BFGS, demonstrating the practical potential of LLM-assisted hardware design. This approach promises to reduce development effort and accelerate heterogeneous computing by closing the semantic gap between software descriptions and hardware implementations.

Abstract

Large language models (LLMs) have catalyzed an upsurge in automatic code generation, garnering significant attention for register transfer level (RTL) code generation. Despite the potential of RTL code generation from natural language, it remains error-prone and limited to relatively small modules because of the substantial semantic gap between natural language expressions and hardware design intent. In response to these limitations, we propose a methodology that reduces the semantic gap by using C/C++ to generate hardware designs via High-Level Synthesis (HLS) tools. Specifically, we build a set of C-to-HLS optimization strategies catering to various code patterns, such as nested loops and local arrays. Then, we apply these strategies to sequential C/C++ code through in-context learning, which provides the LLMs with exemplary C/C++ to HLS prompts. With this approach, HLS designs can be generated effectively. Since LLMs still struggle to determine optimized pragma parameters precisely, we integrate a design space exploration (DSE) tool for pragma parameter tuning. Furthermore, we employ profiling tools to pinpoint the performance bottlenecks within a program and selectively convert bottleneck components to HLS code for hardware acceleration. By combining LLM-based profiling, C/C++ to HLS translation, and DSE, we have established HLSPilot, the first LLM-enabled high-level synthesis framework, which can fully automate high-level application acceleration on hybrid CPU-FPGA architectures. According to our experiments on real-world application benchmarks, HLSPilot achieves performance that is generally comparable to manually crafted counterparts and can even outperform them, underscoring the substantial promise of LLM-assisted hardware design.
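
To make the kind of C-to-HLS rewrite described in the abstract concrete, here is a minimal sketch. It is not code from the paper. It shows a sequential kernel with a nested loop and a local array, followed by one plausible HLS version annotated with Vitis HLS pragmas. The function names (`row_sums_seq`, `row_sums_hls`, `outputs_match`), the array sizes, and the factor of 4 are illustrative assumptions; the factor is the kind of parameter a DSE pass would tune. Standard C++ compilers ignore the `#pragma HLS` lines (possibly with a warning), so the two versions can be compared on a CPU.

```cpp
#include <cassert>

constexpr int ROWS = 8;   // illustrative sizes, not taken from the paper
constexpr int COLS = 16;

// Sequential reference: weighted row sums using a local copy of the weights.
void row_sums_seq(const int in[ROWS][COLS], const int w[COLS], int out[ROWS]) {
    int local_w[COLS];
    for (int j = 0; j < COLS; ++j) local_w[j] = w[j];
    for (int i = 0; i < ROWS; ++i) {
        int acc = 0;
        for (int j = 0; j < COLS; ++j) acc += in[i][j] * local_w[j];
        out[i] = acc;
    }
}

// HLS-style version: same arithmetic, with pragmas as hints to the HLS tool.
void row_sums_hls(const int in[ROWS][COLS], const int w[COLS], int out[ROWS]) {
    int local_w[COLS];
    // Split the local array into 4 banks so 4 elements can be read per cycle.
#pragma HLS ARRAY_PARTITION variable=local_w cyclic factor=4 dim=1

    for (int j = 0; j < COLS; ++j) {
#pragma HLS PIPELINE II=1
        local_w[j] = w[j];
    }

    for (int i = 0; i < ROWS; ++i) {
        int acc = 0;
        for (int j = 0; j < COLS; ++j) {
            // Assumed factor; it matches the partition factor above.
#pragma HLS UNROLL factor=4
            acc += in[i][j] * local_w[j];
        }
        out[i] = acc;
    }
}

// CPU-side functional check: both versions must produce identical outputs.
bool outputs_match() {
    int in[ROWS][COLS];
    int w[COLS];
    int a[ROWS];
    int b[ROWS];
    for (int j = 0; j < COLS; ++j) w[j] = j + 1;
    for (int i = 0; i < ROWS; ++i)
        for (int j = 0; j < COLS; ++j) in[i][j] = i * COLS + j;
    row_sums_seq(in, w, a);
    row_sums_hls(in, w, b);
    for (int i = 0; i < ROWS; ++i)
        if (a[i] != b[i]) return false;
    return true;
}
```

The CPU check only confirms functional equivalence. Latency and resource usage depend on the HLS tool and target device and would have to be read from synthesis reports.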

Paper Structure

This paper contains 18 sections, 4 figures, 4 tables, 1 algorithm.

Figures (4)

  • Figure 1: HLSPilot framework
  • Figure 2: An example of program tree construction. The LLM divides BFS with nested loops into multiple dependent tasks for pipelined execution.
  • Figure 3: Automatic Optimization Strategies Learning and Application
  • Figure 4: Structured information extracted by HLSPilot. The optimization strategy from documents is summarized into four parts: (1) strategy overview and (2) applicable scenarios, used for strategy retrieval; (3) parameter description and (4) examples, used for generating the optimization prompt.
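
The four-part strategy record described in the Figure 4 caption could be represented roughly as follows. This is a hedged sketch, not the paper's implementation: the `Strategy` struct, the `retrieve` and `build_prompt` helpers, and the substring matching are all hypothetical stand-ins. In particular, plain substring matching is only a placeholder for whatever retrieval mechanism HLSPilot actually uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One optimization strategy; the four fields mirror the Figure 4 caption.
struct Strategy {
    std::string overview;                // (1) strategy overview (retrieval)
    std::vector<std::string> scenarios;  // (2) applicable scenarios (retrieval)
    std::string parameters;              // (3) parameter description (prompt)
    std::string example;                 // (4) example (prompt)
};

// Hypothetical retrieval: indices of strategies whose overview or scenarios
// mention the given code pattern.
std::vector<std::size_t> retrieve(const std::vector<Strategy>& db,
                                  const std::string& pattern) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < db.size(); ++i) {
        bool match = db[i].overview.find(pattern) != std::string::npos;
        for (const std::string& s : db[i].scenarios) {
            if (s.find(pattern) != std::string::npos) match = true;
        }
        if (match) hits.push_back(i);
    }
    return hits;
}

// Hypothetical prompt assembly from parts (3) and (4) plus the target code.
std::string build_prompt(const Strategy& s, const std::string& code) {
    return "Parameters: " + s.parameters + "\n"
         + "Example:\n" + s.example + "\n"
         + "Apply this optimization to the following code:\n" + code + "\n";
}
```

Separating the retrieval fields from the prompt fields keeps the lookup cheap while reserving the longer parameter descriptions and examples for the prompt that is finally sent to the LLM.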