HLStrans: Dataset for C-to-HLS Hardware Code Synthesis

Qingyun Zou; Nuo Chen; Yao Chen; Bingsheng He; WengFei Wong

TL;DR

This work introduces HLStrans, a large-scale benchmark dataset for C-to-HLS transformation that contains paired C and HLS kernels with testbenches and synthesis annotations. It also introduces an augmentation pipeline that combines LLMs, Monte Carlo Tree Search (MCTS), and Design Space Exploration (DSE) to generate diverse, synthesizable variants annotated with hardware-aware metrics. Experiments with multiple models and prompting methods show that retrieval and fine-tuning on HLStrans substantially improve synthesis success rates and latency reduction, underscoring its value as both a dataset and a benchmark for LLM-guided hardware design. The dataset and training scripts are released to accelerate research at the intersection of AI and FPGA design across vendors and toolchains.

Abstract

High-Level Synthesis (HLS) enables hardware design from C/C++ kernels but requires extensive transformations, such as restructuring code, inserting pragmas, adapting data types, and repairing non-synthesizable constructs, to achieve efficient FPGA implementations. While large language models (LLMs) show promise in automating these transformations, progress has been limited by the absence of large-scale, well-structured datasets. Existing HLS datasets focus primarily on resource estimation, lack paired C and HLS examples with testbenches, and cover only a narrow set of optimizations. We introduce HLStrans, the first benchmark-scale dataset for LLM-driven C-to-HLS synthesis. HLStrans contains over 124K paired C and HLS programs for real-world applications, with full testbenches and synthesis-based annotations of latency and resource usage. The dataset systematically captures five categories of transformations and is enriched by an automated augmentation pipeline combining LLMs, Monte Carlo Tree Search (MCTS), and Design Space Exploration (DSE). We benchmark state-of-the-art LLMs on HLStrans, demonstrating that retrieval and fine-tuning significantly improve success rates and performance.


Figures (17)