HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and Beyond

Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao

TL;DR

HLSFactory addresses the lack of scalable, standardized, and extensible HLS datasets with a three-stage framework that expands design spaces, synthesizes designs across multiple vendor tool flows, and aggregates the results into ML-ready formats. The OptDSL frontend enables vendor-agnostic design-space specification, concrete design generation enables vendor-specific synthesis, and random sampling provides broad coverage, including suboptimal designs, for robust ML training. Through seven case studies, the paper demonstrates that the framework improves ML QoR prediction, expands design-space coverage, accelerates dataset construction via fine-grained parallelism, and supports multi-vendor and data-integration workflows. By providing open-source tooling, built-in benchmarks, and reproducible artifact workflows, HLSFactory enables community contributions and scalable ML research in EDA.

Abstract

Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or the lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets. In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks and curated open-source HLS designs. We showcase the versatility and multi-functionality of our framework through seven case studies: I) ML model for QoR prediction; II) Design space sampling; III) Fine-grained parallelism backend speedup; IV) Targeting Intel's HLS flow; V) Adding new auxiliary designs; VI) Integrating published HLS data; VII) HLS tool version regression benchmarking.
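
The three stages described in the abstract can be illustrated with a small, self-contained sketch. This is not HLSFactory's API: the directive names, the `mock_synthesize` cost model, and every function name below are hypothetical stand-ins chosen for illustration, and a real flow would invoke a vendor HLS tool instead of the toy formula.

```python
import itertools
import json
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical directive options for a single kernel (not taken from the paper).
DIRECTIVE_SPACE = {
    "unroll_factor": [1, 2, 4, 8],
    "pipeline": [False, True],
    "array_partition_factor": [1, 2, 4],
}


def expand_design_space(space):
    """Stage 1: enumerate every combination of directive values."""
    names = sorted(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, values)) for values in combos]


def sample_designs(designs, k, seed=0):
    """Randomly sample k design points so suboptimal ones are kept too."""
    rng = random.Random(seed)
    return rng.sample(designs, min(k, len(designs)))


def mock_synthesize(design):
    """Stage 2 stand-in: a toy cost model, NOT a real vendor tool run."""
    parallelism = design["unroll_factor"] * design["array_partition_factor"]
    latency = 1024 // parallelism + (0 if design["pipeline"] else 256)
    luts = 100 * parallelism + (50 if design["pipeline"] else 0)
    return {"design": design, "latency_cycles": latency, "luts": luts}


def run_flows(designs, max_workers=4):
    """Run the mock flows concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(mock_synthesize, designs))


def aggregate(results):
    """Stage 3: flatten results into uniform records and serialize to JSON."""
    records = [
        {**r["design"], "latency_cycles": r["latency_cycles"], "luts": r["luts"]}
        for r in results
    ]
    return json.dumps(records, indent=2)


all_designs = expand_design_space(DIRECTIVE_SPACE)
sampled = sample_designs(all_designs, k=6)
dataset_json = aggregate(run_flows(sampled))
```

Sampling uniformly from the expanded space, rather than keeping only the best points, mirrors the stated goal of including suboptimal designs in the training data.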

Paper Structure (42 sections, 11 figures, 2 tables)

Figures (11)

  • Figure 1: A complete overview of the HLSFactory framework with three stages and three entry points where users can contribute their own designs.
  • Figure 2: A snippet demonstrating the OptDSL syntax.
  • Figure 3: Example usage of the HLSFactory framework.
  • Figure 4: The directory structure that HLSFactory uses. Red are input files; green are the intermediate design points; blue are output files.
  • Figure 5: True-vs-predicted plots for the HLS-based ML QoR model. Test values are shown for models trained on the complete training design space and on a partial subset of it. "RAE": Relative Absolute Error ($|\hat{y} - y| / |y - \bar{y}|$), "R2": Coefficient of Determination.
  • ...and 6 more figures
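
The two metrics named in the Figure 5 caption can be computed as in the sketch below. It assumes the usual aggregate definitions, with the absolute and squared differences summed over all test samples; the caption gives only the per-term form of RAE, so the summation is an assumption, and the function names are illustrative rather than taken from the paper.

```python
def relative_absolute_error(y_true, y_pred):
    """RAE: total absolute error relative to that of a mean-only predictor."""
    mean = sum(y_true) / len(y_true)
    numerator = sum(abs(p - t) for t, p in zip(y_true, y_pred))
    denominator = sum(abs(t - mean) for t in y_true)
    return numerator / denominator


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up latency values purely for illustration.
y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 310.0, 390.0]
rae = relative_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

Under these definitions a perfect predictor gives RAE = 0 and R2 = 1, while always predicting the mean of the true values gives RAE = 1 and R2 = 0.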