Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning

Yuchao Liao, Tosiron Adegbija, Roman Lysecky, Ravi Tandon

TL;DR

The paper tackles the scarcity and uncertain fidelity of system-level HLS data for design space exploration by introducing Vaegan, a three-stage framework that converts diverse HLS data into fixed-point inputs, uses a VAE (MLPVAE) or DCGAN to generate synthetic data, and evaluates its fidelity against real data with metrics such as $MMD$, $SSD$, $PRD$, and $COSS$. Across two FPGA parts, with and without HLS directives, MLPVAE consistently outperforms prior synthetic baselines and DCGAN in fidelity, while also training faster; a wearable case study shows Vaegan-generated data yielding Pareto frontiers closer to the original system than competing synthetic methods. Vaegan thus enables scalable, high-quality synthetic data for complex, multi-component HLS DSE, expanding the feasible design space for real-world embedded systems. The work points to future directions, including using Vaegan as a predictor to bypass the HLS process and integrating diffusion models or data-flow graphs into the framework.

Abstract

High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for efficiently exploring Pareto-optimal and optimal hardware solutions during the HLS process. Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies. Unfortunately, these resources are limited and may not be sufficient for complex, multi-component system-level explorations. Generating new data using existing HLS benchmarks can be cumbersome, given the expertise and time required to effectively generate data for different HLS designs and directives. As a result, synthetic data has been used in prior work to evaluate system-level HLS DSE. However, the fidelity of the synthetic data to real data is often unclear, leading to uncertainty about the quality of system-level HLS DSE. This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments that would be unattainable with only the currently available data. We explore and adapt a Variational Autoencoder (VAE) and Generative Adversarial Network (GAN) for this task and evaluate our approach using state-of-the-art datasets and metrics. We compare our approach to prior works and show that Vaegan effectively generates synthetic HLS data that closely mirrors the ground truth's distribution.

Paper Structure (14 sections, 2 equations, 4 figures, 4 tables)


Figures (4)

  • Figure 1: Proposed Vaegan approach to generating synthetic data for system-level HLS design space exploration compared to the traditional approach
  • Figure 2: Vaegan comprises three stages: (A) formatting and transforming diverse input data (HLS directives, HLS report estimation, post-synthesis, and post-implementation data) into a format readable by the network; (B) employing ML (here, a Variational Autoencoder (VAE) or a Generative Adversarial Network (GAN)) to generate synthetic HLS data; and (C) evaluating the generated synthetic HLS data.
  • Figure 3: Visualized HLS data comparison between (a) real data, (b) Gaussian [LiaoDSE2023], (c) ABC [wharrie2022hapnest], (d) DCGAN, and (e) MLPVAE for part xc7v585tffg1157-3 without HLS directives
  • Figure 4: Pareto-optimal design points for the area (FF+LUT) and energy of the original three-component wearable pregnancy monitoring system compared with prior work (ABC and Gaussian) and Vaegan (MLPVAE and DCGAN).
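
For readers unfamiliar with the Pareto-optimal design points compared in Figure 4, the following is a minimal sketch of how such a frontier can be extracted from a set of (area, energy) design points when both objectives are minimized. It is not the authors' code: the function name `pareto_front` and the sample values are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (area, energy); both are to be minimized


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the design points that no other point dominates.

    A point dominates another if it is no worse in both objectives
    and strictly better in at least one.
    """
    front: List[Point] = []
    best_energy = float("inf")
    # Visit points by increasing area (ties broken by increasing energy).
    # A point is on the frontier only if its energy is strictly lower
    # than that of every point already visited.
    for area, energy in sorted(set(points)):
        if energy < best_energy:
            front.append((area, energy))
            best_energy = energy
    return front


# Hypothetical design points, for illustration only.
designs = [(100, 5.0), (120, 4.0), (110, 6.0), (150, 4.0), (90, 7.0)]
front = pareto_front(designs)
print(front)
```

With these sample values, (110, 6.0) is dominated by (100, 5.0) and (150, 4.0) is dominated by (120, 4.0), so neither appears on the frontier. Applying the same extraction to real and synthetic data gives the frontiers whose closeness a comparison like Figure 4 examines.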