Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning
Yuchao Liao, Tosiron Adegbija, Roman Lysecky, Ravi Tandon
TL;DR
The paper tackles the scarcity of real system-level HLS data for design space exploration, and the unclear fidelity of the synthetic data used in its place, by introducing Vaegan. Vaegan is a three-stage framework that converts diverse HLS data into fixed-point inputs, uses a VAE (MLPVAE) or DCGAN to generate synthetic data, and evaluates the fidelity of that data against real data with metrics such as $MMD$, $SSD$, $PRD$, and $COSS$. Across two FPGA parts, with and without HLS directives, MLPVAE consistently outperforms both prior synthetic baselines and DCGAN in fidelity, and it also trains faster. In a wearable case study, Vaegan-generated data yields Pareto frontiers closer to the original system's than those produced by competing synthetic methods. Vaegan thus enables scalable, high-quality synthetic data for complex, multi-component HLS DSE, expanding the feasible design space for real-world embedded systems. The work points to future directions, including using Vaegan as a predictor to bypass the HLS process and integrating diffusion models or data-flow graphs into the framework.
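To make the fidelity evaluation concrete, the following is a minimal sketch of two of the named metrics, assuming that $MMD$ denotes maximum mean discrepancy and $COSS$ denotes cosine similarity (the summary does not expand the acronyms). The RBF kernel, the `gamma` bandwidth, the biased estimator, and the choice to compare per-feature means are illustrative assumptions, not the paper's stated configuration.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Pairwise RBF kernel exp(-gamma * ||x - y||^2) between rows of a and b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def mmd_squared(real, synth, gamma=1.0):
    """Biased estimate of squared MMD between two sample sets (rows = samples).

    Assumption: RBF kernel with a fixed bandwidth; smaller values mean the
    synthetic samples are distributed more like the real ones.
    """
    k_rr = rbf_kernel(real, real, gamma)
    k_ss = rbf_kernel(synth, synth, gamma)
    k_rs = rbf_kernel(real, synth, gamma)
    return float(k_rr.mean() + k_ss.mean() - 2.0 * k_rs.mean())


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical stand-ins for real and generated HLS data (rows = design points).
rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(200, 4))
close = rng.normal(0.0, 1.0, size=(200, 4))  # same distribution as `real`
far = rng.normal(3.0, 1.0, size=(200, 4))    # shifted distribution

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
coss_means = cosine_similarity(real.mean(axis=0) + 5.0, close.mean(axis=0) + 5.0)
```

In this sketch, a generator with higher fidelity would show a lower `mmd_squared` value and a cosine similarity closer to 1 when compared against the real samples.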
Abstract
High-Level Synthesis (HLS) Design Space Exploration (DSE) is a widely accepted approach for efficiently exploring Pareto-optimal and optimal hardware solutions during the HLS process. Several HLS benchmarks and datasets are available for the research community to evaluate their methodologies. Unfortunately, these resources are limited and may not be sufficient for complex, multi-component system-level explorations. Generating new data from existing HLS benchmarks can be cumbersome, given the expertise and time required to produce data for different HLS designs and directives. As a result, synthetic data has been used in prior work to evaluate system-level HLS DSE. However, the fidelity of the synthetic data to real data is often unclear, leading to uncertainty about the quality of system-level HLS DSE. This paper proposes a novel approach, called Vaegan, that employs generative machine learning to generate synthetic data that is robust enough to support complex system-level HLS DSE experiments that would be unattainable with only the currently available data. We explore and adapt a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN) for this task and evaluate our approach using state-of-the-art datasets and metrics. We compare our approach to prior work and show that Vaegan effectively generates synthetic HLS data that closely mirrors the distribution of the ground truth.
