A Unified Framework for Automated Code Transformation and Pragma Insertion

Stéphane Pouget, Louis-Noël Pouchet, Jason Cong

TL;DR

Sisyphus is a unified framework, based on non-linear programming (NLP), that jointly optimizes code transformations, tile-size selection for on-chip data caching, and hardware pragma insertion in high-level synthesis (HLS) for affine loop kernels. By constructing a single optimization problem and using a template-driven approach, it navigates a large design space restricted to legal transformations far more efficiently than prior design space exploration (DSE) methods, achieving Quality of Results (QoR) that matches or surpasses AutoDSE, NLP-DSE, ScaleHLS, and HARP across diverse benchmarks. The approach demonstrates substantial throughput gains (GF/s) on gemm, CNN, and BERT-like tasks, along with favorable resource usage and a scalable latency model with reasonable prediction error. The work highlights the practical impact of a unified, explainable DSE framework for FPGA-targeted HLS, enabling rapid generation of high-quality designs with minimal manual intervention.

Abstract

High-level synthesis (HLS), source-to-source compilers, and various Design Space Exploration (DSE) techniques for pragma insertion have significantly improved the Quality of Results (QoR) of generated designs. These tools offer benefits such as reduced development time and enhanced performance. However, achieving high-quality results often requires additional manual code transformations and tiling selections, which are typically performed separately or as pre-processing steps. Although DSE techniques enable code transformation upfront, the vastness of the search space often limits the exploration of all possible code transformations, making it challenging to determine which transformations are necessary. Additionally, ensuring correctness remains challenging, especially for complex transformations and optimizations. To tackle this obstacle, we first propose a comprehensive framework leveraging HLS compilers. Our system streamlines code transformation, pragma insertion, and tile size selection for on-chip data caching through a unified optimization problem, aiming to enhance parallelization, which is particularly beneficial for computation-bound kernels. Then, employing a novel Non-Linear Programming (NLP) approach, we simultaneously determine transformations, pragmas, and tile sizes, focusing on regular loop-based kernels. Our evaluation demonstrates that our framework adeptly identifies the appropriate transformations, including scenarios where no transformation is necessary, and inserts pragmas to achieve a favorable Quality of Results.
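
To make the idea of choosing tile sizes and parallelization factors in one optimization problem more concrete, here is a minimal toy sketch. It is not the paper's NLP formulation or its latency model: the cost function `toy_latency`, the resource limits `max_parallel` and `max_buffer_elems`, and the function `explore` are all hypothetical names invented for illustration, and a brute-force enumeration stands in for the NLP solver. It only shows why the two decisions interact: a larger tile reduces off-chip traffic but needs more on-chip buffer, and the unroll factor is bounded by both the tile size and the available parallel resources.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n (candidate tile sizes and unroll factors)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def toy_latency(n, tile, unroll):
    """Hypothetical cycle estimate for an n x n x n gemm-like loop nest.

    Assumption for illustration only: compute cycles scale as n^3 / unroll,
    and each of the (n / tile)^3 tile iterations loads two tile x tile blocks.
    """
    compute = n ** 3 // unroll
    transfers = (n // tile) ** 3 * 2 * tile * tile
    return compute + transfers


def explore(n, max_parallel, max_buffer_elems):
    """Pick (tile, unroll) jointly by exhaustive search under toy constraints.

    Returns (cost, tile, unroll), or None if no configuration is feasible.
    """
    best = None
    for tile, unroll in product(divisors(n), repeat=2):
        if tile % unroll != 0:
            continue  # unroll factor must divide the tiled loop's trip count
        if unroll > max_parallel:
            continue  # stand-in for a compute-resource (e.g. DSP) budget
        if 3 * tile * tile > max_buffer_elems:
            continue  # three tile x tile buffers must fit on chip
        cost = toy_latency(n, tile, unroll)
        if best is None or cost < best[0]:
            best = (cost, tile, unroll)
    return best


if __name__ == "__main__":
    print(explore(64, max_parallel=16, max_buffer_elems=3072))
```

In this toy setting, optimizing the tile size and the unroll factor separately could fix a small tile first and then cap the achievable unroll factor; searching both together avoids that. A real framework would replace the enumeration with a solver and the cost function with a model calibrated against the HLS toolchain.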

Paper Structure (24 sections, 7 equations, 3 figures, 7 tables)