Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis

Zongyue Qin, Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Ziniu Hu, Yizhou Sun, Jason Cong

TL;DR

ProgSG introduces a cross-modality program representation framework for electronic design automation, combining high-level synthesis source code with a hierarchical control data flow graph (CDFG) to predict design quality and accelerate design space exploration. It presents two interaction schemes, graph-summary augmented sequences (ProgSG-si) and fine-grained node-token cross-modality messaging, alongside a graph-focused pretraining regime on compiler data-flow tasks. Empirical results across MachSuite and Polybench kernels show ProgSG achieving up to 22% RMSE reduction and substantial design-space exploration speedups relative to baselines, with further gains from multi-version data and pretraining. The approach demonstrates that jointly leveraging code and graph information, together with targeted pretraining, can materially improve IC design automation and may generalize to other program-analysis tasks.

Abstract

In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA designs, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design with low-level hardware description languages that eventually synthesize DSAs on circuits. However, creating a high-quality HLS design still demands significant domain knowledge, particularly in microarchitecture decisions expressed as pragmas. Thus, it is desirable to automate such decisions with the help of machine learning for predicting the quality of HLS designs, which requires a deeper understanding of the program, consisting of the original code and pragmas. Naturally, these programs can be considered as sequence data. In addition, these programs can be compiled and converted into a control data flow graph (CDFG). But existing works either fail to leverage both modalities or combine the two in shallow or coarse ways. We propose ProgSG, a model that allows interaction between the source code sequence modality and the graph modality in a deep and fine-grained way. To alleviate the scarcity of labeled designs, a pre-training method is proposed based on a suite of compiler data flow analysis tasks. Experimental results show that ProgSG reduces the RMSE of design performance predictions by up to 22%, and identifies designs with an average of 1.10× and 1.26× (up to 8.17× and 13.31×) performance improvement in the design space exploration (DSE) task compared to HARP and AutoDSE, respectively.
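To make the "RMSE reduction" figure concrete, the following minimal sketch shows how a relative RMSE reduction between a baseline predictor and an improved one could be computed. The function names and all numbers are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
import math


def rmse(predictions, targets):
    """Root-mean-square error between two equally long sequences."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_reduction(baseline_rmse, model_rmse):
    """Fraction by which model_rmse is lower than baseline_rmse."""
    return (baseline_rmse - model_rmse) / baseline_rmse


# Hypothetical design-quality targets and predictions (illustrative only).
targets = [1.0, 2.0, 3.0, 4.0]
baseline_preds = [1.5, 2.5, 2.5, 4.5]  # every prediction is off by 0.5
model_preds = [1.4, 2.4, 2.6, 4.4]     # every prediction is off by 0.4

baseline_rmse = rmse(baseline_preds, targets)
model_rmse = rmse(model_preds, targets)
reduction = relative_reduction(baseline_rmse, model_rmse)
print(f"baseline={baseline_rmse:.3f} model={model_rmse:.3f} "
      f"reduction={reduction:.1%}")
```

With these toy values the reduction comes out at roughly 20%, which is the kind of quantity the "up to 22%" claim refers to.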

Paper Structure

This paper contains 25 sections, 4 equations, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: An illustration of the HARP control data flow graph. Compared with a normal CDFG, it has additional block nodes and three types of edges: intra-block edges, block-flow edges, and hierarchy-level edges.
  • Figure 2: The overall diagrams of ProgSG. "GNN", "TF", and "Dec" refer to Graph Neural Network Layer, Transformer Layer, and Decoder, respectively.
  • Figure 3: Illustration of the node-token message passing mechanism. The cross-modality information is first exchanged via block nodes and block tokens. Then the information is propagated to normal nodes and tokens through the GNN and transformer layers, respectively.
  • Figure 4: Relative performance improvement of the best design found by our model compared to running AutoDSE for twenty-five hours.
  • Figure 5: Bar plots of the average attention scores of pragma-related tokens before (ProgSG-rand) and after (ProgSG) fine-tuning.
  • ...and 3 more figures
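
The Figure 1 caption describes a CDFG extended with block nodes and three extra edge types. A minimal sketch of how such a graph could be held in memory follows. The class names, the toy node names, and in particular which nodes each edge type connects are assumptions made for illustration; the caption names the edge types but does not define their endpoints.

```python
from dataclasses import dataclass, field
from enum import Enum


class NodeKind(Enum):
    NORMAL = "normal"  # node of an ordinary CDFG
    BLOCK = "block"    # additional block node from the caption


class EdgeKind(Enum):
    CDFG = "cdfg"                # ordinary control/data-flow edge
    INTRA_BLOCK = "intra_block"  # edge types named in the caption
    BLOCK_FLOW = "block_flow"
    HIERARCHY = "hierarchy"


@dataclass
class HierarchicalGraph:
    nodes: dict = field(default_factory=dict)  # node id -> NodeKind
    edges: list = field(default_factory=list)  # (src, dst, EdgeKind)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = kind

    def add_edge(self, src, dst, kind):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, kind))

    def edges_of_kind(self, kind):
        return [(s, d) for s, d, k in self.edges if k is kind]


def build_toy_graph():
    """Build a tiny hypothetical graph; connectivity is assumed, not from the paper."""
    g = HierarchicalGraph()
    for block in ("outer_loop", "inner_loop_a", "inner_loop_b"):
        g.add_node(block, NodeKind.BLOCK)
    for op in ("load", "add", "store"):
        g.add_node(op, NodeKind.NORMAL)
    # Ordinary data-flow edges between instructions.
    g.add_edge("load", "add", EdgeKind.CDFG)
    g.add_edge("add", "store", EdgeKind.CDFG)
    # Assumption: intra-block edges link a block node to the nodes it contains.
    for op in ("load", "add", "store"):
        g.add_edge("inner_loop_a", op, EdgeKind.INTRA_BLOCK)
    # Assumption: block-flow edges link consecutive sibling blocks.
    g.add_edge("inner_loop_a", "inner_loop_b", EdgeKind.BLOCK_FLOW)
    # Assumption: hierarchy-level edges link a parent block to nested blocks.
    g.add_edge("outer_loop", "inner_loop_a", EdgeKind.HIERARCHY)
    g.add_edge("outer_loop", "inner_loop_b", EdgeKind.HIERARCHY)
    return g


toy = build_toy_graph()
for kind in EdgeKind:
    print(kind.value, len(toy.edges_of_kind(kind)))
```

Keeping the edge type as an explicit tag on each edge lets a downstream model treat the three block-related relations differently from ordinary CDFG edges, for example by using a separate set of weights per edge type.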