Cross-Modality Program Representation Learning for Electronic Design Automation with High-Level Synthesis
Zongyue Qin, Yunsheng Bai, Atefeh Sohrabizadeh, Zijian Ding, Ziniu Hu, Yizhou Sun, Jason Cong
TL;DR
ProgSG introduces a cross-modality program representation framework for electronic design automation, combining high-level synthesis source code with a hierarchical control data flow graph (CDFG) to predict design quality and guide design space exploration. It presents two interaction schemes, graph-summary augmented sequences (ProgSG-si) and fine-grained node-token cross-modality messaging, alongside a graph-focused pre-training regime on compiler data-flow tasks. Empirical results across MachSuite and Polybench kernels show ProgSG reducing prediction RMSE by up to 22% and finding designs that perform on average 1.10× and 1.26× better than those found by HARP and AutoDSE, respectively, with further gains from multi-version data and pre-training. The approach demonstrates that jointly leveraging code and graph information, together with targeted pre-training, can materially improve IC design automation and may generalize to other program-analysis tasks.
Abstract
In recent years, domain-specific accelerators (DSAs) have gained popularity for applications such as deep learning and autonomous driving. To facilitate DSA design, programmers use high-level synthesis (HLS) to compile a high-level description written in C/C++ into a design in a low-level hardware description language, which is eventually synthesized into a DSA circuit. However, creating a high-quality HLS design still demands significant domain knowledge, particularly for microarchitecture decisions expressed as \textit{pragmas}. It is therefore desirable to automate such decisions with machine learning models that predict the quality of HLS designs, which requires a deeper understanding of the program, consisting of the original code and its pragmas. Naturally, these programs can be treated as sequence data. In addition, they can be compiled and converted into a control data flow graph (CDFG). However, existing works either fail to leverage both modalities or combine the two in shallow or coarse ways. We propose ProgSG, a model that allows the source code sequence modality and the graph modality to interact in a deep and fine-grained way. To alleviate the scarcity of labeled designs, we propose a pre-training method based on a suite of compiler data flow analysis tasks. Experimental results show that ProgSG reduces the RMSE of design performance predictions by up to $22\%$, and identifies designs with an average of $1.10\times$ and $1.26\times$ (up to $8.17\times$ and $13.31\times$) performance improvement in the design space exploration (DSE) task compared to HARP and AutoDSE, respectively.
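To make the notion of a pragma design space concrete, the following is a minimal sketch of how a single kernel expands into many candidate designs. It is illustrative only: the kernel, the `<II>` and `<UNROLL>` placeholder syntax, the candidate values, and the helper `enumerate_designs` are all assumptions, and the pragma spellings follow the common `#pragma HLS ...` style rather than anything specified in the paper.

```python
from itertools import product

# Hypothetical kernel template. The pragma lines use the common
# "#pragma HLS ..." style and are illustrative, not taken from the paper.
KERNEL_TEMPLATE = """\
void vadd(const int a[64], const int b[64], int out[64]) {
  for (int i = 0; i < 64; i++) {
#pragma HLS pipeline II=<II>
#pragma HLS unroll factor=<UNROLL>
    out[i] = a[i] + b[i];
  }
}
"""

# Assumed candidate values for each pragma knob.
DESIGN_SPACE = {"II": [1, 2], "UNROLL": [1, 2, 4, 8]}


def enumerate_designs(template, space):
    """Yield (config, source) pairs, one per point in the design space."""
    names = sorted(space)
    for values in product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        source = template
        # Fill each placeholder with the value chosen for this design point.
        for name, value in config.items():
            source = source.replace(f"<{name}>", str(value))
        yield config, source


designs = list(enumerate_designs(KERNEL_TEMPLATE, DESIGN_SPACE))
print(len(designs))  # 8 candidate designs (2 x 4)
```

Each generated source string is one point in the design space. A learned quality predictor of the kind the abstract describes would score these variants directly, so that only the most promising ones need to go through synthesis.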
