Learning Generalizable Program and Architecture Representations for Performance Modeling

Lingda Li; Thomas Flynn; Adolfy Hoisie

TL;DR

PerfVec addresses the challenge of generalizable performance modeling in two ways. It learns independent, orthogonal representations for programs and microarchitectures, and it composes a program's representation from the representations of its executed instructions. The framework introduces a foundation model for instructions and uses microarchitecture sampling to train without a full architecture model. This enables fast cross-architecture predictions through a simple dot product between program and microarchitecture representations. Key contributions include (1) a compositional representation scheme $\bm{R}_p = \sum_i \bm{R}_i$ with $T = \bm{R}_p \cdot \bm{M}$, where $\bm{R}_i$ is the representation of the $i$-th executed instruction, $\bm{M}$ is the microarchitecture representation, and $T$ is the predicted execution time; (2) a scalable training strategy combining instruction representation reuse and microarchitecture sampling; and (3) demonstrations of strong accuracy and generality on unseen programs and architectures, plus practical applications in design space exploration and loop tiling analysis. The approach significantly reduces training cost and prediction latency and is broadly applicable to performance modeling workflows in HPC and systems design.
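The composition in contribution (1) can be illustrated with a minimal NumPy sketch. It assumes the per-instruction representations already exist. In PerfVec they come from the learned instruction foundation model, but here they are small hand-picked vectors. The function names `program_representation` and `predict_time` are hypothetical and are not part of any published PerfVec API.

```python
import numpy as np


def program_representation(instr_reps: np.ndarray) -> np.ndarray:
    """Sum per-instruction representations, shape [num_instrs, dim], into R_p, shape [dim]."""
    return instr_reps.sum(axis=0)


def predict_time(program_rep: np.ndarray, march_rep: np.ndarray) -> float:
    """Predicted execution time T = R_p . M (a plain dot product)."""
    return float(np.dot(program_rep, march_rep))


# Hypothetical 4-dimensional representations for a 3-instruction trace.
instr_reps = np.array([
    [1.0, 0.0, 2.0, 0.5],
    [0.0, 1.0, 1.0, 0.5],
    [2.0, 1.0, 0.0, 1.0],
])
# Hypothetical microarchitecture representation M with the same dimension.
march_rep = np.array([0.5, 1.0, 0.25, 2.0])

r_p = program_representation(instr_reps)  # [3.0, 2.0, 3.0, 2.0]
t = predict_time(r_p, march_rep)          # 1.5 + 2.0 + 0.75 + 4.0 = 8.25

# Per-instruction contributions R_i . M; they sum to the same T.
per_instr = instr_reps @ march_rep        # [2.0, 2.25, 4.0]
print(r_p, t, per_instr)
```

The dot product distributes over the sum, so $T$ also equals $\sum_i \bm{R}_i \cdot \bm{M}$. This means each instruction's contribution to the predicted time can be read off separately. In this toy trace those contributions are 2.0, 2.25, and 4.0.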

Abstract

Performance modeling is an essential tool in many areas, including performance characterization and optimization, design space exploration, and resource allocation. However, existing performance modeling approaches have limitations, such as the high computational cost of discrete-event simulators, the limited flexibility of hardware emulators, and the restricted accuracy or generality of analytical and data-driven models. To address these limitations, this paper proposes PerfVec, a novel deep learning-based performance modeling framework that learns high-dimensional, independent/orthogonal program and microarchitecture representations. Once learned, a program representation can be used to predict that program's performance on any microarchitecture, and likewise, a microarchitecture representation can be applied to predict the performance of any program. Additionally, PerfVec yields a foundation model that captures the performance essence of instructions. Developers can use this model directly in numerous performance-modeling-related tasks without incurring its training cost. The evaluation demonstrates that PerfVec is more general and efficient than previous approaches.
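The reuse property stated in the abstract reduces to a single matrix product once representations are stored. Stacking program representations as the rows of one matrix and microarchitecture representations as the rows of another yields a prediction for every program/microarchitecture pair. The following is a minimal sketch. It assumes already-learned vectors (the numbers are invented for illustration) and uses a hypothetical helper, `predict_all_pairs`.

```python
import numpy as np


def predict_all_pairs(program_reps: np.ndarray, march_reps: np.ndarray) -> np.ndarray:
    """program_reps: [num_programs, dim]; march_reps: [num_marchs, dim].

    Returns a [num_programs, num_marchs] table whose entry (p, m) is R_p . M_m.
    """
    return program_reps @ march_reps.T


# Hypothetical learned representations (dim = 4).
program_reps = np.array([
    [3.0, 2.0, 3.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
])
march_reps = np.array([
    [0.5, 1.0, 0.25, 2.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 0.0, 0.0, 0.0],
])

times = predict_all_pairs(program_reps, march_reps)
# times == [[8.25, 10.0, 6.0],
#           [3.75,  4.0, 2.0]]

# Index of the microarchitecture with the lowest predicted time, per program.
best = times.argmin(axis=1)  # [2, 2]
print(times, best)
```

In this toy table the third microarchitecture has the lowest predicted time for both programs. Adding a new program or microarchitecture only adds one row or column to the table and leaves the existing representations untouched.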

Paper Structure (21 sections, 1 equation, 8 figures, 4 tables)

Introduction
Toward Generalizable Performance Modeling
Learning Program Representations
Challenges
Compositional Instruction Representations
Instruction Features
Model Architecture
Training the Foundation Model
Microarchitecture Sampling
Instruction Representation Reuse
Dataset
Training Setup
Evaluation
Accuracy and Generality
Ablation Studies
...and 6 more sections

Figures (8)

Figure 1: The proposed PerfVec framework based on independent and orthogonal program and microarchitecture representations.
Figure 2: The framework to learn instruction representations.
Figure 3: Performance prediction accuracy for seen and unseen programs on seen microarchitectures.
Figure 4: Prediction accuracy on seen microarchitectures after moving 519.lbm to the training dataset.
Figure 5: Performance prediction accuracy for seen and unseen programs on unseen microarchitectures.
...and 3 more figures