GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning

Jiaqi Wang; Jingwei Sun; Jiyu Luo; Han Li; Guangzhong Sun

TL;DR

GCL-Sampler is a sampling framework for GPU simulation that uses Relational Graph Convolutional Networks (RGCNs) with contrastive learning to automatically discover high-dimensional kernel similarities from trace graphs. The learned graph embeddings capture rich structural and semantic properties of program execution, enabling both high fidelity and substantial speedup.

Abstract

GPU architectural simulation is orders of magnitude slower than native execution, so workload sampling is necessary for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive sampling with high errors or conservative sampling with constrained speedups. To address these issues, we propose GCL-Sampler, a sampling framework that leverages Relational Graph Convolutional Networks (RGCNs) with contrastive learning to automatically discover high-dimensional kernel similarities from trace graphs. By encoding instruction sequences and data dependencies into graph embeddings, GCL-Sampler captures rich structural and semantic properties of program execution, enabling both high fidelity and substantial speedup. Evaluations on extensive benchmarks show that GCL-Sampler achieves a 258.94x average speedup over full-workload simulation with 0.37% error, outperforming the state-of-the-art methods PKA (129.23x speedup, 20.90% error), Sieve (94.90x, 4.10%), and STEM+ROOT (56.57x, 0.38%).
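To make the speedup and error figures concrete, the following is a minimal sketch of how cluster-based kernel sampling is commonly evaluated; it is not the authors' implementation. It assumes that cluster labels are already available (for example, from K-Means over kernel embeddings), that one representative per cluster is simulated and its cycle count is scaled by the cluster size, and that simulation time is proportional to simulated cycles. The function name `sampled_estimate`, the choice of the first cluster member as representative, and the cycle counts are all hypothetical.

```python
from collections import defaultdict


def sampled_estimate(cycles, labels):
    """Project total cycles from one representative kernel per cluster.

    cycles: per-kernel cycle counts from a full run (used here only to
            evaluate the sampling, as ground truth).
    labels: cluster id assigned to each kernel invocation.
    Returns (estimated_total, relative_error, speedup).
    """
    # Group kernel indices by cluster id.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    estimated_total = 0
    simulated_cycles = 0
    for members in clusters.values():
        # Assumption: the first member stands in for the whole cluster.
        rep = members[0]
        estimated_total += cycles[rep] * len(members)
        simulated_cycles += cycles[rep]

    true_total = sum(cycles)
    relative_error = abs(estimated_total - true_total) / true_total
    # Assumption: simulation cost scales with the cycles actually simulated.
    speedup = true_total / simulated_cycles
    return estimated_total, relative_error, speedup


# Hypothetical workload: six kernel invocations grouped into three clusters.
cycles = [100, 102, 98, 500, 510, 20]
labels = [0, 0, 0, 1, 1, 2]
estimate, error, speedup = sampled_estimate(cycles, labels)
print(f"estimate={estimate}, error={error:.4%}, speedup={speedup:.2f}x")
```

The sketch shows the trade-off the abstract describes: fewer, larger clusters simulate fewer cycles and raise the speedup, but the error grows if kernels within a cluster behave differently, which is why the quality of the similarity measure matters.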

Paper Structure (21 sections, 6 equations, 6 figures, 3 tables)

Introduction
Background
SASS Trace
RGCN and Contrastive Learning
GPU Simulation
The GCL-Sampler Methodology
Application Tracing
Graph Construction from Traces
RGCN Contrastive Learning
Data Preprocessing
RGCN Architecture
Training Loss
Embedding Generation and Clustering
Experiment Setup
Evaluation
...and 6 more sections

Figures (6)

Figure 1: GCL-Sampler achieves near-ideal performance (red star) with minimal error and maximum speedup compared to existing methods.
Figure 2: Overview of GCL-Sampler. GCL-Sampler transforms program traces into HRGs and leverages RGCN-based contrastive learning to generate kernel embeddings, followed by K-Means clustering.
Figure 3: An example of graph construction from traces.
Figure 4: Sampling error of GCL-Sampler, PKA, Sieve and STEM+ROOT.
Figure 5: Speedup of GCL-Sampler, PKA, Sieve and STEM+ROOT.
...and 1 more figure
