
GCL-Sampler: Discovering Kernel Similarity for Sampled GPU Simulation via Graph Contrastive Learning

Jiaqi Wang, Jingwei Sun, Jiyu Luo, Han Li, Guangzhong Sun

TL;DR

GCL-Sampler is a sampling framework that leverages Relational Graph Convolutional Networks (RGCNs) with contrastive learning to automatically discover high-dimensional kernel similarities from trace graphs. By capturing rich structural and semantic properties of program execution, it enables both high fidelity and substantial speedup.
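
The two ingredients named above, relational graph convolution and contrastive learning, can be illustrated with a minimal pure-Python sketch. It assumes the standard RGCN update, h_i' = ReLU(W_0 h_i + sum over relations r and neighbours j in N_i^r of W_r h_j / |N_i^r|), a mean-pooling readout that turns node states into one kernel embedding, and an InfoNCE-style loss with cosine similarity and temperature tau. The relation names "seq" and "dep", the readout, the loss and every helper name (rgcn_layer, readout, info_nce) are hypothetical choices for illustration, not the paper's published implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]   # row-major weight matrix, shape (out_dim, in_dim)
Edge = Tuple[int, str, int]  # (source node, relation name, destination node); names are hypothetical


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def rgcn_layer(h: List[Vector], edges: List[Edge],
               w_self: Matrix, w_rel: Dict[str, Matrix]) -> List[Vector]:
    """One RGCN layer (standard formulation, assumed here):
    h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_i^r} W_r h_j / |N_i^r|)."""
    # Collect the incoming neighbours of every node, grouped by relation type.
    incoming: Dict[Tuple[int, str], List[int]] = {}
    for src, rel, dst in edges:
        incoming.setdefault((dst, rel), []).append(src)

    out: List[Vector] = []
    for i, hi in enumerate(h):
        acc = matvec(w_self, hi)                # self-connection term W_0 h_i
        for rel, w in w_rel.items():
            nbrs = incoming.get((i, rel), [])
            for j in nbrs:                      # normalised message W_r h_j / |N_i^r|
                msg = matvec(w, h[j])
                acc = [a + m / len(nbrs) for a, m in zip(acc, msg)]
        out.append([max(0.0, a) for a in acc])  # ReLU
    return out


def readout(h: List[Vector]) -> Vector:
    """Mean-pool node states into one graph-level (kernel) embedding (assumed readout)."""
    return [sum(col) / len(h) for col in zip(*h)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0


def info_nce(anchor: Vector, positive: Vector, negatives: List[Vector],
             tau: float = 0.5) -> float:
    """InfoNCE loss: small when the anchor is closer to its positive than to the negatives."""
    pos = math.exp(cosine(anchor, positive) / tau)
    denom = pos + sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / denom)


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    edges = [(0, "seq", 1), (1, "seq", 2), (0, "dep", 2)]
    h1 = rgcn_layer(h0, edges, w_self=eye, w_rel={"seq": eye, "dep": eye})
    print(h1, readout(h1))
```

With identity weights, the three-node example produces node states [[1, 0], [1, 1], [2, 2]] and a mean-pooled embedding of [4/3, 1]. In this sketch, minimising info_nce pulls embeddings of kernels treated as similar together and pushes the others apart, which is what would make a later clustering step meaningful.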

Abstract

GPU architectural simulation is orders of magnitude slower than native execution, which makes workload sampling necessary for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive sampling with high error or conservative sampling with limited speedup. To address these issues, we propose GCL-Sampler, a sampling framework that leverages Relational Graph Convolutional Networks (RGCNs) with contrastive learning to automatically discover high-dimensional kernel similarities from trace graphs. By encoding instruction sequences and data dependencies into graph embeddings, GCL-Sampler captures rich structural and semantic properties of program execution, enabling both high fidelity and substantial speedup. Evaluations on an extensive set of benchmarks show that GCL-Sampler achieves a 258.94x average speedup over full-workload simulation with 0.37% error, outperforming the state-of-the-art methods PKA (129.23x, 20.90%), Sieve (94.90x, 4.10%), and STEM+ROOT (56.57x, 0.38%).
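
The speedup and error figures above come from simulating only a subset of kernels and extrapolating. The sketch below shows one plausible way clustered kernel embeddings could be turned into such an estimate, using K-Means as in the paper's overview. Assumptions, none confirmed by the source: one representative per cluster (the kernel nearest the centroid) is simulated and its cycle count is scaled by the cluster size; simulation cost is proportional to simulated cycles, so speedup = full cycles / simulated cycles; error is the relative error of the estimated total cycles; centroids use a deterministic farthest-point initialisation. The function names and this exact estimator are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's K-Means with a deterministic farthest-point initialisation (assumed)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from all seeds chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def sample_and_estimate(embeddings: List[Point], cycles: List[float], k: int):
    """Simulate one representative per cluster and extrapolate total cycles.

    Returns (representative indices, estimated total cycles, relative error, speedup).
    `cycles` holds full-simulation ground truth, used here only to score the estimate."""
    labels, centroids = kmeans(embeddings, k)
    reps: List[int] = []
    estimate = simulated = 0.0
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        # Representative: the kernel whose embedding is nearest the centroid.
        rep = min(members, key=lambda i: math.dist(embeddings[i], centroids[c]))
        reps.append(rep)
        estimate += cycles[rep] * len(members)  # rep stands in for its whole cluster
        simulated += cycles[rep]                # only the rep is actually simulated
    full = sum(cycles)
    error = abs(estimate - full) / full
    speedup = full / simulated                  # assumes cost proportional to simulated cycles
    return reps, estimate, error, speedup


if __name__ == "__main__":
    emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [0.0, 0.1]]
    cyc = [100.0, 120.0, 1000.0, 1000.0, 95.0]
    print(sample_and_estimate(emb, cyc, k=2))
```

On these five toy kernels, which form two clusters, simulating one representative per cluster costs 1100 cycles instead of 2315 and estimates 2300 total cycles, i.e. roughly 2.1x speedup at about 0.65% error. These toy numbers are unrelated to the paper's reported results.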

Paper Structure

This paper contains 21 sections, 6 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: GCL-Sampler achieves near-ideal performance (red star) with minimal error and maximum speedup compared to existing methods.
  • Figure 2: Overview of GCL-Sampler. GCL-Sampler transforms program traces into HRGs and leverages RGCN-based contrastive learning to generate kernel embeddings, followed by K-Means clustering.
  • Figure 3: An example of graph construction from traces.
  • Figure 4: Sampling error of GCL-Sampler, PKA, Sieve and STEM+ROOT.
  • Figure 5: Speedup of GCL-Sampler, PKA, Sieve and STEM+ROOT.
  • ...and 1 more figure
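
Figure 3 shows graph construction from traces, and the abstract states that instruction sequences and data dependencies are encoded. A minimal sketch of one plausible construction follows, assuming a hypothetical trace format of (opcode, destination registers, source registers) per dynamic instruction, one node per instruction labelled by its opcode, a "seq" edge between consecutive instructions, and a "dep" edge for each read-after-write register dependency. The paper's actual trace format, node features and edge types may differ; build_trace_graph is a hypothetical name.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, List[str], List[str]]  # hypothetical: (opcode, dst regs, src regs)
Edge = Tuple[int, str, int]               # (source node, relation name, destination node)


def build_trace_graph(trace: List[Instr]) -> Tuple[List[str], List[Edge]]:
    """Turn a linear instruction trace into a relational graph with two edge types."""
    nodes = [op for op, _, _ in trace]    # one node per dynamic instruction, labelled by opcode
    edges: List[Edge] = []
    last_writer: Dict[str, int] = {}      # register -> index of the most recent writer
    for i, (_, dsts, srcs) in enumerate(trace):
        if i > 0:
            edges.append((i - 1, "seq", i))  # program-order edge
        for reg in srcs:
            w = last_writer.get(reg)
            if w is not None:
                edges.append((w, "dep", i))  # read-after-write data dependency
        for reg in dsts:
            last_writer[reg] = i             # later readers depend on this instruction
    return nodes, edges


if __name__ == "__main__":
    trace = [
        ("LDG", ["R1"], ["R0"]),
        ("LDG", ["R2"], ["R0"]),
        ("FADD", ["R3"], ["R1", "R2"]),
        ("STG", [], ["R0", "R3"]),
    ]
    print(build_trace_graph(trace))
```

For this four-instruction trace, the sketch yields program-order edges 0→1, 1→2 and 2→3 plus dependency edges 0→2, 1→2 (both loads feed the add) and 2→3 (the add feeds the store). R0 has no writer inside the trace, so it contributes no dependency edge.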