GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System

Yidong Gong; Pradeep Kumar

TL;DR

GnnBench addresses the lack of a standardized benchmark for single-GPU GNN systems with a framework-agnostic benchmarking platform that offers stable System APIs and zero-copy tensor exchange via a Producer-Only DLPack protocol. It enables plug-and-play integration of diverse GNN kernels, automatically generates integration code through a DSL, and evaluates multiple existing systems to reveal accuracy pitfalls, framework overhead, and memory behavior. The experimental results show that many prior systems have accuracy and performance issues that are mitigated or clarified when they are benchmarked with GnnBench. They also show that framework overhead can dominate runtime on small datasets, while mid-size datasets reveal true kernel performance. The work demonstrates the platform’s practicality and versatility across PyTorch and TensorFlow, offering a fair baseline for future GNN innovations and guiding decisions about kernel fusion versus native implementations.
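As a point of reference for the zero-copy exchange mentioned above, the following minimal sketch shows the standard DLPack hand-off between a producer and a consumer, using NumPy on the CPU. It illustrates the existing producer-consumer exchange only; it is not the Producer-Only protocol proposed in the paper, whose details are not given in this summary. The variable names are illustrative assumptions.

```python
import numpy as np

# Producer: the side that owns the tensor memory.
producer = np.arange(6, dtype=np.float32)

# Consumer: imports the tensor through the standard DLPack protocol.
# np.from_dlpack calls producer.__dlpack__() and wraps the same buffer
# instead of copying it.
consumer = np.from_dlpack(producer)

# Both arrays refer to the same underlying memory.
shares = np.shares_memory(producer, consumer)

# A write through the producer is visible through the consumer,
# which confirms that no copy was made.
producer[0] = 42.0
first = float(consumer[0])

print(shares, first)
```

The same hand-off works between any two libraries that implement `__dlpack__` and a `from_dlpack` importer, which is what makes DLPack a natural basis for a framework-agnostic benchmark.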

Abstract

We hypothesize that the absence of a standardized benchmark has allowed several fundamental pitfalls in GNN system design and evaluation to go unnoticed by the community. In this work, we propose GNNBench, a plug-and-play benchmarking platform focused on system innovation. GNNBench introduces a new protocol that lets GNN systems exchange their captive tensor data, supports custom classes in System APIs, and allows the same system module to be integrated automatically into many deep learning frameworks, such as PyTorch and TensorFlow. To demonstrate the importance of such a benchmarking framework, we integrated several GNN systems. Our results show that integration with GNNBench helped us identify several measurement issues that deserve attention from the community.

Paper Structure (25 sections, 18 equations, 6 figures, 3 tables)

Introduction
Background on GNN Computation
Goals, Challenges, and Related Works
Challenges and Related Works
Instability due to Graph Format
Instability due to Kernel Variants
DGL's Framework overhead and Complex Integration
Other Platforms
GnnBench Approach
Zero-Copy Tensor Exchange
Existing Producer-Consumer DLPack Protocol
Proposal: Producer-Only DLPack Protocol
Significance of Producer-Only DLPack Protocol
Overview of System APIs
System Integration and Extensibility using DSL
...and 10 more sections

Figures (6)

Figure 1: GnnBench-System is DL framework agnostic, while integration is done at Python level using Framework-Adapter. An independent GnnBench-System has no interface-level limitation.
Figure 2: Graph representation in DGL: it has introduced an edge ID to the graph.
Figure 3: Results show that GnnBench achieves the same accuracy as DGL, so it can be used for fair evaluations (higher is better).
Figure 4: Runtime evaluation on small datasets shows that GnnBench has lower framework overhead than DGL and dgNN (higher is better)
Figure 5: Runtime performance and memory comparison among GnnBench systems and DGL for GCN and GIN
...and 1 more figure
