GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System

Yidong Gong, Pradeep Kumar

TL;DR

GnnBench addresses the lack of a standardized benchmark for single-GPU GNN systems by delivering a framework-agnostic benchmarking platform with stable System APIs and zero-copy tensor exchange via a Producer-Only DLPack protocol. It enables plug-and-play integration of diverse GNN kernels, automatically generates integration code through a DSL, and evaluates multiple existing systems to reveal accuracy pitfalls, framework overhead, and memory behavior. The experimental results show that many prior systems have accuracy and performance issues that are mitigated or clarified when benchmarked with GnnBench, and that framework overhead can dominate small-dataset runtimes, while mid-size datasets reveal true kernel performance. The work demonstrates the platform’s practicality and versatility across PyTorch and TensorFlow, offering a fair baseline for future GNN innovations and guiding decisions about kernel fusion versus native implementations.

Abstract

We hypothesize that the absence of a standardized benchmark has allowed several fundamental pitfalls in GNN system design and evaluation to go overlooked by the community. In this work, we propose GNNBench, a plug-and-play benchmarking platform focused on system innovation. GNNBench presents a new protocol for exchanging captive tensor data, supports custom classes in System APIs, and allows the same system module to be integrated automatically into many deep learning frameworks, such as PyTorch and TensorFlow. To demonstrate the importance of such a benchmark framework, we integrated several GNN systems. Our results show that integration with GNNBench helped us identify several measurement issues that deserve attention from the community.
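As a rough illustration of what zero-copy tensor exchange means in practice, the sketch below uses the standard DLPack interchange that NumPy exposes through `np.from_dlpack` (NumPy 1.23 or later is assumed). It is not the Producer-Only protocol the paper proposes; the helper `import_zero_copy` and the variable names are hypothetical, and a NumPy array stands in for a framework tensor.

```python
import numpy as np


def import_zero_copy(producer: np.ndarray) -> np.ndarray:
    """Import `producer` through DLPack without copying its buffer.

    Hypothetical helper for illustration only; not part of GNNBench.
    """
    # np.from_dlpack calls producer.__dlpack__() and wraps the exported buffer.
    return np.from_dlpack(producer)


# A small array standing in for a tensor owned by a DL framework.
features = np.arange(6, dtype=np.float32)
view = import_zero_copy(features)

# Write through the producer's handle; the consumer sees the change
# because both names refer to the same underlying memory.
features[0] = 42.0
print(np.shares_memory(features, view))
print(view[0])
```

Because no data is copied, the cost of handing a tensor to a system kernel stays independent of the tensor's size, which is the property a benchmarking platform needs to avoid adding its own overhead to the measurement.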

Paper Structure (25 sections, 18 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: GnnBench-System is DL framework agnostic, while integration is done at Python level using Framework-Adapter. An independent GnnBench-System has no interface-level limitation.
  • Figure 2: Graph representation in DGL, which introduces an edge ID to the graph.
  • Figure 3: Results show that GnnBench helps achieve the same accuracy as DGL, so it can be used for fair evaluations (higher is better).
  • Figure 4: Runtime evaluation on small datasets shows that GnnBench has lower framework overhead than DGL and dgNN (higher is better)
  • Figure 5: Runtime performance and memory comparison among GnnBench systems and DGL for GCN and GIN
  • ...and 1 more figure