Single-GPU GNN Systems: Traps and Pitfalls

Yidong Gong, Arnab Tarafder, Saima Afrin, Pradeep Kumar

TL;DR

This work investigates pervasive pitfalls in single-GPU GNN systems, showing that missing end-to-end accuracy, backward-computation misconceptions, framework overhead, and memory inefficiencies collectively distort performance claims. By systematically evaluating over 20 systems, the authors reveal how kernel-level optimizations can be overcredited when fundamental correctness is neglected. They propose a structured set of recommendations and a reference single-GPU GNN system designed around clear requirements, symmetry-aware storage, and native, well-ordered kernels to address these pitfalls. The results demonstrate practical improvements, including reduced memory footprints and the ability to train very large graphs on a single GPU, underscoring the importance of end-to-end evaluation and careful system design for credible progress in GNN systems research.

Abstract

Current graph neural network (GNN) systems have established a clear trend of not reporting training accuracy and of relying, directly or indirectly, mostly on smaller datasets for evaluation. Our in-depth analysis shows that this trend leads to a chain of pitfalls in the system design and evaluation process, calling into question the practicality of many proposed system optimizations and affecting the conclusions and lessons learned. We analyze many single-GPU systems and show the fundamental impact of these pitfalls. We further develop hypotheses, recommendations, and evaluation methodologies, and provide future directions. Finally, we develop a new reference system to establish a new line of optimizations rooted in solving the system-design pitfalls efficiently and practically. The proposed design can be productively integrated into prior works, thereby truly advancing the state of the art.

Paper Structure (28 sections, 1 equation, 11 figures, 1 table)

Figures (11)

  • Figure 3: Accuracy comparison: TC-GNN does not provide GAT, so AGNN is substituted here because both are attention-based (Class A GNN). We also discuss other works in the text. DGL is shown for reference.
  • Figure 4: DGL overhead ratio while training GCN for 200 epochs on graphs with 32,768 vertices; the x-axis shows edge counts. The overhead is almost 100% when the edge count is below 4 million.
  • Figure 5: Training time and framework overhead for DGL, PyG, GNNA (GNNAdvisor), and dgNN when training GNN models for 200 epochs. We also measured Seastar, which shows almost 100% overhead for these datasets.
  • Figure 6: The data format layout for the sample graph of Fig. \ref{fig-format}(b): There is only one copy of the grayed array representing the Column ID array, which is shared among COO, CSR, and CSC. The Offset array is shared between CSR and CSC, while the Edge ID array is specific to CSC only.
  • Figure 7: Accuracy comparison of and DGL. For other systems, please refer back to Fig. \ref{pitfall-accuracy} and \ref{sec.accuracy}.
  • ...and 6 more figures
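
The array sharing described in the Figure 6 caption can be illustrated with a small NumPy sketch. It is not the paper's implementation: the function name `build_symmetric_formats` and its return layout are hypothetical, and the sketch assumes an undirected graph with no self-loops and no duplicate edges. Under those assumptions the adjacency matrix is symmetric, so the CSR column IDs double as the CSC row IDs and the offsets are identical. Only a CSC-specific edge-ID array is needed to map each CSC slot back to its edge in CSR order.

```python
import numpy as np

def build_symmetric_formats(num_vertices, undirected_edges):
    """Hypothetical helper: build shared COO/CSR/CSC arrays for a symmetric graph.

    Assumes no self-loops and no duplicate edges in `undirected_edges`.
    """
    u = np.array([e[0] for e in undirected_edges], dtype=np.int64)
    v = np.array([e[1] for e in undirected_edges], dtype=np.int64)
    # Store each undirected edge in both directions.
    src = np.concatenate([u, v])
    dst = np.concatenate([v, u])

    # Sort by (src, dst); np.lexsort treats the LAST key as the primary key.
    order = np.lexsort((dst, src))
    row_ids = src[order]   # COO row array (assumed to be kept only for COO)
    col_ids = dst[order]   # single copy: COO columns, CSR columns, CSC rows

    # Offsets are shared: in a symmetric graph, row degree equals column degree.
    offsets = np.zeros(num_vertices + 1, dtype=np.int64)
    offsets[1:] = np.cumsum(np.bincount(row_ids, minlength=num_vertices))

    # CSC slot k sits in column row_ids[k] and holds row col_ids[k], i.e. it is
    # the edge col_ids[k] -> row_ids[k]. Look up that edge's position in CSR order.
    csr_key = row_ids * num_vertices + col_ids      # ascending, because sorted
    reverse_key = col_ids * num_vertices + row_ids
    csc_edge_ids = np.searchsorted(csr_key, reverse_key)

    return row_ids, col_ids, offsets, csc_edge_ids

# Usage: a triangle (0, 1, 2) with a pendant vertex 3 attached to vertex 2.
rows, cols, offs, eids = build_symmetric_formats(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
```

Because every edge and its reverse are both stored, the edge-ID array in this sketch is a permutation that swaps each edge with its reverse. Whether the reference system builds its arrays this way is not stated in the captions above.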