SpanGNN: Towards Memory-Efficient Graph Neural Networks via Spanning Subgraph Training
Xizhi Gu, Hongzheng Li, Shihong Gao, Xinyan Zhang, Lei Chen, Yingxia Shao
TL;DR
SpanGNN tackles the memory bottleneck of full-graph GNN training by training on a sequence of spanning subgraphs whose edges are updated incrementally under an upper memory bound $\alpha_{up}$. It introduces fast quality-aware edge selection with variance-minimized and gradient-noise-reduced sampling, plus a two-step sampling scheme that scales to large graphs, aligning training with curriculum-learning principles. Empirical results on large datasets show substantial peak-memory reductions (often $>40\%$) with accuracy close to full-graph training and competitive performance versus mini-batch methods. This approach enables scalable, high-accuracy GNN training on very large graphs where traditional full-graph or mini-batch methods struggle.
Abstract
Graph Neural Networks (GNNs) are highly effective at learning from graph data. Full-graph GNN training generally achieves high accuracy, but it suffers from large peak memory usage and runs into Out-of-Memory errors on large graphs. A popular solution to this memory problem is mini-batch GNN training, which, however, increases training variance and sacrifices model accuracy. In this paper, we propose SpanGNN, a new memory-efficient GNN training method based on spanning subgraphs. SpanGNN trains GNN models over a sequence of spanning subgraphs, which are constructed starting from an empty structure. To avoid excessive peak memory consumption, SpanGNN selects a set of edges from the original graph to incrementally update the spanning subgraph between epochs. To preserve model accuracy, we introduce two edge sampling strategies (variance-reduced and noise-reduced) that help SpanGNN select high-quality edges for GNN learning. Experiments on widely used datasets demonstrate SpanGNN's advantages in model performance and low peak memory usage.
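To make the incremental construction concrete, the following is a minimal sketch of how a sequence of growing spanning subgraphs might be scheduled across epochs. It is not the authors' implementation: the function name `spanning_subgraph_schedule` and the parameters `edges_per_epoch` and `max_edges` are hypothetical, `max_edges` is only a crude stand-in for a memory cap, and uniform random selection replaces the variance-reduced and noise-reduced sampling strategies, which the abstract does not specify in detail. The sketch also assumes that the "empty structure" means a subgraph with all vertices and no edges.

```python
import random


def spanning_subgraph_schedule(edges, num_epochs, edges_per_epoch, max_edges, seed=0):
    """Yield the edge list of the spanning subgraph used in each epoch.

    Every vertex is implicitly kept in all epochs; only the edge set grows.
    All names and parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    remaining = list(edges)
    # Assumption: a uniform shuffle stands in for quality-aware edge sampling.
    rng.shuffle(remaining)
    current = []  # assumed starting point: no edges selected yet
    for _ in range(num_epochs):
        # Never exceed the edge budget, which here plays the role of a memory cap.
        budget = min(edges_per_epoch, max_edges - len(current), len(remaining))
        if budget > 0:
            current.extend(remaining[:budget])
            del remaining[:budget]
        # A training step on the subgraph defined by `current` would go here.
        yield list(current)


if __name__ == "__main__":
    toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
    for epoch, sub in enumerate(spanning_subgraph_schedule(toy_edges, 4, 2, 3), 1):
        print(epoch, len(sub))
```

In this sketch the subgraphs are nested, so each epoch trains on a superset of the previous epoch's edges until the budget is reached; a real system would derive the budget from measured memory use rather than a fixed edge count.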
