Slicing Input Features to Accelerate Deep Learning: A Case Study with Graph Neural Networks
Zhengjia Xu, Dingyang Lyu, Jinghui Zhang
TL;DR
SliceGCN tackles the memory bottleneck of full-batch GCN training on large graphs by distributing slices of the node features across $p$ GPUs, so that each device processes the complete graph at a reduced per-device feature dimensionality. It introduces two stabilizing techniques, feature fusion and slice encoding, to mitigate potential accuracy loss from feature slicing while maintaining full-batch-like training behavior. Empirical results on six node-classification datasets show that SliceGCN achieves comparable or better accuracy and, on large graphs in particular, higher throughput with fewer parameters, which suggests that the design may be parameter-efficient. The approach offers a scalable alternative to sampling-based methods and may also apply to distributed deep learning beyond GNNs.
Abstract
As graphs grow larger, full-batch GNN training increasingly exceeds the memory of a single GPU. To improve the scalability of GNN training, prior studies have proposed sampling-based mini-batch training and distributed graph learning. However, these methods still have drawbacks, such as performance degradation and heavy communication. This paper introduces SliceGCN, a feature-sliced distributed method for large-scale graph learning. SliceGCN slices the node features so that each computing device (i.e., GPU) handles only part of the features. Each GPU processes its slice to obtain partial representations, which are then concatenated to form complete representations. This allows a single GPU's memory to hold the entire graph structure. The design aims to avoid the accuracy loss typically associated with mini-batch training (due to incomplete graph structures) and to reduce inter-GPU communication during message passing (the forward propagation process of GNNs). To study and mitigate the potential accuracy reduction caused by slicing features, this paper proposes feature fusion and slice encoding. Experiments on six node classification datasets yield several analytical findings. SliceGCN does not improve efficiency on smaller datasets, but it does improve efficiency on larger ones. In addition, SliceGCN and its variants converge better, feature fusion and slice encoding make training more stable and reduce accuracy fluctuations, and the design of SliceGCN is potentially parameter-efficient.
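The slice-then-concatenate idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs on the CPU in pure Python, simulates the $p$ devices with a loop, uses an unnormalized adjacency matrix with self-loops, and omits feature fusion and slice encoding. The helper names (`matmul`, `slice_columns`, `sliced_layer`) are hypothetical and chosen only for this example.

```python
# Sketch of feature slicing for a single GCN-style layer (assumed setup, no GPU).

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def slice_columns(x, p):
    """Split the feature matrix x column-wise into p near-equal slices."""
    d = len(x[0])
    bounds = [i * d // p for i in range(p + 1)]
    return [[row[bounds[i]:bounds[i + 1]] for row in x] for i in range(p)]

def sliced_layer(adj, x, weights):
    """Simulated device i computes adj @ x_i @ w_i; the partial outputs are
    concatenated along the feature axis to form the full representation."""
    slices = slice_columns(x, len(weights))
    partials = [matmul(matmul(adj, x_i), w_i) for x_i, w_i in zip(slices, weights)]
    return [sum((part[n] for part in partials), []) for n in range(len(x))]

# Toy graph: 3 nodes in a path, self-loops included, 4 features per node.
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12]]

# p = 2 simulated devices, each owning a 2x1 weight matrix for its feature slice.
weights = [[[1], [0]],
           [[0], [1]]]

out = sliced_layer(adj, x, weights)

# The same result from one unsliced layer whose 4x2 weight is block-diagonal.
block_diag = [[1, 0], [0, 0], [0, 0], [0, 1]]
full = matmul(matmul(adj, x), block_diag)
```

In this toy setting the sliced layer matches an unsliced layer whose weight matrix is block-diagonal, so the two 2x1 slice weights hold 4 parameters where a dense 4x2 weight would hold 8. This is one way to read the abstract's remark about parameter efficiency; the paper's actual architecture may differ.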
