Slicing Input Features to Accelerate Deep Learning: A Case Study with Graph Neural Networks

Zhengjia Xu; Dingyang Lyu; Jinghui Zhang

TL;DR

SliceGCN tackles the memory bottleneck of full-batch GCN training on large graphs by distributing node feature slices across $p$ GPUs, allowing each device to process the complete graph with reduced per-device feature dimensionality. It introduces two stabilizing techniques, feature fusion and slice encoding, to mitigate potential accuracy loss from feature slicing while maintaining full-batch-like training behavior. Empirical results on six node-classification datasets show SliceGCN achieves comparable or better accuracy and, notably on large graphs, higher throughput with fewer parameters, indicating potential parameter efficiency. The approach offers a scalable alternative to sampling-based methods and broad applicability to distributed deep learning beyond GNNs.

Abstract

As graphs grow larger, full-batch GNN training increasingly exceeds the memory of a single GPU. To improve the scalability of GNN training, prior studies have proposed sampling-based mini-batch training and distributed graph learning. However, these methods still have drawbacks, such as performance degradation and heavy communication. This paper introduces SliceGCN, a feature-sliced distributed method for large-scale graph learning. SliceGCN slices the node features so that each computing device, i.e., GPU, handles only part of the features. After each GPU processes its share, the resulting partial representations are concatenated to form complete representations, which allows a single GPU's memory to hold the entire graph structure. This aims to avoid the accuracy loss typically associated with mini-batch training (due to incomplete graph structures) and to reduce inter-GPU communication during message passing (the forward propagation process of GNNs). To study and mitigate potential accuracy reductions caused by slicing features, this paper proposes feature fusion and slice encoding. Experiments on six node classification datasets indicate that SliceGCN does not improve efficiency on smaller datasets but does improve efficiency on larger ones. Additionally, SliceGCN and its variants converge better, and feature fusion and slice encoding make training more stable and reduce accuracy fluctuations. The study also finds that the design of SliceGCN is potentially parameter-efficient.
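The slicing step described in the abstract can be illustrated with a minimal single-process NumPy sketch. It is not the authors' implementation: the helper names (`normalized_adjacency`, `sliced_gcn_layer`), the toy sizes, the ReLU activation, and the choice to give each slice its own weight matrix of width `h // p` are all assumptions made for illustration, and feature fusion and slice encoding are omitted. Each loop iteration stands in for one GPU that sees the full normalized adjacency but only its own columns of the feature matrix.

```python
import numpy as np

def normalized_adjacency(adj):
    """Standard GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_tilde = adj + np.eye(adj.shape[0])   # add self-loops
    deg = a_tilde.sum(axis=1)              # degrees are >= 1 thanks to self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_tilde * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def sliced_gcn_layer(a_hat, x, weights):
    """One sliced layer (hypothetical helper): split the feature columns
    into len(weights) slices, propagate each slice over the full graph
    with its own weights, then concatenate the partial representations."""
    slices = np.array_split(x, len(weights), axis=1)
    partial = [np.maximum(a_hat @ x_i @ w_i, 0.0)   # ReLU is an assumption
               for x_i, w_i in zip(slices, weights)]
    return np.concatenate(partial, axis=1)

# Toy problem: n nodes, d input features, h hidden units, p simulated devices.
rng = np.random.default_rng(0)
n, d, h, p = 6, 8, 4, 2

# Random undirected graph without self-loops.
adj = (rng.random((n, n)) < 0.4).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T

x = rng.standard_normal((n, d))
slice_dims = [s.shape[1] for s in np.array_split(x, p, axis=1)]
weights = [rng.standard_normal((d_i, h // p)) for d_i in slice_dims]

a_hat = normalized_adjacency(adj)
out = sliced_gcn_layer(a_hat, x, weights)

# Parameter counts: per-slice weights versus one dense d x h matrix.
sliced_params = sum(w.size for w in weights)
dense_params = d * h
print(out.shape, sliced_params, dense_params)
```

Under these assumptions, the concatenated output equals a single GCN layer whose weight matrix is block-diagonal, so the sliced layer uses `d * h / p` parameters instead of `d * h`. That is one way to read the abstract's remark about parameter efficiency, though the paper's actual architecture may differ.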

Paper Structure (15 sections, 10 equations, 5 figures, 4 tables, 3 algorithms)

Introduction
Related Work
Preliminaries
Attributed Graph
Graph Convolutional Network
Distributed GCN Training Task
SliceGCN
Feature Slicing and Model Initialization
Slice Encoding and Model Output
Experiments
Datasets
Compared Methods
Experimental Settings
Analysis
Conclusion, Limitations, and Future Research

Figures (5)

Figure 1: A simple example of the memory shortage issue in GNN full-batch learning.
Figure 2: Distributed graph learning with subgraph partitioning.
Figure 3: The framework of SliceGCN.
Figure 4: Validation accuracy of GCN, SliceGCN, and its variants on different datasets with $p=3$.
Figure 5: Validation accuracy of GCN, SliceGCN, and its variants on different datasets with $p=2$.
