VISAGNN: Versatile Staleness-Aware Efficient Training on Large-Scale Graphs
Rui Xue
TL;DR
This work tackles the staleness bottleneck in training large-scale GNNs with historical embeddings, which arise when out-of-batch neighbor information is approximated by stale embeddings. VISAGNN introduces three complementary components—Dynamic Staleness Attention, Staleness-aware Loss, and Staleness-Augmented Embeddings—to adaptively mitigate the negative impact of staleness during forward and backward passes, using a staleness criterion $s_j$ and a time-varying factor $\gamma(t) = \beta / t$. The authors provide theoretical analysis linking per-layer staleness to final embedding error, and prove convergence under the proposed mechanism; they also demonstrate strong empirical gains on large-scale datasets (e.g., ogbn-arxiv, ogbn-products, ogbn-papers100M, MAG240M) with favorable memory and convergence characteristics. Overall, VISAGNN offers a versatile, efficient, and principled framework that can enhance existing historical-embedding approaches and sampling strategies for scalable GNN training.
Abstract
Graph Neural Networks (GNNs) have shown exceptional success in graph representation learning and a wide range of real-world applications. However, scaling deeper GNNs poses challenges due to the neighbor explosion problem when training on large-scale graphs. To mitigate this, a promising class of GNN training algorithms utilizes historical embeddings to reduce computation and memory costs while preserving the expressiveness of the model. These methods leverage historical embeddings for out-of-batch nodes, effectively approximating full-batch training without losing any neighbor information-a limitation found in traditional sampling methods. However, the staleness of these historical embeddings often introduces significant bias, acting as a bottleneck that can adversely affect model performance. In this paper, we propose a novel VersatIle Staleness-Aware GNN, named VISAGNN, which dynamically and adaptively incorporates staleness criteria into the large-scale GNN training process. By embedding staleness into the message passing mechanism, loss function, and historical embeddings during training, our approach enables the model to adaptively mitigate the negative effects of stale embeddings, thereby reducing estimation errors and enhancing downstream accuracy. Comprehensive experiments demonstrate the effectiveness of our method in overcoming the staleness issue of existing historical embedding techniques, showcasing its superior performance and efficiency on large-scale benchmarks, along with significantly faster convergence.
