LMC: Fast Training of GNNs via Subgraph Sampling with Provable Convergence

Zhihao Shi, Xize Liang, Jie Wang

TL;DR

This work tackles the neighbor explosion problem in training message passing (MP)-based GNNs on large graphs by introducing Local Message Compensation (LMC), the first subgraph-wise sampling method with provable convergence. LMC retrieves the messages discarded in backward passes by formulating the backward pass itself as message passing, and it applies compensations in both forward and backward passes to produce accurate mini-batch gradients while keeping complexity independent of the full graph size. Theoretical analysis shows that the bias of the mini-batch gradients can be made arbitrarily small and that the method converges to first-order stationary points under standard Lipschitz assumptions. Empirically, LMC matches full-batch performance, outperforms state-of-the-art subgraph-wise sampling methods in efficiency, and remains robust to small batch sizes; ablations confirm the critical role of backward compensation.
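
To make the cost of discarding messages concrete, here is a minimal sketch (not LMC itself and not code from the paper). It assumes a hypothetical one-layer mean aggregation on a toy path graph and compares the exact aggregate with the one obtained when messages from out-of-batch neighbors are dropped. The paper's focus is on discarded messages in backward passes; the forward-pass aggregate below is only a simpler analogue of the same truncation. All names (`mean_aggregate`, `adj`, `x`, `batch`) are illustrative.

```python
def mean_aggregate(adj, x, nodes, allowed=None):
    """Mean of neighbor features for each node in `nodes`.

    If `allowed` is given, messages from neighbors outside `allowed`
    are dropped, mimicking a sampler that ignores out-of-batch nodes.
    """
    out = {}
    for u in nodes:
        nbrs = [v for v in adj[u] if allowed is None or v in allowed]
        out[u] = sum(x[v] for v in nbrs) / len(nbrs) if nbrs else 0.0
    return out


# Toy path graph 0 - 1 - 2 - 3 with scalar node features (assumed values).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0}
batch = {0, 1}  # in-batch nodes; node 2 is a 1-hop out-of-batch neighbor

exact = mean_aggregate(adj, x, sorted(batch))
truncated = mean_aggregate(adj, x, sorted(batch), allowed=batch)
errors = {u: abs(exact[u] - truncated[u]) for u in sorted(batch)}

print(exact)      # {0: 2.0, 1: 2.0}
print(truncated)  # {0: 2.0, 1: 1.0}
print(errors)     # {0: 0.0, 1: 1.0}
```

Node 0 has all of its neighbors in the batch, so its aggregate is unchanged. Node 1 loses the message from node 2 and its aggregate is off by 1.0. Compensation schemes aim to reduce this kind of error without loading the whole graph.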

Abstract

Message passing-based graph neural networks (GNNs) have achieved great success in many real-world applications. However, training GNNs on large-scale graphs suffers from the well-known neighbor explosion problem, i.e., the exponentially increasing dependencies of nodes with the number of message passing layers. Subgraph-wise sampling methods -- a promising class of mini-batch training techniques -- discard messages outside the mini-batches in backward passes to avoid the neighbor explosion problem at the expense of gradient estimation accuracy. This poses significant challenges to their convergence analysis and convergence speeds, which seriously limits their reliable real-world applications. To address this challenge, we propose a novel subgraph-wise sampling method with a convergence guarantee, namely Local Message Compensation (LMC). To the best of our knowledge, LMC is the first subgraph-wise sampling method with provable convergence. The key idea of LMC is to retrieve the discarded messages in backward passes based on a message passing formulation of backward passes. By efficient and effective compensations for the discarded messages in both forward and backward passes, LMC computes accurate mini-batch gradients and thus accelerates convergence. We further show that LMC converges to first-order stationary points of GNNs. Experiments on large-scale benchmark tasks demonstrate that LMC significantly outperforms state-of-the-art subgraph-wise sampling methods in terms of efficiency.
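
The neighbor explosion problem mentioned above can be illustrated with a small sketch. It assumes a hypothetical toy graph (a complete binary tree) and counts how many nodes a single root node depends on after a given number of message passing layers. The helper names `receptive_field_size` and `complete_binary_tree` are illustrative and do not come from the paper.

```python
def receptive_field_size(adj, root, num_layers):
    """Number of nodes within `num_layers` hops of `root` (breadth-first)."""
    seen = {root}
    frontier = [root]
    for _ in range(num_layers):
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(seen)


def complete_binary_tree(depth):
    """Undirected adjacency lists of a complete binary tree of the given depth."""
    n = 2 ** (depth + 1) - 1
    adj = {i: [] for i in range(n)}
    for child in range(1, n):
        parent = (child - 1) // 2
        adj[parent].append(child)
        adj[child].append(parent)
    return adj


adj = complete_binary_tree(depth=10)
sizes = [receptive_field_size(adj, 0, layers) for layers in range(5)]
print(sizes)  # [1, 3, 7, 15, 31]
```

On this tree the number of nodes needed to compute the root's embedding roughly doubles with every extra layer. Subgraph-wise sampling methods avoid this growth by restricting computation to the mini-batch.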

Paper Structure

This paper contains 41 sections, 18 theorems, 121 equations, 6 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

Suppose that a mini-batch $\mathcal{V}_{\mathcal{B}}$ is uniformly sampled from $\mathcal{V}$ and the corresponding set of labeled nodes $\mathcal{V}_{L_{\mathcal{B}}} = \mathcal{V}_{\mathcal{B}} \cap \mathcal{V}_{L}$ is uniformly sampled from $\mathcal{V}_{L}$. Then the mini-batch gradients $\mathbf{g}_w(

Figures (6)

  • Figure 1: Comparison of LMC with GNNAutoScale (GAS). (a) shows the original graph with in-batch nodes, 1-hop out-of-batch nodes, and other out-of-batch nodes in orange, blue, and grey, respectively. (b) and (d) show the computation graphs of forward passes and backward passes of GAS, respectively. (c) and (e) show the computation graphs of forward passes and backward passes of LMC, respectively.
  • Figure 2: Testing accuracy and training loss w.r.t. runtimes (s).
  • Figure 3: The average relative estimated errors of mini-batch gradients computed by CLUSTER, GAS, and LMC for GCN models.
  • Figure 4: The improvement of the compensations on the Ogbn-arxiv dataset.
  • Figure 5: Testing accuracy w.r.t. runtimes (s).
  • ...and 1 more figure

Theorems & Definitions (33)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • proof
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • ...and 23 more