Accurate and Scalable Graph Neural Networks via Message Invariance

Zhihao Shi, Jie Wang, Zhiwei Zhuang, Xize Liang, Bin Li, Feng Wu

TL;DR

This work addresses the computational blow-up in multi-layer GNNs caused by recursive message passing from out-of-batch neighbors (MP-OB) in mini-batch training. It introduces TOP, a topological compensation framework that exploits message invariance to replace the expensive MP-OB with fast in-batch message passing (MP-IB): a learned linear transformation R approximates out-of-batch messages from in-batch embeddings. The authors provide theoretical convergence guarantees and report experiments in which TOP achieves near full-batch accuracy with order-of-magnitude speedups and lower memory usage on large-scale graphs, outperforming existing subgraph, node-wise, and layer-wise sampling methods. The approach targets scalable GNN training on graphs with millions of nodes and billions of edges, and performs well across the datasets and backbones tested.
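
The blow-up mentioned above can be made concrete with a small sketch. It counts how many nodes must be loaded to compute the embeddings of a mini-batch as the number of message passing layers grows. The complete binary tree, the helper names `binary_tree` and `receptive_field_sizes`, and the single-node batch are illustrative assumptions, not taken from the paper; on a real graph the count saturates at the total number of nodes.

```python
def binary_tree(num_nodes):
    """Undirected complete binary tree: node v has parent (v - 1) // 2."""
    adj = {v: set() for v in range(num_nodes)}
    for v in range(1, num_nodes):
        parent = (v - 1) // 2
        adj[v].add(parent)
        adj[parent].add(v)
    return adj


def receptive_field_sizes(adj, batch, num_layers):
    """Number of nodes needed to compute the batch embeddings
    after 0, 1, ..., num_layers rounds of message passing."""
    needed = set(batch)
    sizes = [len(needed)]
    for _ in range(num_layers):
        frontier = set()
        for v in needed:
            frontier.update(adj[v])  # every neighbor sends a message to v
        needed |= frontier
        sizes.append(len(needed))
    return sizes


adj = binary_tree(num_nodes=63)  # toy graph (assumption)
sizes = receptive_field_sizes(adj, batch=[0], num_layers=4)
print(sizes)  # [1, 3, 7, 15, 31]
```

Each extra layer roughly doubles the number of required nodes in this toy tree, which is the recursive dependence on higher-order out-of-batch neighbors that TOP is designed to avoid.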

Abstract

Message passing-based graph neural networks (GNNs) have achieved great success in many real-world applications. For a sampled mini-batch of target nodes, the message passing process is divided into two parts: message passing between nodes within the batch (MP-IB) and message passing from nodes outside the batch to those within it (MP-OB). However, MP-OB recursively relies on higher-order out-of-batch neighbors, so its computational cost grows exponentially with the number of layers. Because of this neighbor explosion, the whole message passing must store most nodes and edges on the GPU, which makes many GNNs infeasible for large-scale graphs. To address this challenge, we propose an accurate and fast mini-batch approach for transductive learning on large graphs, namely topological compensation (TOP), which obtains the outputs of the whole message passing solely through MP-IB, without the costly MP-OB. The major pillar of TOP is a novel concept of message invariance, which defines message-invariant transformations to convert costly MP-OB into fast MP-IB. This ensures that the modified MP-IB has the same output as the whole message passing. Experiments demonstrate that TOP is faster than existing mini-batch methods by an order of magnitude on vast graphs (millions of nodes and billions of edges) with limited accuracy degradation.
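
The conversion of MP-OB into MP-IB can be illustrated with a minimal NumPy sketch. It assumes a single linear aggregation step with no nonlinearity or weight matrix, random toy data, and a transformation `R` estimated by ordinary least squares. All of these are assumptions made for illustration: the text above only says that a message-invariant transformation exists, not how the authors estimate it, so this should not be read as their algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, dim = 6, 4, 16  # toy sizes (assumption)

# In-batch embeddings, and out-of-batch embeddings that satisfy
# message invariance exactly: H_out = R_true @ H_in.
H_in = rng.normal(size=(n_in, dim))
R_true = rng.normal(size=(n_out, n_in))
H_out = R_true @ H_in

# Hypothetical aggregation weights for the in-batch target nodes.
A_in = rng.random((n_in, n_in))    # in-batch -> in-batch (MP-IB)
A_out = rng.random((n_in, n_out))  # out-of-batch -> in-batch (MP-OB)

# Whole message passing: MP-IB plus MP-OB.
full = A_in @ H_in + A_out @ H_out

# Estimate R from R @ H_in ~= H_out, written as H_in.T @ R.T ~= H_out.T.
R_T, *_ = np.linalg.lstsq(H_in.T, H_out.T, rcond=None)
R = R_T.T

# Modified MP-IB: fold the out-of-batch messages into in-batch weights.
compensated = (A_in + A_out @ R) @ H_in

# Plain subgraph sampling simply drops MP-OB.
dropped = A_in @ H_in

print(np.allclose(compensated, full))  # compensation matches the full output
print(np.allclose(dropped, full))      # dropping MP-OB does not
```

When the invariance holds only approximately, the same construction would give an approximation of the full output rather than an exact match.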

Paper Structure

This paper contains 42 sections, 5 theorems, 62 equations, 9 figures, 8 tables, 2 algorithms.

Key Result

Theorem 5.1

Let $\mathcal{L}(\mathcal{W}) = \sum_{i \in \mathcal{V}} \ell (\mathbf{h}^{(L)}_{i}, y_i) / |\mathcal{B}|$ and $\mathbf{d}_{\mathcal{W}} = \nabla_{\mathcal{W}} \sum_{i \in \mathcal{B}} \ell (\mathbf{h}^{(L,TOP)}_{i}, y_i) / |\mathcal{B}|$ be the loss of the full-batch method and the gradient of TOP

Figures (9)

  • Figure 1: Mini-batch processing of original GNNs, subgraph sampling, and TOP. Given a mini-batch, the computational costs of original GNNs increase exponentially with GNN depth (a). To address this challenge, many subgraph sampling methods preserve message passing between the in-batch nodes ($\text{MP}_{\text{IB}}$) and eliminate message passing from out-of-batch neighbors to the in-batch nodes ($\text{MP}_{\text{OB}}$) to reduce the computational costs (b). However, the final embeddings of subgraph sampling are usually different from the result of the original GNNs. By noticing the message invariance $\mathbf{h}_{4} = 0 \cdot \mathbf{h}_{1} + 0 \cdot \mathbf{h}_{2} + 1 \cdot \mathbf{h}_{3}$, TOP converts $\text{MP}_{\text{OB}}$ $v_4 \rightarrow v_3$ into $\text{MP}_{\text{IB}}$ $v_3 \rightarrow v_3$ without approximation errors in the example (c).
  • Figure 2: Measuring the message invariance in real-world datasets. The output of TOP is very close to the whole message passing (denoted by Full-batch). Please refer to Table \ref{tab:ams} in Appendix \ref{sec:ams_exp} for more results.
  • Figure 3: Convergence curves (test accuracy vs. runtime (s)) of subgraph sampling. We use the default $|\mathcal{B}|$ and $|\mathcal{V}|$ (the sizes of the subgraphs and the whole graph, respectively) provided in GAS \cite{gas}.
  • Figure 4: Relative runtime per epoch and relative memory consumption. Please refer to Table \ref{tab:memory_consumption} in Appendix \ref{sec:relative_mc} for more results.
  • Figure 5: Memory consumption and convergence curves of TOP and node/layer-wise sampling.
  • ...and 4 more figures
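
The conversion described in the Figure 1 caption can be written out for node $v_3$. The aggregation weights $a_{3j}$ are hypothetical symbols introduced here for illustration, and a linear aggregation is assumed; only the invariance $\mathbf{h}_{4} = 0 \cdot \mathbf{h}_{1} + 0 \cdot \mathbf{h}_{2} + 1 \cdot \mathbf{h}_{3}$ comes from the caption.

$$
\underbrace{a_{31}\mathbf{h}_{1} + a_{32}\mathbf{h}_{2} + a_{33}\mathbf{h}_{3}}_{\text{MP}_{\text{IB}}}
+ \underbrace{a_{34}\mathbf{h}_{4}}_{\text{MP}_{\text{OB}}}
= a_{31}\mathbf{h}_{1} + a_{32}\mathbf{h}_{2} + (a_{33} + a_{34})\,\mathbf{h}_{3}
$$

Under these assumptions the out-of-batch message $v_4 \rightarrow v_3$ is absorbed into the weight of the in-batch message $v_3 \rightarrow v_3$, so the aggregate for $v_3$ is unchanged and $v_4$ never has to be loaded.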

Theorems & Definitions (13)

  • Definition 4.1: Message invariance
  • Theorem 5.1
  • Theorem D.1
  • proof
  • proof
  • Lemma E.1
  • proof
  • Definition E.2
  • Theorem E.3
  • proof
  • ...and 3 more