FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling

Jie Chen, Tengfei Ma, Cao Xiao

TL;DR

The time and memory costs of training traditional GCNs, together with their need for test data at training time, motivate a shift to inductive learning via sampling. FastGCN treats graph convolutions as integral transforms of embedding functions under a probability measure, which enables batched Monte Carlo estimation over sampled vertices, with importance sampling used to reduce variance. Training is orders of magnitude faster than with GCN and GraphSAGE while predictive accuracy remains comparable. The work also opens a path to applying Monte Carlo and variance-reduction techniques to other graph neural networks.

Abstract

The graph convolutional networks (GCN) recently proposed by Kipf and Welling are an effective graph model for semi-supervised learning. This model, however, was originally designed to be learned in the presence of both training and test data. Moreover, the recursive neighborhood expansion across layers poses time and memory challenges for training with large, dense graphs. To relax the requirement of simultaneous availability of test data, we interpret graph convolutions as integral transforms of embedding functions under probability measures. Such an interpretation allows for the use of Monte Carlo approaches to consistently estimate the integrals, which in turn leads to the batched training scheme we propose in this work, FastGCN. Enhanced with importance sampling, FastGCN is not only efficient for training but also generalizes well for inference. We show a comprehensive set of experiments to demonstrate its effectiveness compared with GCN and related models. In particular, training is orders of magnitude more efficient while predictions remain comparably accurate.
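The Monte Carlo idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a dense normalized adjacency matrix `A_hat`, a feature matrix `H`, and a sampling distribution `q` proportional to the squared column norms of `A_hat`, and it estimates only the aggregation `A_hat @ H` of a single layer (no weights, nonlinearity, or loss). The helper names and the random test graph are hypothetical.

```python
import numpy as np

def importance_distribution(A_hat):
    """Sampling probabilities q(u), proportional to the squared column norms of A_hat."""
    col_norm_sq = np.square(A_hat).sum(axis=0)
    return col_norm_sq / col_norm_sq.sum()

def sampled_aggregate(A_hat, H, q, t, rng):
    """Monte Carlo estimate of A_hat @ H from t vertices drawn with replacement from q."""
    idx = rng.choice(A_hat.shape[1], size=t, replace=True, p=q)
    # Reweight each sampled column by 1 / (t * q(u)) so the estimator is unbiased.
    scaled_cols = A_hat[:, idx] / (t * q[idx])
    return scaled_cols @ H[idx, :]

def make_normalized_adjacency(n, p_edge, rng):
    """Hypothetical test graph: random undirected edges plus self-loops, symmetrically normalized."""
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    A = (upper | upper.T).astype(float) + np.eye(n)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

rng = np.random.default_rng(0)
A_hat = make_normalized_adjacency(200, 0.05, rng)
H = rng.standard_normal((200, 8))
q = importance_distribution(A_hat)

exact = A_hat @ H                                          # full aggregation over all vertices
estimate = sampled_aggregate(A_hat, H, q, t=50, rng=rng)   # aggregation over 50 sampled vertices
```

A single estimate from a small sample is noisy. The point of the sketch is that the reweighted sum is unbiased, so the estimate approaches `exact` as `t` grows, while each batch touches only the sampled vertices rather than a recursively expanded neighborhood.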

Paper Structure

This paper contains 16 sections, 5 theorems, 44 equations, 4 figures, 5 tables, 2 algorithms.

Key Result

Theorem 1

If $g$ and $\sigma$ are continuous, then

Figures (4)

  • Figure 1: Two views of GCN. On the left (graph convolution view), each circle represents a graph vertex. On two consecutive rows, a circle $i$ is connected (in gray line) with circle $j$ if the two corresponding vertices in the graph are connected. A convolution layer uses the graph connectivity structure to mix the vertex features/embeddings. On the right (integral transform view), the embedding function in the next layer is an integral transform (illustrated by the orange fanout shape) of the one in the previous layer. For the proposed method, all integrals (including the loss function) are evaluated by using Monte Carlo sampling. Correspondingly in the graph view, vertices are subsampled in a bootstrapping manner in each layer to approximate the convolution. The sampled portions are collectively denoted by the solid blue circles and the orange lines.
  • Figure 2: Benefit of precomputing $\hat{A}H^{(0)}$ for the input layer. Data set: Pubmed. Training time is in seconds, per-epoch (batch size 1024). Accuracy is measured by using micro F1 score.
  • Figure 3: Per-batch training time in seconds (left) and prediction accuracy (right). For timing, GraphSAGE refers to GraphSAGE-GCN in Hamilton et al. (2017). The timings of using other aggregators, such as GraphSAGE-mean, are similar. GCN refers to using batched learning, as opposed to the original version that is nonbatched; for more details of the implementation, see the appendix. The nonbatched version of GCN runs out of memory on the large graph Reddit. The sample sizes for FastGCN are 400, 100, and 400, respectively for the three data sets.
  • Figure 4: Training/test accuracy versus training time. From left to right, the data sets are Cora, Pubmed, and Reddit, respectively.

Theorems & Definitions (10)

  • Theorem 1
  • Proposition 2
  • Theorem 3
  • Proposition 4
  • proof : Proof of Theorem 1
  • proof : Proof of Proposition 2
  • proof : Proof of Theorem 3
  • proof : Proof of Proposition 4
  • Theorem 5
  • proof