Stochastic Training of Graph Convolutional Networks with Variance Reduction

Jianfei Chen, Jun Zhu, Le Song

TL;DR

The paper addresses efficient training of GCNs, whose recursive neighbor aggregation otherwise expands each node's receptive field to impractical sizes. It introduces a variance-reduction framework based on control variates (CV), coupled with a preprocessing step that reduces the network depth, enabling training with very small sampled neighbor sets while preserving convergence to a local optimum. The authors prove that the gradients are unbiased and that training converges when dropout is zero, and they show that CV with preprocessing matches or closely approximates exact GCN performance with large speedups on benchmarks such as Reddit. Overall, the approach offers a scalable path to training GCNs on large graphs without sacrificing accuracy.

Abstract

Graph convolutional networks (GCNs) are powerful deep neural networks for graph-structured data. However, GCN computes the representation of a node recursively from its neighbors, making the receptive field size grow exponentially with the number of layers. Previous attempts at reducing the receptive field size by subsampling neighbors do not have a convergence guarantee, and their receptive field size per node is still on the order of hundreds. In this paper, we develop control-variate-based algorithms which allow sampling an arbitrarily small neighbor size. Furthermore, we prove a new theoretical guarantee for our algorithms to converge to a local optimum of GCN. Empirical results show that our algorithms enjoy a similar convergence with the exact algorithm using only two neighbors per node. The runtime of our algorithms on a large Reddit dataset is only one seventh of previous neighbor sampling algorithms.
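
To make the control-variate idea concrete, the following is a minimal NumPy sketch for a single node and a single aggregation step. It is not the authors' implementation. It assumes a uniform propagation row `p_row`, synthetic activations `h`, and a stored "historical" copy `h_hist` that is close to `h`; all function names and parameter values are hypothetical and chosen only for illustration. The plain neighbor-sampling estimate rescales a partial sum over sampled neighbors, while the control-variate estimate samples only the difference `h - h_hist` and adds the full aggregation of the stored values.

```python
import numpy as np


def exact_aggregate(p_row, h):
    """Exact aggregation for one node: sum over all neighbors of P_uv * h_v."""
    return p_row @ h


def ns_estimate(p_row, h, sample_idx):
    """Plain neighbor sampling: rescale the partial sum over sampled neighbors."""
    scale = len(p_row) / len(sample_idx)
    return scale * (p_row[sample_idx] @ h[sample_idx])


def cv_estimate(p_row, h, h_hist, sample_idx):
    """Control-variate estimate: sample only the difference h - h_hist and
    add the exact aggregation of the stored historical activations."""
    scale = len(p_row) / len(sample_idx)
    delta = h[sample_idx] - h_hist[sample_idx]
    return scale * (p_row[sample_idx] @ delta) + p_row @ h_hist


def simulate(num_neighbors=50, dim=8, sample_size=2, trials=2000, seed=0):
    """Compare the mean squared error of both estimators on synthetic data.

    Assumption for illustration: h_hist differs from h by small Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    p_row = np.full(num_neighbors, 1.0 / num_neighbors)  # uniform weights
    h = rng.normal(size=(num_neighbors, dim))            # current activations
    h_hist = h + 0.05 * rng.normal(size=h.shape)         # slightly stale copy
    exact = exact_aggregate(p_row, h)

    ns_err, cv_err = [], []
    for _ in range(trials):
        # Draw a small neighbor sample without replacement.
        idx = rng.choice(num_neighbors, size=sample_size, replace=False)
        ns_err.append(np.sum((ns_estimate(p_row, h, idx) - exact) ** 2))
        cv_err.append(np.sum((cv_estimate(p_row, h, h_hist, idx) - exact) ** 2))
    return float(np.mean(ns_err)), float(np.mean(cv_err))


if __name__ == "__main__":
    ns_mse, cv_mse = simulate()
    print(f"neighbor sampling MSE: {ns_mse:.4f}")
    print(f"control variate MSE:   {cv_mse:.4f}")
```

In this synthetic setting the control-variate error depends on how far `h_hist` has drifted from `h` rather than on the spread of the activations themselves, which is the intuition behind sampling as few as two neighbors per node. How the historical activations are stored and refreshed during real training is not covered by this sketch.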

Paper Structure

This paper contains 5 sections, 2 theorems, 12 equations, 12 figures, 3 tables.

Key Result

Theorem 1

For a fixed $W$ and any $i>LI$ we have:

Figures (12)

  • Figure 1: Comparison of training loss with respect to number of epochs without dropout. The CV+PP curve overlaps with the Exact curve in the first four datasets.
  • Figure 2: Comparison of validation accuracy with respect to number of epochs. NS converges to 0.94 on the Reddit dataset and 0.6 on the PPI dataset.
  • Figure 3: Comparison of the accuracy of different testing algorithms. The y-axis is Micro-F1 for PPI and accuracy otherwise.
  • Figure 4: Bias and standard deviation of the gradient for different algorithms during training.
  • Figure :
  • ...and 7 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2