Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning

Qimai Li, Zhichao Han, Xiao-Ming Wu

TL;DR

This work demystifies Graph Convolutional Networks (GCNs) for semi-supervised learning by showing that graph convolution is a special form of Laplacian smoothing. This explains why GCNs work, but it also means that stacking many layers over-smooths vertex features, while shallow stacks struggle to spread the few available labels across the graph. To address these limits, the authors propose co-training with a random-walk model and self-training, both of which expand the labeled set without extra validation data, plus Union and Intersection strategies for combining the two. Experiments on CiteSeer, Cora, and PubMed show substantial gains at very low labeling rates, often surpassing standard GCNs and several strong baselines. The study thus offers both theoretical insight into GCN behavior and practical training methods for graph-based learning with limited labeled data.

Abstract

Many interesting problems in machine learning are being revisited with new deep learning tools. For graph-based semi-supervised learning, a recent important development is graph convolutional networks (GCNs), which nicely integrate local vertex features and graph topology in the convolutional layers. Although the GCN model compares favorably with other state-of-the-art methods, its mechanisms are not clear and it still requires a considerable amount of labeled data for validation and model selection. In this paper, we develop deeper insights into the GCN model and address its fundamental limits. First, we show that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers. Second, to overcome the limits of the GCN model with shallow architectures, we propose both co-training and self-training approaches to train GCNs. Our approaches significantly improve GCNs in learning with very few labels, and exempt them from requiring additional labels for validation. Extensive experiments on benchmarks have verified our theory and proposals.
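
The Laplacian-smoothing claim can be checked numerically on a small graph. The sketch below is illustrative only and is not taken from the paper's code. It assumes the standard GCN propagation matrix with self-loops, $\tilde{D}^{-\frac{1}{2}}(A+I)\tilde{D}^{-\frac{1}{2}}$, and compares it with one step of symmetric Laplacian smoothing with step size `gamma = 1`. The function names, the toy path graph, and the random features are hypothetical.

```python
import numpy as np

def gcn_propagation(A):
    """Renormalized adjacency D~^{-1/2} (A + I) D~^{-1/2} of a standard GCN layer."""
    A_tilde = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))  # D~^{-1/2} as a vector
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]

def symmetric_laplacian_smoothing(A, gamma=1.0):
    """Smoothing operator I - gamma * D~^{-1/2} L~ D~^{-1/2}, with L~ = D~ - A~."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)
    deg = A_tilde.sum(axis=1)
    L_tilde = np.diag(deg) - A_tilde                 # Laplacian of the self-looped graph
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L_tilde * d_inv_sqrt[None, :]
    return np.eye(n) - gamma * L_sym

# Hypothetical toy graph: the path 0-1-2-3 (illustration only).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.random.default_rng(0).normal(size=(4, 3))     # random vertex features

P_gcn = gcn_propagation(A)
P_smooth = symmetric_laplacian_smoothing(A, gamma=1.0)

# Both operators act identically on the features when gamma = 1.
print(np.allclose(P_gcn @ X, P_smooth @ X))
```

Under these assumptions the two operators coincide, so the check should print `True`. With `gamma` below 1 the smoothing step keeps part of each vertex's own features and no longer matches the GCN propagation exactly.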

Paper Structure

This paper contains 15 sections, 1 theorem, 12 equations, 2 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

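If a graph has no bipartite components, then for any $\mathbf{w}\in \mathbb{R}^n$ and $\alpha \in (0,1]$,

$$\lim_{m\to+\infty}(I-\alpha L_{rw})^m\,\mathbf{w} = [\mathbf{1}^{(1)},\mathbf{1}^{(2)},\ldots,\mathbf{1}^{(k)}]\,\theta_1,$$

$$\lim_{m\to+\infty}(I-\alpha L_{sym})^m\,\mathbf{w} = D^{-\frac{1}{2}}[\mathbf{1}^{(1)},\mathbf{1}^{(2)},\ldots,\mathbf{1}^{(k)}]\,\theta_2,$$

where $\theta_1\in \mathbb{R}^k, \theta_2\in \mathbb{R}^k$, i.e., they converge to a linear combination of $\{\mathbf{1}^{(i)}\}_{i=1}^{k}$ and $\{D^{-\frac{1}{2}}\mathbf{1}^{(i)}\}_{i=1}^{k}$ respectively. Here $L_{rw}$ and $L_{sym}$ are the random-walk and symmetric normalized graph Laplacians, $k$ is the number of connected components, and $\mathbf{1}^{(i)}$ is the indicator vector of the $i$-th component.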
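
The over-smoothing behavior described by Theorem 1 can be illustrated for the random-walk case with a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes $L_{rw} = I - D^{-1}A$. The toy graph (two components, each containing a triangle so that neither is bipartite), the function name, and the choice of 200 smoothing steps are hypothetical. Only the random-walk limit is demonstrated.

```python
import numpy as np

def random_walk_smoothing_operator(A, alpha=1.0):
    """Return I - alpha * L_rw, assuming L_rw = I - D^{-1} A."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    L_rw = np.eye(n) - A / deg[:, None]   # row-normalize A to get D^{-1} A
    return np.eye(n) - alpha * L_rw

# Hypothetical toy graph: component {0,1,2} is a triangle,
# component {3,4,5,6} is a triangle with one extra vertex attached.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

w = np.random.default_rng(0).normal(size=n)   # arbitrary starting signal

S = random_walk_smoothing_operator(A, alpha=1.0)
w_inf = np.linalg.matrix_power(S, 200) @ w    # apply the smoothing step 200 times

# After many steps the signal is (numerically) constant within each component.
components = [[0, 1, 2], [3, 4, 5, 6]]
for comp in components:
    print(comp, np.round(w_inf[comp], 6))
```

Within each component the printed values should agree, which is the sense in which repeated smoothing makes vertices of the same component indistinguishable. In this sketch the common value is the degree-weighted average of the starting signal over that component.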

Figures (2)

  • Figure 1: Performance comparison of GCNs, label propagation, and our method for semi-supervised classification on the Cora citation network.
  • Figure 2: Vertex embeddings of Zachary's karate club network produced by GCNs with 1, 2, 3, 4, and 5 layers.

Theorems & Definitions (2)

  • Theorem 1
  • proof