Semi-Supervised Learning on Graphs using Graph Neural Networks

Juntong Chen, Claire Donnat, Olga Klopp, Johannes Schmidt-Hieber

TL;DR

A sharp non-asymptotic risk bound is proved that separates approximation, stochastic, and optimization errors over GNNs with linear graph convolutions and a deep ReLU readout, yielding convergence rates that recover classical nonparametric behavior under full supervision.

Abstract

Graph neural networks (GNNs) work remarkably well in semi-supervised node regression, yet a rigorous theory explaining when and why they succeed remains lacking. To address this gap, we study an aggregate-and-readout model that encompasses several common message passing architectures: node features are first propagated over the graph then mapped to responses via a nonlinear function. For least-squares estimation over GNNs with linear graph convolutions and a deep ReLU readout, we prove a sharp non-asymptotic risk bound that separates approximation, stochastic, and optimization errors. The bound makes explicit how performance scales with the fraction of labeled nodes and graph-induced dependence. Approximation guarantees are further derived for graph-smoothing followed by smooth nonlinear readouts, yielding convergence rates that recover classical nonparametric behavior under full supervision while characterizing performance when labels are scarce. Numerical experiments validate our theory, providing a systematic framework for understanding GNN performance and limitations.
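The aggregate-and-readout model described in the abstract can be illustrated with a minimal NumPy sketch: features are first propagated linearly over the graph, then a ReLU network maps each node's propagated features to a response, and the least-squares loss is evaluated on labeled nodes only. The symmetric normalization, the fixed number of hops, the toy path graph, the random weights, and all function names below are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (an assumed, GCN-style choice)."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate(S, X, num_hops):
    """Linear graph convolution: apply the operator S to the features num_hops times."""
    H = X
    for _ in range(num_hops):
        H = S @ H
    return H

def relu_readout(H, weights, biases):
    """ReLU network applied node-wise (row-wise) to the propagated features."""
    Z = H
    for W, b in zip(weights[:-1], biases[:-1]):
        Z = np.maximum(Z @ W + b, 0.0)      # hidden layers with ReLU
    return Z @ weights[-1] + biases[-1]     # linear output layer

def masked_mse(pred, y, labeled):
    """Least-squares loss restricted to the labeled nodes."""
    return float(np.mean((pred[labeled] - y[labeled]) ** 2))

# Toy setup (hypothetical): a path graph on 6 nodes with random features and responses.
rng = np.random.default_rng(0)
n, d, hidden = 6, 3, 4
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
labeled = np.array([True, False, True, True, False, True])  # semi-supervised mask

S = normalized_adjacency(A)
H = propagate(S, X, num_hops=2)
weights = [rng.normal(size=(d, hidden)), rng.normal(size=(hidden, 1))]
biases = [np.zeros(hidden), np.zeros(1)]
pred = relu_readout(H, weights, biases)[:, 0]
loss = masked_mse(pred, y, labeled)
```

In this sketch only the readout weights would be trained; the propagation step is a fixed linear map, which is one simple way to read "linear graph convolutions followed by a deep ReLU readout".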

Paper Structure (24 sections, 16 theorems, 208 equations, 7 figures, 1 table)

Key Result

Theorem 1

Suppose Assumption ass-m holds with $m\geq1$. For $0<\delta \leq 1$, let $\mathcal{F}_\delta$ be a $\delta$-cover of $\mathcal{F}$ whose entropy satisfies $\log\mathcal{N}_\delta \geq 1$. Suppose that there exists a constant $F\geq1$ such that $\|f_i\|_{\infty} \leq F$ for all $i\in[n]$ and all $f \in \dots$ [...] where $C_1,C_2>0$ are universal constants.

Figures (7)

  • Figure 1: Graph feature propagation followed by a nonlinear readout: a linear message-passing block generates propagated features, which are then mapped to node-level predictions via a ReLU DNN.
  • Figure 2: MSE (over 20 trials) as a function of training samples $n$, with the unmasked proportion held constant at $\pi \in \{0.35, 0.75, 0.85, 0.95\}$. Estimators are distinguished by color: GCN with skip connections, GCN without skip connections, and the MLP baseline.
  • Figure 3: Left: Fitted slopes of $\log(\mathrm{MSE})$ vs. $\log(n)$ as a function of connection probability $\pi$. Right: $\log(\operatorname{MSE})$ as a function of the effective sample size $n_{\text{eff}} = n \times \pi$ for the GCN with skip connections, as proposed in Equation \ref{gcn-l-t}.
  • Figure 4: MSE (over 20 trials) as a function of $\log(1/\pi)$ across different graph sizes. Estimators are distinguished by color: GCN with skip connections, GCN without skip connections, and the MLP baseline.
  • Figure 5: Performance of the GCN (with skip connection) as a function of the maximum degree of the graph ($x$-axis), for different convolution types (columns) and values of $\bar{\delta}$ (colors) on a Barabási–Albert graph.
  • ...and 2 more figures
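
The slope fits and the effective sample size mentioned in the Figure 3 caption can be sketched as follows. This is a hedged illustration only: the helper names are hypothetical, $\pi$ is treated as a proportion in $(0, 1]$, and the MSE values are synthetic, generated from an exact power law rather than taken from the paper's experiments.

```python
import numpy as np

def loglog_slope(n_values, mse_values):
    """Least-squares slope of log(MSE) against log(n), i.e. an empirical rate exponent."""
    slope, _intercept = np.polyfit(np.log(n_values), np.log(mse_values), deg=1)
    return float(slope)

def effective_sample_size(n, pi):
    """Effective sample size n_eff = n * pi, as in the Figure 3 caption."""
    return n * pi

# Synthetic example (not data from the paper): MSE decaying exactly like n^{-1/2}.
n_values = np.array([100.0, 200.0, 400.0, 800.0, 1600.0])
synthetic_mse = 3.0 * n_values ** -0.5
slope = loglog_slope(n_values, synthetic_mse)
n_eff = effective_sample_size(1000, 0.35)
```

On real experimental output, the fitted slope would be compared across values of $\pi$, and plotting $\log(\mathrm{MSE})$ against `n_eff` would show whether the curves for different $\pi$ collapse onto one another.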

Theorems & Definitions (29)

  • Theorem 1
  • Proposition 1
  • Corollary 2
  • Lemma 3
  • Lemma 4
  • Lemma 5
  • Theorem 2
  • Lemma 6: Bernstein's inequality, see e.g. Corollary 2.11 of Boucheron et al. (2013)
  • Theorem 7
  • Lemma 8
  • ...and 19 more