
Understanding the Effect of GCN Convolutions in Regression Tasks

Juntong Chen, Johannes Schmidt-Hieber, Claire Donnat, Olga Klopp

TL;DR

This work analyzes the statistical behavior of graph convolutional networks (GCNs) in regression tasks under a fixed-design setting. It focuses on how the neighborhood depth $L$ and the choice between two convolution operators, the original GCN convolution $T$ and the GraphSAGE convolution $S$, affect the estimation error. It develops a bias-variance decomposition for linear GCNs and introduces a novel walk-based variance analysis, showing that the variance can be expressed as a weighted sum over local walks and that the graph topology can slow variance decay, contributing to over-smoothing in non-asymptotic regimes. The theoretical results identify an optimal depth $L$ that balances bias and variance and explain how different local topologies (e.g., rooted trees vs. cycles) influence variance decay; these predictions are validated by synthetic experiments. Real-data experiments on six diverse graphs corroborate the theory and offer practical guidelines for selecting the convolution type and depth to improve regression performance on graphs.
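The walk-based view of the variance can be illustrated numerically. The sketch below is not the authors' code. It assumes the commonly used forms $T = (D+I)^{-1/2}(A+I)(D+I)^{-1/2}$ for the GCN convolution and $S = (D+I)^{-1}(A+I)$ for GraphSAGE-style mean aggregation (the paper's exact definitions may differ), treats the estimator as $P^L \mathbf{Y}$ with iid noise of variance $\sigma^2$, and checks that $(P^L)_{ij}$ equals a sum over all length-$L$ walks from $i$ to $j$, each weighted by the product of its edge weights. The node variance is then $\sigma^2 \sum_j (P^L)_{ij}^2$. The function names `gcn_operator`, `sage_operator`, `walk_weight_sum` and `node_variance`, as well as the star and cycle toy graphs, are hypothetical.

```python
import itertools

import numpy as np


def gcn_operator(A):
    """Assumed GCN convolution: T = (D + I)^{-1/2} (A + I) (D + I)^{-1/2}."""
    A_tilde = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))
    # Entry (i, j) becomes d_i^{-1/2} * A_tilde[i, j] * d_j^{-1/2}.
    return d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]


def sage_operator(A):
    """Assumed GraphSAGE-style mean aggregation: S = (D + I)^{-1} (A + I)."""
    A_tilde = A + np.eye(A.shape[0])
    return A_tilde / A_tilde.sum(axis=1, keepdims=True)


def walk_weight_sum(P, i, j, L):
    """(P^L)_{ij} as an explicit sum over walks i = v0 -> ... -> vL = j (L >= 1),
    each walk weighted by the product of its edge weights P[v_k, v_{k+1}]."""
    n = P.shape[0]
    total = 0.0
    for middle in itertools.product(range(n), repeat=L - 1):
        walk = (i, *middle, j)
        weight = 1.0
        for a, b in zip(walk[:-1], walk[1:]):
            weight *= P[a, b]
        total += weight
    return total


def node_variance(P, i, L, sigma2=1.0):
    """Var((P^L eps)_i) = sigma2 * sum_j (P^L)_{ij}^2 for iid noise with variance sigma2."""
    row = np.linalg.matrix_power(P, L)[i]
    return sigma2 * float(np.sum(row ** 2))


# Depth-1 rooted tree (star): root 0 attached to 4 leaves.
star = np.zeros((5, 5))
star[0, 1:] = 1
star[1:, 0] = 1

# 5-cycle with the same number of nodes, for comparison.
cycle = np.zeros((5, 5))
for k in range(5):
    cycle[k, (k + 1) % 5] = 1
    cycle[(k + 1) % 5, k] = 1

# Variance at node 0 as a function of depth L, for both toy topologies.
for L in range(1, 5):
    print(L,
          node_variance(sage_operator(star), 0, L),
          node_variance(sage_operator(cycle), 0, L))
```

Brute-force walk enumeration costs $n^{L-1}$ terms, so it is only meant to confirm the identity on tiny graphs; `node_variance` uses the matrix power directly.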

Abstract

Graph Convolutional Networks (GCNs) have become a pivotal method in machine learning for modeling functions over graphs. Despite their widespread success across various applications, their statistical properties (e.g., consistency, convergence rates) remain ill-characterized. To begin addressing this knowledge gap, we consider networks whose graph structure implies that neighboring nodes exhibit similar signals, and we provide statistical theory for the impact of convolution operators. Focusing on estimators based solely on neighborhood aggregation, we examine how two common convolutions, the original GCN and GraphSAGE convolutions, affect the learning error as a function of the neighborhood topology and the number of convolutional layers. We explicitly characterize the bias-variance trade-off incurred by GCNs as a function of the neighborhood size and identify specific graph topologies where convolution operators are less effective. Our theoretical findings are corroborated by synthetic experiments and are a first step toward a deeper quantitative understanding of convolutional effects in GCNs, with the aim of offering rigorous guidelines for practitioners.
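The bias-variance trade-off in the number of convolutions can be made concrete with a small fixed-design simulation. This is a minimal sketch under stated assumptions, not the paper's estimator: it uses the parameter-free linear estimator $\hat{\mathbf{f}}_L = S^L \mathbf{Y}$ with $\mathbf{Y} = \mathbf{f}^* + \boldsymbol{\varepsilon}$, $\boldsymbol{\varepsilon} \sim N(0, \sigma^2 I)$, an assumed mean-aggregation operator $S = (D+I)^{-1}(A+I)$, a cycle graph, and a hypothetical smooth signal. In this setting the node-averaged risk splits exactly into $\|S^L \mathbf{f}^* - \mathbf{f}^*\|_2^2/n$ (squared bias, growing in $L$) and $\sigma^2 \|S^L\|_F^2/n$ (variance, shrinking in $L$), and the minimizing $L$ plays the role of the optimal depth. The names `cycle_adjacency` and `risk_decomposition` are hypothetical.

```python
import numpy as np


def sage_operator(A):
    """Assumed GraphSAGE-style mean aggregation: S = (D + I)^{-1} (A + I)."""
    A_tilde = A + np.eye(A.shape[0])
    return A_tilde / A_tilde.sum(axis=1, keepdims=True)


def cycle_adjacency(n):
    """Adjacency matrix of the cycle graph on n nodes."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1
    A[(idx + 1) % n, idx] = 1
    return A


def risk_decomposition(P, f_star, sigma2, L):
    """Exact node-averaged risk of f_hat = P^L Y, Y = f_star + eps, eps ~ N(0, sigma2 I).

    Returns (squared bias, variance):
      bias^2   = ||P^L f_star - f_star||^2 / n
      variance = sigma2 * ||P^L||_F^2 / n
    """
    n = len(f_star)
    PL = np.linalg.matrix_power(P, L)
    bias2 = float(np.sum((PL @ f_star - f_star) ** 2) / n)
    variance = float(sigma2 * np.sum(PL ** 2) / n)
    return bias2, variance


n, sigma2 = 200, 2.0
P = sage_operator(cycle_adjacency(n))
# Hypothetical smooth signal: frequency 10 around the cycle.
f_star = 2 * np.cos(2 * np.pi * 10 * np.arange(n) / n)

depths = range(0, 31)
parts = [risk_decomposition(P, f_star, sigma2, L) for L in depths]
risks = [b + v for b, v in parts]
best_L = int(np.argmin(risks))
print("optimal depth:", best_L, "risk:", risks[best_L])
```

At $L = 0$ there is no smoothing (zero bias, variance $\sigma^2$); each extra convolution averages more noise away but also flattens the signal, so the risk curve is U-shaped and its minimum lies strictly inside the range of depths tried here.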

Paper Structure

This paper contains 29 sections, 5 theorems, 82 equations, 20 figures.

Key Result

Theorem 1

If condition (eq.fiw) holds, then …, and if condition (eq.fiw2) holds, then …, where …

Figures (20)

  • Figure 1: Optimal number of convolutions as a function of the roughness $\|\Delta {\bf f^{*}}\|_2 = \sqrt{[\sum_{(i,j) \in \mathcal{E}} (f_i^{*} - f_j^{*})^2]/|\mathcal{E}|}$ of $\mathbf{f}^{*}$ on the latent variable graph with $\sigma^2=2$. Each subplot corresponds to a different sparsification level, with 0 representing the original graph and 0.75 representing a graph with 75% of its edges removed.
  • Figure 2: The variance decay at the root of the tree as a function of $L$ under the signal $f_i^* = 2 \cos(U_{i\cdot} \beta)$, with $\beta = (-1,1)^{\top}$, colored by degree value. The $y$-axis is shared across all 3 plots.
  • Figure 3: The variance decay at the root of the tree as a function of $L$ under the signal $f_i^* = 2 \cos(U_{i\cdot} \beta)$, with $\beta = (-1,1)^{\top}$, and its behavior after adding cycles.
  • Figure 4: Mean Squared Error (MSE) as a function of the neighborhood size. The points denote the mean MSE over 50 random splits of the data into training and validation sets, and the error bars denote interquartile ranges.
  • Figure 5: Properties of the datasets used in the real-data experiment section. The last 3 columns measure the signal roughness, defined either in terms of the $\ell_2$ norm ($\| \Delta {\bf Y}\|^2_2/|\mathcal{E}| = [\sum_{(i,j) \in \mathcal{E}} (Y_i-Y_j)^2]/|\mathcal{E}|$) or the $\ell_{\infty}$ norm ($\| \Delta {\bf Y}\|_{\max} = \max_{(i,j) \in \mathcal{E}} |Y_i-Y_j|$). The last column represents the graph smoothness over the neighborhood (after one convolution with $S$).
  • ...and 15 more figures
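The roughness measures in the Figure 1 and Figure 5 captions are simple edge statistics. The sketch below computes them literally as written in each caption (note that Figure 1 takes a square root of the per-edge mean, while the Figure 5 $\ell_2$ column does not). It assumes each undirected edge is counted once in $\mathcal{E}$, which the captions do not state explicitly; the function names are hypothetical.

```python
import numpy as np


def edges(A):
    """Endpoints (i, j), i < j, of each undirected edge of a symmetric adjacency matrix.
    Each edge is counted once, so len(i) == |E| under this (assumed) convention."""
    i, j = np.nonzero(np.triu(A, k=1))
    return i, j


def roughness_l2(y, A):
    """Figure 1 measure: sqrt( sum_{(i,j) in E} (y_i - y_j)^2 / |E| )."""
    i, j = edges(A)
    return float(np.sqrt(np.mean((y[i] - y[j]) ** 2)))


def roughness_l2_sq(y, A):
    """Figure 5 l2 column: sum_{(i,j) in E} (y_i - y_j)^2 / |E|."""
    i, j = edges(A)
    return float(np.mean((y[i] - y[j]) ** 2))


def roughness_max(y, A):
    """Figure 5 l_inf column: max_{(i,j) in E} |y_i - y_j|."""
    i, j = edges(A)
    return float(np.max(np.abs(y[i] - y[j])))


# Path graph 0 - 1 - 2 with signal y = (0, 1, 3).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
y = np.array([0.0, 1.0, 3.0])
print(roughness_l2_sq(y, A), roughness_l2(y, A), roughness_max(y, A))
```

On this path graph the edge differences are 1 and 2, so the Figure 5 $\ell_2$ measure is $(1 + 4)/2 = 2.5$, the Figure 1 measure is $\sqrt{2.5}$, and the $\ell_\infty$ measure is 2. A constant signal has roughness 0 under all three.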

Theorems & Definitions (8)

  • Theorem 1
  • Theorem 2
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proof
  • Proof
  • Proof