A Survey on Oversmoothing in Graph Neural Networks

T. Konstantin Rusch, Michael M. Bronstein, Siddhartha Mishra

TL;DR

Problem: deep GNNs suffer from over-smoothing, where node features converge to a common value as depth increases. Approach: the authors define over-smoothing axiomatically as the exponential convergence of a node-similarity measure $\mu$ on the node features, review measures such as the Dirichlet energy and the Mean Average Distance (MAD), evaluate mitigation strategies on small-, medium-, and large-scale graphs, and extend the definition to continuous-time GNNs. Findings: the Dirichlet energy provides a robust, stable measure; MAD is not always a valid node-similarity measure; many mitigation strategies slow or halt the convergence, but mitigating over-smoothing alone does not make a deep GNN expressive, and G$^2$-GCN stands out in this respect. Significance: the work clarifies how over-smoothing is defined and measured, guides method selection for deep GNNs, and extends the concept to continuous-time models, informing both theory and practice.

Abstract

Node features of graph neural networks (GNNs) tend to become more similar with the increase of the network depth. This effect is known as over-smoothing, which we axiomatically define as the exponential convergence of suitable similarity measures on the node features. Our definition unifies previous approaches and gives rise to new quantitative measures of over-smoothing. Moreover, we empirically demonstrate this behavior for several over-smoothing measures on different graphs (small-, medium-, and large-scale). We also review several approaches for mitigating over-smoothing and empirically test their effectiveness on real-world graph datasets. Through illustrative examples, we demonstrate that mitigating over-smoothing is a necessary but not sufficient condition for building deep GNNs that are expressive on a wide range of graph learning tasks. Finally, we extend our definition of over-smoothing to the rapidly emerging field of continuous-time GNNs.
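
The axiomatic definition mentioned in the abstract can be written out as follows. This is a sketch of our reading of it, not a verbatim quotation: the symbols $v$ (number of nodes), $m$ (feature dimension), $N$ (number of layers), and the constants $C_1, C_2$ are notation assumed here for illustration. A map $\mu$ is treated as a node-similarity measure if it vanishes exactly on constant node features and satisfies a triangle inequality, and over-smoothing with respect to $\mu$ means that $\mu$ decays exponentially in the layer index $n$:

$$
\begin{aligned}
&\mu : \mathbb{R}^{v \times m} \to \mathbb{R}_{\geq 0}, \\
&\mu(\mathbf{X}) = 0 \iff \exists\, \mathbf{c} \in \mathbb{R}^{m} \text{ such that } \mathbf{X}_i = \mathbf{c} \text{ for all nodes } i, \\
&\mu(\mathbf{X} + \mathbf{Y}) \leq \mu(\mathbf{X}) + \mu(\mathbf{Y}), \\
&\mu(\mathbf{X}^n) \leq C_1 e^{-C_2 n}, \quad n = 0, \dots, N, \quad C_1, C_2 > 0.
\end{aligned}
$$

Under this reading, a mitigation strategy that only slows the decay (a smaller $C_2$) still over-smooths in the limit, whereas one that keeps $\mu(\mathbf{X}^n)$ bounded away from zero does not.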

Paper Structure (13 sections, 11 equations, 3 figures)

Figures (3)

  • Figure 1: Dirichlet energy and Mean Average Distance (MAD) of layer-wise node features ${\bf X}^n$ propagated through a GAT, GCN and GraphSAGE for three different graph datasets, (left) small-scale Texas graph, (middle) medium-scale Cora citation network, (right) large-scale Facebook network (Cornell5).
  • Figure 2: Layer-wise Dirichlet energy of hidden node features propagated through G$^2$-GCN, GraphCON-GCN, PairNorm, GCNII, DropEdge-GCN and Res-GCN on three different graphs, i.e., (left) small-scale Texas graph, (middle) medium-scale Cora citation network, (right) large-scale Facebook (Cornell5) network.
  • Figure 3: Trained G$^2$-GCN, PairNorm, GCN with bias and GCN without bias on the fully-supervised Cora graph dataset using the $10$ pre-defined splits from Geom-GCN, showing two measures as the number of layers increases from $1$ to $128$: (left) Dirichlet energy of the layer-wise node features, (right) test accuracies.
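
The layer-wise quantities plotted in Figures 1 and 2 can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' experimental setup: the graph is a 6-node ring, propagation is plain mean aggregation with self-loops (no learned weights, no nonlinearity), and the helper names `dirichlet_energy`, `mad`, and `ring_adjacency` are hypothetical. The energy is written in one common form, $\frac{1}{v}\sum_i \sum_{j \in \mathcal{N}_i} \|\mathbf{X}_i - \mathbf{X}_j\|^2$, and MAD as the neighbor-averaged cosine distance; other normalizations exist.

```python
import numpy as np

def dirichlet_energy(X, A):
    """(1/v) * sum_i sum_{j in N(i)} ||X_i - X_j||^2 for a 0/1 adjacency A."""
    v = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]          # pairwise differences, shape (v, v, d)
    sq_dist = (diff ** 2).sum(axis=-1)            # squared distances, shape (v, v)
    return float((A * sq_dist).sum() / v)

def mad(X, A, eps=1e-12):
    """(1/v) * sum_i sum_{j in N(i)} (1 - cos(X_i, X_j))."""
    v = X.shape[0]
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.maximum(norms, eps)           # guard against zero rows
    cos = X_unit @ X_unit.T
    return float((A * (1.0 - cos)).sum() / v)

def ring_adjacency(v):
    """Adjacency matrix of an undirected ring with v >= 3 nodes (toy graph)."""
    A = np.zeros((v, v))
    idx = np.arange(v)
    A[idx, (idx + 1) % v] = 1.0
    A[(idx + 1) % v, idx] = 1.0
    return A

v, d, depth = 6, 4, 30
A = ring_adjacency(v)
A_hat = A + np.eye(v)                             # add self-loops
P = A_hat / A_hat.sum(axis=1, keepdims=True)      # row-normalized mean aggregation

rng = np.random.default_rng(0)
X = rng.normal(size=(v, d))                       # random initial node features

energies, mads = [], []
for n in range(depth + 1):
    energies.append(dirichlet_energy(X, A))
    mads.append(mad(X, A))
    X = P @ X                                     # one weight-free propagation step

# Features that differ only by a positive scale: MAD cannot tell them apart,
# while the Dirichlet energy stays strictly positive.
X_scaled = np.outer(np.arange(1.0, v + 1.0), np.array([1.0, 0.0]))
mad_scaled = mad(X_scaled, A)
energy_scaled = dirichlet_energy(X_scaled, A)
```

On this toy setup the energy shrinks at every step, which is the exponential decay that the definition of over-smoothing describes. The `X_scaled` example shows one way MAD can fail as a node-similarity measure: the node features are clearly not constant, yet every pair of neighbors has cosine similarity one, so MAD is zero.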

Theorems & Definitions (2)

  • Remark 2.1
  • Remark 2.2