Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective

Kaicheng Zhang, Piero Deidda, Desmond Higham, Francesco Tudisco

TL;DR

This work tackles oversmoothing in graph neural networks by challenging the primacy of energy-based metrics such as Dirichlet energy, which can be misleading in realistic nonlinear settings. It proposes rank-based measures, notably numerical rank and effective rank, as robust indicators of oversmoothing, and proves theoretically that the numerical rank converges to 1 for broad classes of GNNs under nonnegative weights, extending beyond linear models to nonlinear activations via nonlinear Perron–Frobenius theory. The authors provide extensive experiments across diverse GNN architectures and datasets, demonstrating that rank relaxations track performance degradation more reliably than energy-based metrics, which may remain flat even as accuracy drops. This rank-centric perspective offers a scale-invariant, eigenspace-agnostic tool for diagnosing and mitigating oversmoothing in practical deep GNNs, with potential implications for architecture design and regularization strategies.
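The two rank relaxations named above can be computed directly from the singular values of a feature matrix. The sketch below assumes the standard definitions, which this summary does not spell out: numerical (stable) rank as $\|X\|_F^2/\|X\|_2^2$, and effective rank as the exponential of the Shannon entropy of the normalized singular values. The function names are illustrative and are not taken from the authors' code.

```python
import numpy as np


def numerical_rank(X: np.ndarray) -> float:
    """Stable rank ||X||_F^2 / ||X||_2^2 (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)  # singular values, descending
    return float(np.sum(s**2) / s[0] ** 2)


def effective_rank(X: np.ndarray) -> float:
    """exp(entropy) of the normalized singular values (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]  # drop exact zeros so that 0 * log(0) is treated as 0
    return float(np.exp(-np.sum(p * np.log(p))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rank_one = np.outer(rng.uniform(1, 2, 50), rng.uniform(1, 2, 8))
    gaussian = rng.normal(size=(50, 8))
    for name, X in [("rank-one", rank_one), ("gaussian", gaussian)]:
        print(name, numerical_rank(X), effective_rank(X))
        # Rescaling X leaves both values unchanged (scale invariance).
        print(name, "x1000", numerical_rank(1e3 * X), effective_rank(1e3 * X))
```

Both quantities equal 1 for a rank-one matrix and equal $n$ for the $n \times n$ identity, so they behave as continuous stand-ins for the exact rank. Neither function handles an all-zero matrix, which would need a separate guard.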

Abstract

Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. While these metrics are related to oversmoothing, we argue they have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks and under somewhat strict conditions on the norm of network weights and feature representations. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide theoretical support for this approach, demonstrating that the numerical rank of feature representations converges to one for a broad family of nonlinear activation functions under the assumption of nonnegative trained weights. To the best of our knowledge, this is the first result that proves the occurrence of oversmoothing in the nonlinear setting without assumptions on the boundedness of the weight matrices. Along with the theoretical findings, we provide extensive numerical evaluation across diverse graph architectures. Our results show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that a significant drop in the rank aligns closely with performance degradation, even in scenarios where energy metrics remain unchanged.
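The rank-collapse claim can be illustrated numerically. The following is a minimal sketch, not the authors' experimental setup: it assumes a ring graph with self-loops, a row-normalized adjacency matrix, untrained entrywise-positive random weights, and a ReLU activation, and it assumes the stable-rank definition $\|X\|_F^2/\|X\|_2^2$ for the numerical rank. All names (`ring_adjacency`, `simulate`) are hypothetical.

```python
import numpy as np


def numerical_rank(X):
    """Stable rank ||X||_F^2 / ||X||_2^2 (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)


def ring_adjacency(n):
    """Row-normalized adjacency of a ring graph with self-loops (toy graph)."""
    A = np.eye(n)
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[idx, (idx - 1) % n] = 1.0
    return A / A.sum(axis=1, keepdims=True)


def simulate(depth=64, n=30, d=8, seed=0):
    """Numerical rank of X^{(l)} across ReLU layers with nonnegative weights."""
    rng = np.random.default_rng(seed)
    A = ring_adjacency(n)
    X = rng.normal(size=(n, d))
    ranks = [numerical_rank(X)]
    for _ in range(depth):
        W = rng.uniform(0.1, 1.0, size=(d, d))  # untrained, entrywise positive
        X = np.maximum(A @ X @ W, 0.0)          # X <- ReLU(A X W)
        X = X / np.linalg.norm(X)               # rescale only to avoid overflow
        ranks.append(numerical_rank(X))
    return ranks


if __name__ == "__main__":
    ranks = simulate()
    for layer in (0, 1, 2, 4, 8, 16, 32, 64):
        print(f"layer {layer:2d}: numerical rank = {ranks[layer]:.6f}")
```

The per-layer rescaling does not affect the reported values because the numerical rank is scale-invariant; it only keeps the entries in a representable range. An energy that depends on the norm of $X$ would, by contrast, change under the same rescaling.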

Paper Structure

This paper contains 27 sections, 6 theorems, 54 equations, 2 figures, 7 tables.

Key Result

Theorem 3.2

Let $X^{(l+1)}=\sigma(A^{(l)} X^{(l)} W^{(l)})$, $l=1,\dots,L$, be a GNN such that $u$ is the dominant eigenvector of $A^{(l)}$ for any $l$ and is also an eigenvector of the activation function $\sigma$. If $\sigma$ is $1$-Lipschitz, namely $\|\sigma(x)-\sigma(y)\|\leq \|x-y\|$ for any $x,y$, and $\lim_

Figures (2)

  • Figure 1: Toy scenarios depicting the behaviour of oversmoothing metrics. Each plot contains 50 nodes, each with two features plotted on the x-y plane. The features are: #1 all of the same value; #2 perfectly aligned with the same vector; #3 aligned with the same vector except for one (red) point; #4 sampled from a uniform distribution. MAD (Sec. \ref{sec:experiments}) and $E_{\mathrm{Dir}}$ give false negative signals in #3, although the features are oversmoothed by definition. $E_{\mathrm{Proj}}$ can hardly differentiate between #3 and #4, and is thus not robust in quantifying oversmoothing. To compute $E_{\mathrm{Proj}}$ and $E_{\mathrm{Dir}}$, the first feature was used in place of $u$ in \ref{eq:DirE} and \ref{eq:demystify_mu}.
  • Figure 2: Four examples of metric behaviour computed at the last hidden layer of separately trained GCNs of increasing depth. For Erank and NumRank, we report $\mathrm{Erank}(X)-r^*_\mathrm{ER}$ and $\mathrm{NumRank}(X)-r^*_\mathrm{NR}$ for some $r^*>1$. In these particular cases, $r^*_\mathrm{ER}<1.85$ and $r^*_\mathrm{NR}<1.3$. Note that the effective rank and numerical rank of the input features $X^{(0)}$ are about 1084 and 13.6, respectively. Additional results are given in \ref{apd:additional_empirical_results}.
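
The contrast drawn in Figure 1 between an energy-based and a rank-based metric can be sketched on synthetic data. This is not a reproduction of the figure: the ring graph, the unnormalized Laplacian in $\operatorname{trace}(X^\top L X)$, the feature values, and the stable-rank definition of the numerical rank are all assumptions made for illustration, and the paper's $E_{\mathrm{Dir}}$ may be normalized differently.

```python
import numpy as np


def numerical_rank(X):
    """Stable rank ||X||_F^2 / ||X||_2^2 (assumed definition)."""
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s**2) / s[0] ** 2)


def ring_laplacian(n):
    """Unnormalized Laplacian L = D - A of a ring graph (assumed toy graph)."""
    A = np.zeros((n, n))
    idx = np.arange(n)
    A[idx, (idx + 1) % n] = 1.0
    A[idx, (idx - 1) % n] = 1.0
    return np.diag(A.sum(axis=1)) - A


def dirichlet_energy(X, L):
    """trace(X^T L X), i.e. the sum over edges of ||x_i - x_j||^2."""
    return float(np.trace(X.T @ L @ X))


rng = np.random.default_rng(0)
n = 50
direction = np.array([1.0, 0.5])
coeffs = rng.uniform(0.5, 1.5, size=n)

X_aligned = np.outer(coeffs, direction)  # every node lies on the same line
X_outlier = X_aligned.copy()
X_outlier[0] = np.array([-0.5, 1.0])     # one node moved off the line

L = ring_laplacian(n)
for name, X in [("aligned", X_aligned), ("one outlier", X_outlier)]:
    print(name, dirichlet_energy(X, L), numerical_rank(X))
    # Scaling the features changes the energy but not the numerical rank.
    print(name, "x10", dirichlet_energy(10 * X, L), numerical_rank(10 * X))
```

In this toy setting the energy is strictly positive even when all features are collinear, and it grows quadratically with the feature scale, whereas the numerical rank is 1 for the aligned features and stays close to 1 when a single node deviates.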

Theorems & Definitions (8)

  • Definition 3.1
  • Theorem 3.2
  • Theorem 4.1
  • Theorem 5.1
  • Definition 5.2
  • Lemma 5.3
  • Theorem 5.4
  • Proposition 5.5