Towards Understanding and Avoiding Limitations of Convolutions on Graphs

Andreas Roth

TL;DR

This work investigates why graph neural networks struggle to scale with depth. It identifies two core phenomena, Shared Component Amplification (SCA) and Component Dominance (CD), that drive rank collapse and over-smoothing. By reframing graph convolutions in spectral terms and relating message-passing (MP) updates to power iterations, the thesis shows how a single computational graph inherently amplifies the same spectral components across all feature channels, which limits expressivity. To counteract this, it introduces the Multi-Relational Split (MRS) framework and the MIMO Graph Convolution (MIMO-GC), plus a localized variant (LMGC). These allow multiple spectral components to be amplified across distinct edge relations or computational graphs, thereby avoiding SCA and improving injectivity and expressivity. The thesis further connects CD to PageRank and proposes a Personalized PageRank GNN (PPRGNN) that permits infinitely many message-passing iterations while preserving the initial node features. Complementary results show that a Sum of Kronecker Products (SKP) framework can robustly avoid SCA and facilitate optimization, with empirical validation across standard graph datasets. Together, these results provide a theoretical foundation for understanding message-passing dynamics and yield principled architectures that mitigate rank collapse and over-smoothing, with practical implications for deeper and more expressive GNNs.
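
To make the SCA statement concrete, here is a short sketch for a single linear update. It assumes, purely for illustration, that the graph operator $\mathbf{A}$ is symmetric with orthonormal eigenpairs $(\lambda_i, \mathbf{v}_i)$; the thesis itself covers more general settings. The bold symbols stand in for the page's $\mA$, $\mX$, $\mW$ macros.

```latex
\begin{align}
\mathbf{A} &= \sum_{i=1}^{n} \lambda_i \mathbf{v}_i \mathbf{v}_i^\top, \\
\mathbf{A}\mathbf{X}\mathbf{W} &= \sum_{i=1}^{n} \lambda_i \mathbf{v}_i \left(\mathbf{v}_i^\top \mathbf{X}\mathbf{W}\right), \\
\operatorname{vec}\left(\mathbf{A}\mathbf{X}\mathbf{W}\right) &= \left(\mathbf{W}^\top \otimes \mathbf{A}\right) \operatorname{vec}\left(\mathbf{X}\right).
\end{align}
```

In the second line, the weight matrix $\mathbf{W}$ mixes feature channels, but every channel has its $\mathbf{v}_i$ component scaled by the same factor $\lambda_i$. The third line (with column-stacking $\operatorname{vec}$) expresses the same update as a single Kronecker product, which is the structure that a sum of Kronecker products generalizes.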

Abstract

While message-passing neural networks (MPNNs) have shown promising results, their real-world impact remains limited. Although various limitations have been identified, their theoretical foundations remain poorly understood, leading to fragmented research efforts. In this thesis, we provide an in-depth theoretical analysis and identify several key properties limiting the performance of MPNNs. Building on these findings, we propose several frameworks that address these shortcomings. We identify two properties exhibited by many MPNNs: shared component amplification (SCA), where each message-passing iteration amplifies the same components across all feature channels, and component dominance (CD), where a single component gets increasingly amplified as more message-passing steps are applied. These properties lead to the observable phenomenon of rank collapse of node representations, which generalizes the established over-smoothing phenomenon. By generalizing and decomposing over-smoothing, we enable a deeper understanding of MPNNs, more targeted solutions, and more precise communication within the field. To avoid SCA, we show that utilizing multiple computational graphs or edge relations is necessary. Our multi-relational split (MRS) framework transforms any existing MPNN into one that leverages multiple edge relations. Additionally, we introduce the spectral graph convolution for multiple feature channels (MIMO-GC), which naturally uses multiple computational graphs. A localized variant, LMGC, approximates the MIMO-GC while inheriting its beneficial properties. To address CD, we demonstrate a close connection between MPNNs and the PageRank algorithm. Based on personalized PageRank, we propose a variant of MPNNs that allows for infinitely many message-passing iterations, while preserving initial node features. Collectively, these results deepen the theoretical understanding of MPNNs.
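
The contrast between plain propagation and a personalized-PageRank-style update can be sketched numerically. The following is a minimal illustration, not the PPRGNN architecture itself: the toy graph, the teleport value `ALPHA`, and all function names are assumptions made for this example.

```python
# Hypothetical toy graph: triangle (0, 1, 2) plus a pendant node 3.
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
N = 4
ALPHA = 0.1  # teleport probability; an assumed value for illustration


def random_walk_matrix(edges, n):
    """Row-normalized adjacency matrix D^{-1} A as nested lists."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def matvec(m, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def plain_propagation(p, x0, steps):
    """x <- P x, repeated: initial differences between nodes wash out."""
    x = list(x0)
    for _ in range(steps):
        x = matvec(p, x)
    return x


def ppr_propagation(p, x0, steps, alpha=ALPHA):
    """x <- (1 - alpha) P x + alpha x0: the initial signal is re-injected."""
    x = list(x0)
    for _ in range(steps):
        px = matvec(p, x)
        x = [(1.0 - alpha) * v + alpha * v0 for v, v0 in zip(px, x0)]
    return x


def spread(x):
    """Difference between the largest and smallest node value."""
    return max(x) - min(x)


P = random_walk_matrix(EDGES, N)
x0 = [1.0, 0.0, 0.0, 0.0]  # one-hot signal on node 0

plain = plain_propagation(P, x0, 500)
ppr = ppr_propagation(P, x0, 500)
ppr_more = ppr_propagation(P, x0, 1000)

print("plain spread:", spread(plain))
print("ppr spread:  ", spread(ppr))
```

Under these assumptions, plain propagation drives all nodes to the same value, while the personalized update converges to a fixed point that still distinguishes node 0 from the rest, no matter how many further iterations are applied.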

Paper Structure (136 sections, 36 theorems, 271 equations, 27 figures, 13 tables)

Key Result

theorem 1

Let $\mA\in\mathbb{R}^{n\times n}$ be the adjacency matrix of an undirected and non-bipartite graph. Then, for any $\vx\in\mathbb{R}^n$, where $\mathbf{1}\in\mathbb{R}^{n}$ and $c,d\in\mathbb{R}$.
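
The displayed equation of this theorem is not reproduced on this page. As a hedged illustration of the kind of limit that the symbols $\mathbf{1}$, $c$, and $d$ suggest, the sketch below checks a standard fact about repeated propagation with normalized adjacency matrices on a connected, non-bipartite graph. The toy graph, the choice of normalizations, and all names are assumptions for this example and are not taken from the theorem statement.

```python
import math

# Hypothetical toy graph: a triangle (0, 1, 2) with a pendant node 3.
# It is undirected, connected, and non-bipartite (it contains an odd cycle).
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
N = 4


def adjacency(edges, n):
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return a


def matvec(m, x):
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in m]


def propagate(m, x, steps):
    """Apply x <- M x for the given number of steps (a power iteration)."""
    for _ in range(steps):
        x = matvec(m, x)
    return x


A = adjacency(EDGES, N)
deg = [sum(row) for row in A]

# Random-walk normalization D^{-1} A and symmetric normalization D^{-1/2} A D^{-1/2}.
A_rw = [[A[i][j] / deg[i] for j in range(N)] for i in range(N)]
A_sym = [[A[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(N)] for i in range(N)]

x0 = [1.0, 2.0, 3.0, 4.0]
x_rw = propagate(A_rw, x0, 200)
x_sym = propagate(A_sym, x0, 200)

# Closed-form limits: c * 1 for A_rw, and d * D^{1/2} 1 for A_sym.
c = sum(d_i * x_i for d_i, x_i in zip(deg, x0)) / sum(deg)
d = sum(math.sqrt(d_i) * x_i for d_i, x_i in zip(deg, x0)) / sum(deg)

print("A_rw limit: ", x_rw, "expected constant", c)
print("A_sym limit:", x_sym, "expected", [d * math.sqrt(d_i) for d_i in deg])
```

In both cases the limit depends on the input only through a single scalar, which is the sense in which repeated propagation discards information about the initial signal.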

Figures (27)

  • Figure 1: Schematic overview of this thesis and the connections between our contributions.
  • Figure 2: Training and testing accuracy achieved on the Cora dataset [sen2008collective] for training GCN models with different numbers of iterations. This experiment is based on [kipf2017semi].
  • Figure 3: Two-dimensional node features $\mX^{(k)}\in\mathbb{R}^{n\times d}$ after applying up to three GCN iterations $\mX^{(k)} = \mA_{\text{sym}}\mX^{(k-1)}\mW^{(k)}$ where $\mA_\text{sym}\in\mathbb{R}^{n\times n}$ is given by the Cora dataset [sen2008collective]. $\mX^{(0)}$ is given by a linear transformation of the provided node features for Cora. All parameters are randomly initialized. Each dot represents the state of a single node. This experiment is based on [li2018deeper].
  • Figure 4: Dirichlet energy $E\left(\mX^{(k)}\right) = \mathop{\mathrm{tr}}\nolimits\left(\left(\mX^{(k)}\right)^\top\mL_\text{sym}\mX^{(k)}\right)$ for $\mX^{(k)} = \phi\left(\mA_\text{sym}\mX^{(k-1)}\mW^{(k)}\right)$ where $\phi$ is the ReLU activation function, each $\mW^{(k)}$ is randomly initialized and $k$ indicates the iteration number. The Cora dataset [sen2008collective] was used for initial features and as graph structure. Average values over $20$ random initializations are shown. This experiment is based on [oono2020graph, cai2020anote].
  • Figure 5: Dirichlet energy $E\left(\mX^{(k)}\right) = \mathop{\mathrm{tr}}\nolimits\left(\left(\mX^{(k)}\right)^\top\mL\mX^{(k)}\right)$ using the unnormalized graph Laplacian $\mL$ for the node representations obtained by three message-passing methods on the Cora dataset [sen2008collective]. As the kernel of $\mL$ is spanned by constant-valued vectors, this version of the Dirichlet energy is zero when $\mX^{(k)}$ only has non-zero components belonging to constant-valued vectors. This experiment is based on [rusch2022graph].
  • ...and 22 more figures
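
The Dirichlet energy used in the captions above can be computed in a few lines. The sketch below is a toy version under stated assumptions: a four-node graph instead of Cora, a fixed weight matrix `W` reused at every iteration instead of random initializations, and no activation function. It tracks the energy divided by the squared Frobenius norm, so that decay caused only by shrinking feature norms is factored out.

```python
import math

# Hypothetical toy graph: triangle (0, 1, 2) plus a pendant node 3.
EDGES = [(0, 1), (1, 2), (0, 2), (2, 3)]
N = 4


def sym_normalized_adjacency(edges, n):
    """A_sym = D^{-1/2} A D^{-1/2} as nested lists."""
    a = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def dirichlet_energy(x, a_sym):
    """tr(X^T L_sym X) with L_sym = I - A_sym."""
    ax = matmul(a_sym, x)
    return sum(x[i][c] * (x[i][c] - ax[i][c])
               for i in range(len(x)) for c in range(len(x[0])))


def squared_norm(x):
    return sum(v * v for row in x for v in row)


A_SYM = sym_normalized_adjacency(EDGES, N)
W = [[0.5, -0.3], [0.2, 0.8]]  # arbitrary fixed weights, assumed for illustration
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]  # assumed initial features

normalized_energy = []
for k in range(41):
    normalized_energy.append(dirichlet_energy(X, A_SYM) / squared_norm(X))
    X = matmul(matmul(A_SYM, X), W)  # linear GCN-style update X <- A_sym X W

print("k=0: ", normalized_energy[0])
print("k=40:", normalized_energy[-1])
```

On this toy setup the normalized energy falls by many orders of magnitude within a few dozen iterations, which mirrors the qualitative trend the figures report for Cora.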

Theorems & Definitions (84)

  • definition 1: Permutation Invariance
  • definition 2: Graph Classifier
  • definition 3: Permutation Equivariance
  • theorem 1: Theorem 1 from [li2018deeper]
  • proof
  • proposition 1: Proposition 3.4 from [cai2020anote]
  • proof
  • proposition 2: Theorem 5.3 from [giovanni2023understanding]
  • proof
  • proposition 3: Vanishing norm of node representations
  • ...and 74 more