Towards Understanding and Avoiding Limitations of Convolutions on Graphs

Andreas Roth

TL;DR

This thesis investigates why graph neural networks struggle to scale with depth and identifies two core phenomena, Shared Component Amplification (SCA) and Component Dominance (CD), that drive rank collapse and over-smoothing. By reframing graph convolutions in spectral terms and mapping message-passing (MP) updates to power iterations, it shows how a single computational graph amplifies the same spectral component across all feature channels, which limits expressivity. To counteract this, it introduces the Multi-Relational Split (MRS) framework and the MIMO Graph Convolution (MIMO-GC), plus a localized variant (LMGC). These allow different spectral components to be amplified across distinct relations or edges, which avoids SCA and improves injectivity and expressivity. The thesis further connects CD to PageRank and proposes a Personalized PageRank GNN (PPRGNN) that permits infinite-depth propagation without losing the initial node information. Complementary results show that a Sum of Kronecker Products (SKP) framework can avoid SCA and facilitate optimization, with empirical validation on standard graph datasets. Together, these results give a theoretical foundation for understanding MP dynamics and principled architectures for mitigating rank collapse and over-smoothing, with the aim of more scalable and expressive GNNs.
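The link between message passing on a single computational graph and rank collapse can be sketched numerically. The snippet below is a minimal illustration and not the thesis's experimental setup: the toy graph, the helper names `sym_norm_adj` and `rank_one_gap`, and the use of random orthogonal weight matrices are all assumptions made here. Orthogonal weights are chosen so that any collapse of the feature matrix is attributable to the graph operator alone.

```python
import numpy as np

def sym_norm_adj(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, a GCN-style normalized adjacency."""
    a = adj + np.eye(adj.shape[0])           # self-loops make the graph non-bipartite
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def rank_one_gap(x):
    """Ratio of second to first singular value; 0 means x has rank one."""
    s = np.linalg.svd(x, compute_uv=False)
    return s[1] / s[0]

rng = np.random.default_rng(0)

# Hypothetical toy graph: the path 0-1-2-3-4 plus the chord 1-3.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (1, 3)]
adj = np.zeros((5, 5))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
a_sym = sym_norm_adj(adj)

x = rng.random((5, 3))                       # three feature channels
gap_start = rank_one_gap(x)

for _ in range(100):
    # Random orthogonal weights (assumption): they mix channels but preserve
    # singular values, so the collapse below comes from a_sym alone.
    q, _r = np.linalg.qr(rng.standard_normal((3, 3)))
    x = a_sym @ x @ q
    x /= np.linalg.norm(x)                   # rescaling does not change the rank

gap_end = rank_one_gap(x)
print(f"sigma_2/sigma_1 at start: {gap_start:.3f}, after 100 steps: {gap_end:.2e}")
```

Every channel is multiplied by the same operator `a_sym`, so the same eigenvector is amplified relative to all others in every channel, and the normalized feature matrix approaches rank one.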

Abstract

While message-passing neural networks (MPNNs) have shown promising results, their real-world impact remains limited. Although various limitations have been identified, their theoretical foundations remain poorly understood, leading to fragmented research efforts. In this thesis, we provide an in-depth theoretical analysis and identify several key properties limiting the performance of MPNNs. Building on these findings, we propose several frameworks that address these shortcomings. We identify two properties exhibited by many MPNNs: shared component amplification (SCA), where each message-passing iteration amplifies the same components across all feature channels, and component dominance (CD), where a single component gets increasingly amplified as more message-passing steps are applied. These properties lead to the observable phenomenon of rank collapse of node representations, which generalizes the established over-smoothing phenomenon. By generalizing and decomposing over-smoothing, we enable a deeper understanding of MPNNs, more targeted solutions, and more precise communication within the field. To avoid SCA, we show that utilizing multiple computational graphs or edge relations is necessary. Our multi-relational split (MRS) framework transforms any existing MPNN into one that leverages multiple edge relations. Additionally, we introduce the spectral graph convolution for multiple feature channels (MIMO-GC), which naturally uses multiple computational graphs. A localized variant, LMGC, approximates the MIMO-GC while inheriting its beneficial properties. To address CD, we demonstrate a close connection between MPNNs and the PageRank algorithm. Based on personalized PageRank, we propose a variant of MPNNs that allows for infinitely many message-passing iterations, while preserving initial node features. Collectively, these results deepen the theoretical understanding of MPNNs.
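The idea of running arbitrarily many propagation steps while keeping the initial node features can be sketched with a generic personalized-PageRank-style update, as used for example by APPNP. This is an illustration under assumptions, not the thesis's PPRGNN formulation, which may differ: the 5-cycle graph, the value `alpha = 0.1`, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy graph: a 5-cycle (connected, non-bipartite).
n = 5
adj = np.zeros((n, n))
for i in range(n):
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0
deg = adj.sum(axis=1)
a_sym = adj / np.sqrt(np.outer(deg, deg))    # D^{-1/2} A D^{-1/2}

x0 = rng.standard_normal((n, 3))             # initial node features
alpha = 0.1                                  # teleport probability (assumed value)

plain, ppr = x0.copy(), x0.copy()
for _ in range(500):
    plain = a_sym @ plain                            # plain propagation
    ppr = (1 - alpha) * a_sym @ ppr + alpha * x0     # re-inject x0 at every step

# Closed-form fixed point of the personalized-PageRank-style iteration.
ppr_limit = alpha * np.linalg.solve(np.eye(n) - (1 - alpha) * a_sym, x0)

print("rank after plain propagation:", np.linalg.matrix_rank(plain, tol=1e-8))
print("rank after PPR propagation:  ", np.linalg.matrix_rank(ppr, tol=1e-8))
print("PPR matches closed form:     ", np.allclose(ppr, ppr_limit))
```

Plain propagation drives all channels onto a single component, whereas the teleport term keeps the limit an invertible linear function of `x0`, so the rank of the initial features is retained.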

Paper Structure (136 sections, 36 theorems, 271 equations, 27 figures, 13 tables)

Introduction
Motivation
Research Questions
Main Contributions
Outline and Covered Publications
Chapter 2: Fundamentals of Graph Machine Learning
Chapter 3: Extending Our Understanding of Graph Convolutions
Chapter 4: Preventing Shared Component Amplification With Multiple Computational Graphs
Chapter 5: Preventing Component Dominance based on Personalized PageRank
Chapter 6: Summary and Outlook
Additional Publications
Fundamentals of Graph Machine Learning
Notation
Graph Theory
Graph Isomorphism
...and 121 more sections

Key Result

theorem 1

Let $\mathbf{A}\in\mathbb{R}^{n\times n}$ be the adjacency matrix of an undirected and non-bipartite graph. Then, for any $\mathbf{x}\in\mathbb{R}^n$, where $\mathbf{1}\in\mathbb{R}^{n}$ and $c,d\in\mathbb{R}$.
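As background only, and not as a restatement of the theorem, the standard power-iteration argument for the symmetrically normalized adjacency matrix $\mathbf{A}_\text{sym} = \mathbf{D}^{-1/2}\mathbf{A}\mathbf{D}^{-1/2}$ can be written out as follows. It additionally assumes that the graph is connected. The degree matrix $\mathbf{D}$ and the orthonormal eigenpairs $(\lambda_i, \mathbf{v}_i)$ of $\mathbf{A}_\text{sym}$, ordered so that $\lambda_1 = 1$, are notation introduced here for illustration.

$$
\mathbf{A}_\text{sym}^{k}\,\mathbf{x}
= \sum_{i=1}^{n} \lambda_i^{k}\,\left(\mathbf{v}_i^\top \mathbf{x}\right)\mathbf{v}_i
\;\xrightarrow{\;k\to\infty\;}\;
\left(\mathbf{v}_1^\top \mathbf{x}\right)\mathbf{v}_1,
\qquad
\mathbf{v}_1 = \frac{\mathbf{D}^{1/2}\mathbf{1}}{\left\lVert \mathbf{D}^{1/2}\mathbf{1} \right\rVert_2}
$$

The limit holds because $|\lambda_i| < 1$ for all $i \geq 2$ when the graph is connected and non-bipartite, so every component except the one along $\mathbf{v}_1$ decays geometrically.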

Figures (27)

Figure 1: Schematic overview of this thesis and the connections between our contributions.
Figure 2: Training and testing accuracy on the Cora dataset (sen2008collective) for GCN models trained with different numbers of iterations. This experiment is based on kipf2017semi.
Figure 3: Two-dimensional node features $\mathbf{X}^{(k)}\in\mathbb{R}^{n\times d}$ after applying up to three GCN iterations $\mathbf{X}^{(k)} = \mathbf{A}_{\text{sym}}\mathbf{X}^{(k-1)}\mathbf{W}^{(k)}$, where $\mathbf{A}_\text{sym}\in\mathbb{R}^{n\times n}$ is given by the Cora dataset (sen2008collective). $\mathbf{X}^{(0)}$ is a linear transformation of the node features provided with Cora. All parameters are randomly initialized. Each dot represents the state of a single node. This experiment is based on li2018deeper.
Figure 4: Dirichlet energy $E\left(\mathbf{X}^{(k)}\right) = \operatorname{tr}\left(\left(\mathbf{X}^{(k)}\right)^\top\mathbf{L}_\text{sym}\mathbf{X}^{(k)}\right)$ for $\mathbf{X}^{(k)} = \phi\left(\mathbf{A}_\text{sym}\mathbf{X}^{(k-1)}\mathbf{W}^{(k)}\right)$, where $\phi$ is the ReLU activation function, each $\mathbf{W}^{(k)}$ is randomly initialized, and $k$ is the iteration number. The Cora dataset (sen2008collective) provides the initial features and the graph structure. Average values over $20$ random initializations are shown. This experiment is based on oono2020graph and cai2020anote.
Figure 5: Dirichlet energy $E\left(\mathbf{X}^{(k)}\right) = \operatorname{tr}\left(\left(\mathbf{X}^{(k)}\right)^\top\mathbf{L}\mathbf{X}^{(k)}\right)$ using the unnormalized graph Laplacian $\mathbf{L}$ for the node representations obtained by three message-passing methods on the Cora dataset (sen2008collective). As the kernel of $\mathbf{L}$ is spanned by constant-valued vectors, this version of the Dirichlet energy is zero when $\mathbf{X}^{(k)}$ only has non-zero components belonging to constant-valued vectors. This experiment is based on rusch2022graph.
...and 22 more figures
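The Dirichlet energy with the unnormalized Laplacian, as used in the Figure 5 caption, can be checked on a small example. The sketch below is illustrative only: the toy graph and the helper name `dirichlet_energy` are assumptions, and it does not reproduce the Cora experiments. It verifies two standard facts: the trace form equals the sum of squared feature differences over edges, and features that are constant across nodes have zero energy.

```python
import numpy as np

def dirichlet_energy(x, lap):
    """tr(X^T L X) for node features x (n x d) and a graph Laplacian lap (n x n)."""
    return np.trace(x.T @ lap @ x)

# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0
lap = np.diag(adj.sum(axis=1)) - adj         # unnormalized Laplacian L = D - A

rng = np.random.default_rng(0)
x = rng.standard_normal((n, 2))

# The trace form equals the sum of squared feature differences over edges.
edge_sum = sum(np.sum((x[i] - x[j]) ** 2) for i, j in edges)
print(np.isclose(dirichlet_energy(x, lap), edge_sum))

# Features that are constant across nodes lie in the kernel of L: zero energy.
x_const = np.tile(rng.standard_normal((1, 2)), (n, 1))
print(np.isclose(dirichlet_energy(x_const, lap), 0.0))
```

Zero energy therefore signals that all nodes carry identical features, which is the classical over-smoothing case, while rank collapse also covers limits that are not constant across nodes.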

Theorems & Definitions (84)

definition 1: Permutation Invariance
definition 2: Graph Classifier
definition 3: Permutation Equivariance
theorem 1: Theorem 1 from li2018deeper
proof
proposition 1: Proposition 3.4 from cai2020anote
proof
proposition 2: Theorem 5.3 from giovanni2023understanding
proof
proposition 3: Vanishing norm of node representations
...and 74 more
