What Expressivity Theory Misses: Message Passing Complexity for GNNs

Niklas Kemper, Tom Wollschläger, Stephan Günnemann

TL;DR

The paper argues that expressivity theories based on the WL test are insufficient to explain real-world GNN performance and introduces Message Passing Complexity (MPC), a continuous, task-specific measure derived from a probabilistic variant of WL (lossyWL). MPC quantifies how difficult it is for a given architecture to solve a graph task via message passing, incorporating practical constraints like over-squashing and under-reaching while preserving WL-impossibility results. The authors prove theoretical properties linking MPC to expressivity, refinement, and compositionality, and validate MPC across tasks (retaining information, propagating information, ring detection) on synthetic and real graphs, showing strong alignment with empirical performance. They show that performance improvements often stem from architectural biases that lower task-specific MPC rather than from universal increases in expressivity, suggesting a shift in design focus toward minimizing MPC for domain-specific tasks. These results offer a principled framework to diagnose and guide GNN architecture design in practical settings.

Abstract

Expressivity theory, characterizing which graphs a GNN can distinguish, has become the predominant framework for analyzing GNNs, with new models striving for higher expressivity. However, we argue that this focus is misguided: First, higher expressivity is not necessary for most real-world tasks as these tasks rarely require expressivity beyond the basic WL test. Second, expressivity theory's binary characterization and idealized assumptions fail to reflect GNNs' practical capabilities. To overcome these limitations, we propose Message Passing Complexity (MPC): a continuous measure that quantifies the difficulty for a GNN architecture to solve a given task through message passing. MPC captures practical limitations like over-squashing while preserving the theoretical impossibility results from expressivity theory, effectively narrowing the gap between theory and practice. Through extensive validation on fundamental GNN tasks, we show that MPC's theoretical predictions correlate with empirical performance, successfully explaining architectural successes and failures. Thereby, MPC advances beyond expressivity theory to provide a more powerful and nuanced framework for understanding and improving GNN architectures.

Paper Structure

This paper contains 45 sections, 24 theorems, 75 equations, 21 figures, 2 tables, and 1 algorithm.

Key Result

Theorem 4.6

The complexity for ${G},v \in {{\mathcal{G}}_\mathcal{V}}$ and function $f$ is ${\textup{MPC}}_{\mathcal{M}}(f_v, {G}) = \infty$ if and only if there exist ${G}',w \in {{\mathcal{G}}_\mathcal{V}}$ such that $f_v({G}) \neq f_w({G}')$ but ${M}_v({G}) = {M}_w({G}')$ for all model instantiations ${M} \in \mathcal{M}$ …
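Theorem 4.6 says MPC is infinite exactly when some pair of rooted graphs has different target values, yet no model in the class can tell the two apart. The sketch below builds the classic witness pair for the node task "does $v$ lie on a triangle?": a 6-cycle and two disjoint triangles. It assumes the model class consists of standard MPNNs without extra node features. Under the well-known result that such models cannot separate nodes that 1-WL colors identically, this pair satisfies the theorem's condition. The helper names (`wl_colors`, `cycle`, `in_triangle`, `wl_infeasibility_witness`) are hypothetical illustrations, not code from the paper.

```python
def wl_colors(adj, rounds):
    """1-WL color refinement on an unlabeled graph.

    Colors are kept as nested tuples (no relabeling), so colors computed on
    different graphs can be compared directly.
    """
    colors = {v: 0 for v in adj}  # uniform initial node features
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[a] for a in adj[v])))
            for v in adj
        }
    return colors


def cycle(n, offset=0):
    """Cycle graph on nodes offset .. offset + n - 1 as an adjacency dict."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}


def in_triangle(adj, v):
    """Node-level target f_v(G): does v lie on a triangle?"""
    return any(b in adj[a] for a in adj[v] for b in adj[v] if a != b)


def wl_infeasibility_witness(f, g1, v, g2, w, rounds):
    """True if (g1, v) and (g2, w) share a WL color but f differs on them."""
    same_color = wl_colors(g1, rounds)[v] == wl_colors(g2, rounds)[w]
    return same_color and f(g1, v) != f(g2, w)


c6 = cycle(6)                                       # one 6-cycle
two_triangles = {**cycle(3), **cycle(3, offset=3)}  # two disjoint triangles
print(wl_infeasibility_witness(in_triangle, c6, 0, two_triangles, 0, rounds=3))  # prints True
```

Both graphs are 2-regular, so every node receives the same 1-WL color in every round. The triangle label still differs, which is why no amount of depth lets a WL-bounded MPNN solve this task.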

Figures (21)

  • Figure 1: Limitations of iso expressivity vs. benefits of MPC. Top: Iso expressivity gives an idealized, binary view that misses GNNs’ practical capabilities. Bottom: Limited expressivity rarely restricts real-world performance as it focuses on worst-case graphs and tasks.
  • Figure 2: Update step of lossyWL for node $v$. Every message $m^l_{a\to v}$ (green) survives independently with probability ${{\bm{I}}}_{va}$. $\textup{lossyWL}{}$ models the lossy message propagation observed in real-world MPNNs.
  • Figure 3: Test accuracy for retaining initial node features compared with complexity measures MPC and WLC. Simulated MPC (in contrast to WL-based WLC) shows perfect negative Spearman correlation ($\rho_s = -1$) with accuracy, capturing increasing difficulty with depth (over-smoothing). Complete results in the full heatmap figure (fig:keep_heatmap_full).
  • Figure 4: Simulated MPC complexities for propagating features from source nodes $u$ (colored by MPC) to target node $v$ (square). Despite identical iso expressivity, MPC reveals the significant advantage virtual nodes offer for long-range dependencies.
  • Figure 5: Test accuracy vs. training data size for the propagation task $f_v({G}) = {\bm{X}}_u$ for different distances $D$. Colors indicate average simulated MPC for each distance. Higher MPC values reflect greater task difficulty, evidenced by increased sample complexity. All results in fig:transfer_full; results for real-world graphs from LRGB in fig:peptides_line.
  • ...and 16 more figures
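The lossyWL update in Figure 2 drops each incoming message independently. The sketch below simulates that mechanism for the propagation task of Figures 4 and 5, estimating by Monte Carlo how often the feature of a source node $u$ reaches a target node $v$ within $L$ layers. It rests on three assumptions:

- The survival probabilities come from a user-supplied function that stands in for ${\bm{I}}_{va}$.
- The self-message $v \to v$ is treated as a message that can also be dropped.
- $-\log p$ is used purely as an illustrative difficulty proxy. It is not the paper's Definition 4.5 of MPC.

The function names are hypothetical.

```python
import math
import random


def lossy_reach_probability(adj, source, target, num_layers, survive_prob,
                            trials=10_000, seed=0):
    """Monte Carlo estimate of P(source's feature reaches target).

    adj: dict mapping each node to a list of its neighbours (undirected).
    survive_prob(v, a): hypothetical survival probability of message a -> v,
        standing in for I_{va}; a == v is the self-message.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        informed = {source}  # nodes whose state currently carries u's feature
        for _ in range(num_layers):
            next_informed = set()
            for v in adj:
                # v is informed after this layer if at least one surviving
                # incoming message (self-message included) comes from an
                # informed node; messages from uninformed nodes are irrelevant,
                # so they are never sampled.
                for a in (v, *adj[v]):
                    if a in informed and rng.random() < survive_prob(v, a):
                        next_informed.add(v)
                        break
            informed = next_informed
        hits += target in informed
    return hits / trials


def reach_complexity(p):
    """Illustrative difficulty proxy: -log p, infinite when p == 0."""
    return math.inf if p == 0 else -math.log(p)


# Usage: a 6-node path with a hypothetical mean-aggregation-style
# survival probability 1 / (deg(v) + 1).
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
mean_style = lambda v, a: 1.0 / (len(path[v]) + 1)
p = lossy_reach_probability(path, source=0, target=5, num_layers=5,
                            survive_prob=mean_style)
print(p, reach_complexity(p))
```

Information only moves one hop per layer, so $p$ is exactly 0 whenever the distance from $u$ to $v$ exceeds `num_layers`. In that case the proxy is infinite, which mirrors under-reaching. To compare topologies in the spirit of Figure 4, add a virtual node adjacent to every node and rerun the estimate.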

Theorems & Definitions (52)

  • Definition 3.1: Iso Expressivity
  • Definition 4.1
  • Definition 4.2
  • Definition 4.3: lossyWL
  • Definition 4.4
  • Definition 4.5: MPC
  • Theorem 4.6: Infeasibility
  • Lemma 4.6
  • Theorem 4.7: Function refinement
  • Lemma 4.7: Task Triangle Inequality
  • ...and 42 more