Almost Surely Asymptotically Constant Graph Neural Networks

Sam Adam-Day, Michael Benedikt, İsmail İlkan Ceylan, Ben Finkelshtein

TL;DR

This work studies the expressive power of graph neural networks (GNNs) by tracking how the predictions of real-valued GNN classifiers, such as those classifying graphs probabilistically, evolve as they are applied to larger graphs drawn from a random graph model. It shows that the output converges to a constant function.

Abstract

We present a new angle on the expressive power of graph neural networks (GNNs) by studying how the predictions of real-valued GNN classifiers, such as those classifying graphs probabilistically, evolve as we apply them on larger graphs drawn from some random graph model. We show that the output converges to a constant function, which upper-bounds what these classifiers can uniformly express. This strong convergence phenomenon applies to a very wide class of GNNs, including state-of-the-art models, with aggregates including mean and the attention-based mechanism of graph transformers. Our results apply to a broad class of random graph models, including sparse and dense variants of the Erdős-Rényi model, the stochastic block model, and the Barabási-Albert model. We empirically validate these findings, observing that the convergence phenomenon appears not only on random graphs but also on some real-world graphs.
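
The convergence claim in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental setup: it assumes a toy one-layer mean-aggregation GNN with scalar features, fixed hand-picked weights (`w_self`, `w_nbr`, `bias` are hypothetical values), mean pooling, and a sigmoid readout, applied to dense Erdős-Rényi graphs with $p = 0.1$. It measures how much the predicted probability varies across sampled graphs of a given size.

```python
import math
import random
import statistics


def sample_er_neighbours(n, p, rng):
    """Sample an Erdos-Renyi graph ER(n, p) and return its adjacency lists."""
    nbrs = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                nbrs[u].append(v)
                nbrs[v].append(u)
    return nbrs


def mean_gnn_probability(nbrs, feats, w_self=1.5, w_nbr=-2.0, bias=0.3):
    """Toy GNN: one mean-aggregation layer, mean pooling, sigmoid readout.

    The weights are arbitrary illustrative values, not taken from the paper.
    Nodes without neighbours use 0.0 as their aggregate (an assumption).
    """
    hidden = []
    for u, neighbours in enumerate(nbrs):
        if neighbours:
            nbr_mean = sum(feats[v] for v in neighbours) / len(neighbours)
        else:
            nbr_mean = 0.0
        hidden.append(math.tanh(w_self * feats[u] + w_nbr * nbr_mean + bias))
    pooled = sum(hidden) / len(hidden)
    return 1.0 / (1.0 + math.exp(-pooled))


def output_spread(n, p, rng, samples=20):
    """Return (mean, standard deviation) of the output over sampled graphs."""
    outs = []
    for _ in range(samples):
        nbrs = sample_er_neighbours(n, p, rng)
        feats = [rng.random() for _ in range(n)]  # i.i.d. bounded features
        outs.append(mean_gnn_probability(nbrs, feats))
    return statistics.mean(outs), statistics.pstdev(outs)


if __name__ == "__main__":
    rng = random.Random(0)
    for n in (20, 100, 400):
        mean, std = output_spread(n, 0.1, rng)
        print(f"n={n:4d}  mean output={mean:.4f}  std across graphs={std:.4f}")
```

Under these assumptions the standard deviation across sampled graphs should shrink as $n$ grows, which is the qualitative behaviour the abstract describes. The toy model says nothing about rates or about the other architectures and graph models covered by the paper.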

Paper Structure

This paper contains 29 sections, 23 theorems, 93 equations, and 11 figures.

Key Result

Theorem 5.1

Consider $(\mu_n)_{n \in \mathbb{N}}$ sampling a graph $G$ from any of the following models and node features independently from i.i.d. bounded distributions on $d$ features. Then every $\textsc{Agg}[\textsc{Wmean}, \textsc{RW}]$ term converges a.a.s. to a constant with respect to $(\mu_n)_{n \in \mathbb{N}}$. The appendix includes additional results for the GCN aggregator.
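
For readers unfamiliar with the terminology, one standard way to read "converges a.a.s. (asymptotically almost surely) to a constant" is sketched below. The symbols $\tau(G)$ for the value of a closed term on the sampled graph and $c$ for the limiting constant are introduced here for illustration only; the paper's own definition may be phrased differently.

```latex
\forall \varepsilon > 0:\quad
\lim_{n \to \infty}
\Pr_{G \sim \mu_n}\bigl[\, \lVert \tau(G) - c \rVert < \varepsilon \,\bigr] = 1
```

In words: for any tolerance $\varepsilon$, the probability that the term's value lies within $\varepsilon$ of $c$ tends to one as the graph size grows.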

Figures (11)

  • Figure 1: The output of the considered GNNs eventually becomes constant as the graph sizes increase.
  • Figure 2: Evaluation of the term $\sum_{x \in \mathcal{N}(y)} (2\mathrm H(x) + 2) \star 1.0$ on a small graph with scalar features. The term computes the mean of $z \mapsto 2z + 2$ on each of a node's neighbours. As each sub-term has one free variable, we can represent the intermediate results as scalar values for each node.
  • Figure 3: Each plot shows the five mean class probabilities (in different colours) with standard deviations of a single model initialization over $\mathrm{ER}(n, p(n)=0.1)$, $\mathrm{ER}(n, p(n)=\frac{\log{n}}{n})$, and $\mathrm{ER}(n, p(n)=\frac{1}{50n})$, as we draw graphs of increasing size.
  • Figure 4: Each plot depicts the standard deviation of Euclidean distances between class probabilities and their respective means across various samples of each graph size for GPS+RW.
  • Figure 5: Standard deviation of distances between class probabilities and their means across TIGER-Alaska graph sizes for MeanGNN.
  • ...and 6 more figures
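
The term evaluated in Figure 2 can be sketched in a few lines. The graph and feature values below are hypothetical (the figure's actual graph is not reproduced here), and the sketch assumes that the constant weight `1.0` reduces the weighted mean to a plain mean over neighbours, as the caption describes. Returning `0.0` for a node without neighbours is likewise an assumption of this sketch.

```python
def neighbour_mean_term(nbrs, feats):
    """For each node y, average the values 2 * H(x) + 2 over the neighbours x of y."""
    result = {}
    for y, neighbours in nbrs.items():
        # Inner sub-term: z -> 2z + 2 applied to each neighbour's feature.
        values = [2 * feats[x] + 2 for x in neighbours]
        # Outer aggregate: plain mean, since every neighbour has weight 1.0.
        result[y] = sum(values) / len(values) if values else 0.0
    return result


# Hypothetical path graph a - b - c - d with scalar node features H.
nbrs = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
feats = {"a": 1.0, "b": 0.0, "c": 2.0, "d": -1.0}

print(neighbour_mean_term(nbrs, feats))
# {'a': 2.0, 'b': 5.0, 'c': 1.0, 'd': 6.0}
```

Because the term has a single free variable $y$, the result is one scalar per node, matching the caption's remark that intermediate results can be represented as per-node scalar values.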

Theorems & Definitions (46)

  • Definition 4.1: Term language
  • Definition 4.2
  • Definition 4.3
  • Theorem 5.1
  • Corollary 5.2
  • Theorem 5.3: Aggregate Elimination for Non-Sparse Graphs
  • Definition 5.4
  • Lemma 5.5: Weak local convergence
  • Theorem 5.6: Aggregate Elimination for Sparser Graphs
  • Theorem 5.7
  • ...and 36 more