Zero-One Laws of Graph Neural Networks

Sam Adam-Day, Theodor Mihai Iliant, İsmail İlkan Ceylan

TL;DR

When graphs of increasing size are drawn from the Erdős-Rényi model, the probability that a class of GNN classifiers maps them to a particular output tends to either zero or one.

Abstract

Graph neural networks (GNNs) are the de facto standard deep learning architectures for machine learning on graphs. This has led to a large body of work analyzing the capabilities and limitations of these models, particularly pertaining to their representation and extrapolation capacity. We offer a novel theoretical perspective on the representation and extrapolation capacity of GNNs, by answering the question: how do GNNs behave as the number of graph nodes becomes very large? Under mild assumptions, we show that when we draw graphs of increasing size from the Erdős-Rényi model, the probability that such graphs are mapped to a particular output by a class of GNN classifiers tends to either zero or one. This class includes the popular graph convolutional network architecture. The result establishes 'zero-one laws' for these GNNs and, analogously to other convergence laws, entails theoretical limitations on their capacity. We empirically verify our results, observing that the theoretical asymptotic limits are already evident on relatively small graphs.
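
The zero-one behavior described in the abstract can be written compactly. The following is a sketch only: it borrows the notation $\mathcal{M}$, ${\mathbb{G}}(n,r)$ and ${\mathbb{D}}(d)$ from the Key Result listed further down this page, and it assumes that node features are drawn independently per node, which is one reading of the setup rather than a quotation of the paper's formal definition.

```latex
% Sketch of a zero-one law for a binary graph classifier \mathcal{M}.
% Assumption: G is an Erdos-Renyi graph on n nodes with edge probability r,
% and each node carries a feature vector drawn independently from \mathbb{D}(d).
\[
  \lim_{n \to \infty} \Pr\bigl[\mathcal{M}(G) = 1\bigr] \in \{0, 1\},
  \qquad G \sim \mathbb{G}(n, r).
\]
```

In words: for large enough graphs, the classifier assigns the same label to almost every sampled graph, so it cannot separate graphs drawn from this distribution by any property that holds with intermediate probability.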

Paper Structure

This paper contains 23 sections, 12 theorems, 56 equations, and 8 figures.

Key Result

Theorem 4.6

Let $\mathcal{M}$ be a $\textsc{GCN}$ used for binary graph classification and take $r \in [0,1]$. Then, $\mathcal{M}$ satisfies a zero-one law with respect to graph distribution ${\mathbb{G}}(n,r)$ and feature distribution ${\mathbb{D}}(d)$ assuming the following conditions hold: (i) the distributi

Figures (8)

  • Figure 1: Each plot shows the proportion of graphs of a given size that are classified as $1$ by a set of ten GCNs (top row), $\textsc{MeanGNN}\text{s}$ (middle row), and $\textsc{SumGNN}\text{s}$ (bottom row). Each color-coded curve shows the behavior of one model as increasingly larger graphs are drawn. The phenomenon is observed for 1-layer models (left column), 2-layer models (middle column), and 3-layer models (right column). GCNs and $\textsc{MeanGNN}\text{s}$ behave very similarly, with all models converging quickly to $0$ or to $1$. $\textsc{SumGNN}\text{s}$ show slightly slower convergence, but all models converge for every number of layers.
  • Figure 2: Normally distributed random node features with $\textsc{GCN}$ models. Each plot shows the proportion of graphs of a given size that are classified as $1$ by a set of ten GCN models. Each color-coded curve shows the behavior of one model as increasingly larger graphs are drawn. The phenomenon is observed for 1-layer models (left column), 2-layer models (middle column), and 3-layer models (right column). The initial features are drawn randomly from a normal distribution with mean $0.5$ and standard deviation $1$.
  • Figure 3: $\textsc{GCN}$ models with $\mathrm{ReLU}$ non-linearity. Each plot shows the proportion of graphs of a given size that are classified as $1$ by a set of ten GCN models. Each color-coded curve shows the behavior of one model as increasingly larger graphs are drawn. The phenomenon is observed for 1-layer models (left column), 2-layer models (middle column), and 3-layer models (right column). Here the $\mathrm{ReLU}$ activation function is used for the GNN layers. Apart from this, the setup is the same as in the main body of the paper.
  • Figure 4: $\textsc{GCN}$ models with $\tanh$ non-linearity. Each plot shows the proportion of graphs of a given size that are classified as $1$ by a set of ten GCN models. Each color-coded curve shows the behavior of one model as increasingly larger graphs are drawn. The phenomenon is observed for 1-layer models (left column), 2-layer models (middle column), and 3-layer models (right column). Here $\tanh$ is used as the activation function for the GNN layers, and everything else is kept the same.
  • Figure 5: $\textsc{GCN}$ models with $\mathrm{sigmoid}$ non-linearity. Each plot shows the proportion of graphs of a given size that are classified as $1$ by a set of ten GCN models. Each color-coded curve shows the behavior of one model as increasingly larger graphs are drawn. The phenomenon is observed for 1-layer models (left column), 2-layer models (middle column), and 3-layer models (right column). Here the $\mathrm{sigmoid}$ activation function is used for the GNN layers, and everything else is kept the same.
  • ...and 3 more figures
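
The experiment behind these figures (sample Erdős-Rényi graphs of growing size, run a fixed randomly initialized model, record the fraction classified as $1$) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the uniform weight initialization, the self-loop symmetric normalization, the mean pooling, the $\tanh$ activation, and the uniform $[0,1]$ node features are all choices made here for the sketch.

```python
import numpy as np


def sample_er_adjacency(n, r, rng):
    """Sample a symmetric 0/1 adjacency matrix from G(n, r), no self-loops."""
    upper = np.triu(rng.random((n, n)) < r, k=1)  # independent edges above the diagonal
    return (upper | upper.T).astype(float)


def init_gcn(d, hidden, layers, rng):
    """Randomly initialize a small GCN (hypothetical init: uniform on [-1, 1])."""
    dims = [d] + [hidden] * layers
    weights = [rng.uniform(-1, 1, (dims[i], dims[i + 1])) for i in range(layers)]
    biases = [rng.uniform(-1, 1, dims[i + 1]) for i in range(layers)]
    w_out = rng.uniform(-1, 1, hidden)
    b_out = rng.uniform(-1, 1)
    return weights, biases, w_out, b_out


def gcn_classify(adj, x, params):
    """Return 0 or 1 for one graph with adjacency `adj` and node features `x`."""
    weights, biases, w_out, b_out = params
    a_hat = adj + np.eye(adj.shape[0])        # assumption: add self-loops
    deg = a_hat.sum(axis=1)                   # degrees are >= 1, so no division by zero
    norm = a_hat / np.sqrt(np.outer(deg, deg))  # symmetric degree normalization
    h = x
    for w, b in zip(weights, biases):
        h = np.tanh(norm @ h @ w + b)         # one message-passing layer
    pooled = h.mean(axis=0)                   # assumption: mean pooling over nodes
    return int(pooled @ w_out + b_out > 0)    # threshold the graph-level logit


def proportion_classified_one(params, n, r, d, trials, rng):
    """Estimate the fraction of sampled graphs of size n that are classified as 1."""
    hits = 0
    for _ in range(trials):
        adj = sample_er_adjacency(n, r, rng)
        x = rng.random((n, d))                # assumption: features uniform on [0, 1]
        hits += gcn_classify(adj, x, params)
    return hits / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_gcn(d=8, hidden=16, layers=2, rng=rng)
    for n in (10, 50, 100, 500):
        p = proportion_classified_one(params, n, r=0.5, d=8, trials=50, rng=rng)
        print(f"n={n:4d}  proportion classified as 1: {p:.2f}")
```

Repeating the loop for several independently initialized `params` gives one curve per model, which is the shape of plot the captions above describe. Whether a particular random model drifts toward $0$ or toward $1$ depends on its weights, so no specific output is claimed here.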

Theorems & Definitions (35)

  • Definition 4.1
  • Definition 4.2
  • Definition 4.3
  • Definition 4.4
  • Definition 4.5
  • Theorem 4.6
  • Lemma 4.7
  • Definition 4.8
  • Definition 4.9
  • Theorem 4.10
  • ...and 25 more