What Do GNNs Actually Learn? Towards Understanding their Representations

Giannis Nikolentzos, Michail Chatzianastasis, Michalis Vazirgiannis

TL;DR

When the initial representations of all nodes point in the same direction, the representations learned at the $k$-th layer of the models are also related to the initial features of the nodes that can be reached in exactly $k$ steps.

Abstract

In recent years, graph neural networks (GNNs) have achieved great success in the field of graph representation learning. Although prior work has shed light on the expressiveness of those models (i.e., whether they can distinguish pairs of non-isomorphic graphs), it is still not clear what structural information is encoded into the node representations that are learned by those models. In this paper, we address this gap by studying the node representations learned by four standard GNN models. We find that some models produce identical representations for all nodes, while the representations learned by other models are linked to some notion of walks of specific length that start from the nodes. We establish Lipschitz bounds for these models with respect to the number of (normalized) walks. Additionally, we investigate the influence of node features on the learned representations. We find that if the initial representations of all nodes point in the same direction, the representations learned at the $k$-th layer of the models are also related to the initial features of nodes that can be reached in exactly $k$ steps. We also apply our findings to understand the phenomenon of oversquashing that occurs in GNNs. Our theoretical analysis is validated through experiments on synthetic and real-world datasets.
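To make the link between walks and learned representations concrete, the following is a minimal sketch, not the authors' implementation. It assumes a simplified, bias-free sum-aggregation layer with ReLU (a stand-in for a GIN-style update; the exact model definitions are not given in this summary) and nodes that all start with the single feature $1$. The function names `walk_counts` and `sum_aggregation_forward` are hypothetical. Under these assumptions, the representation of a node after $k$ layers is the number of walks of length $k$ starting from that node, multiplied by a vector shared by all nodes.

```python
import numpy as np


def walk_counts(adj, k):
    """Number of walks of length k starting from each node, i.e. A^k 1."""
    counts = np.ones(adj.shape[0])
    for _ in range(k):
        counts = adj @ counts
    return counts


def sum_aggregation_forward(adj, weights):
    """Stack of bias-free sum-aggregation layers with ReLU (simplifying assumption).

    Every node starts with the same one-dimensional feature equal to 1.
    """
    h = np.ones((adj.shape[0], 1))
    for w in weights:
        # Sum over neighbours, apply a linear map, then ReLU.
        h = np.maximum(adj @ h @ w, 0.0)
    return h


# Path graph 0 - 1 - 2 - 3 (illustrative example, not taken from the paper).
path = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
weights = [rng.normal(size=(1, 4)), rng.normal(size=(4, 4))]  # two layers

counts = walk_counts(path, 2)
h = sum_aggregation_forward(path, weights)

# Each row of h should be the node's walk count times one shared vector.
shared = h[0] / counts[0]
print(counts)
print(np.allclose(h, np.outer(counts, shared)))
```

The check relies on ReLU being positively homogeneous and on walk counts being non-negative, so the scalar walk count can be pulled out of each layer. With biases or non-constant features this exact factorization no longer holds, which is where the Lipschitz bounds mentioned in the abstract come in.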

Paper Structure

This paper contains 18 sections, 6 theorems, 21 equations, 12 figures, and 1 table.

Key Result

Theorem 1

Let $\mathcal{G}=\{G_1, \ldots,G_N\}$ be a collection of graphs. Let also $\mathcal{V}=V_1 \cup \ldots \cup V_N$ denote the set that contains the nodes of all graphs. All nodes are initially annotated with the same representation. Without loss of generality, we assume that they are annotated with a

Figures (12)

  • Figure 1: Euclidean distances of the representations generated at the third layer of the different models vs. Euclidean distances of the number of walks (or sum of normalized walks) of length $3$ starting from the different nodes. Nodes are initially annotated with a single feature equal to $1$.
  • Figure 2: The number of walks of length $2$ starting from the red nodes of the three graphs is equal to $10$. A GIN model that consists of $2$ layers embeds these three nodes close to each other (or to the same representation in case there are no biases) even though they are structurally dissimilar.
  • Figure 3: The sum of normalized walks of length $2$ starting from the red nodes of the two graphs is approximately equal to $0.890$. A GCN model that consists of $2$ layers embeds these two nodes to the same representation even though they are structurally dissimilar.
  • Figure 4: Euclidean distances of the representations generated at the third layer of the different models vs. Euclidean distances of the number of walks (or sum of normalized walks) of length $3$ starting from the different nodes. Nodes $v$ and $u$ correspond to the same node in the original and the perturbed graph, respectively. Each perturbed graph has emerged by removing one node from node $v$'s neighborhood.
  • Figure 5: Example of a graph where the removal of a node disconnects the graph. This leads to a large decrease in the number of walks of length $6$ that start from node $v$.
  • ...and 7 more figures
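The "sum of normalized walks" mentioned in the captions for the GCN model can be illustrated with a short sketch. This summary does not spell out the definition, so the code assumes the usual symmetric GCN normalization with self-loops, $\hat{A} = \tilde{D}^{-1/2}(A + I)\tilde{D}^{-1/2}$, and takes the quantity for length $k$ to be the row sums of $\hat{A}^k$. The function name `normalized_walk_sums` is hypothetical, and the example graphs are not those of the figures.

```python
import numpy as np


def normalized_walk_sums(adj, k):
    """Row sums of A_hat^k, where A_hat is the symmetrically normalized
    adjacency matrix with self-loops (assumed definition, see lead-in)."""
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)              # add self-loops
    deg = a_tilde.sum(axis=1)              # degrees including self-loops
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]
    sums = np.ones(n)
    for _ in range(k):
        sums = a_hat @ sums
    return sums


# Triangle: every node has the same degree, so every row of A_hat sums to 1.
triangle = np.array(
    [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]],
    dtype=float,
)

# Path graph 0 - 1 - 2 - 3: end nodes and inner nodes get different values.
path = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

triangle_sums = normalized_walk_sums(triangle, 3)
path_sums = normalized_walk_sums(path, 2)
print(triangle_sums)
print(path_sums)
```

Two nodes with equal values of this quantity would be indistinguishable to a bias-free model of this form when all nodes start from the same feature, even if their neighborhoods differ structurally, which is the situation the Figure 3 caption describes.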

Theorems & Definitions (10)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof