Graph Neural Networks Use Graphs When They Shouldn't

Maya Bechler-Speicher, Ido Amos, Ran Gilad-Bachrach, Amir Globerson

TL;DR

This work analyzes the implicit bias of gradient-descent learning of GNNs and proves that when the ground truth function does not use the graphs, GNNs are not guaranteed to learn a solution that ignores the graph, even with infinite data.

Abstract

Predictions over graphs play a crucial role in various domains, including social networks and medicine. Graph Neural Networks (GNNs) have emerged as the dominant approach for learning on graph data. Although a graph-structure is provided as input to the GNN, in some cases the best solution can be obtained by ignoring it. While GNNs have the ability to ignore the graph-structure in such cases, it is not clear that they will. In this work, we show that GNNs actually tend to overfit the given graph-structure. Namely, they use it even when a better solution can be obtained by ignoring it. We analyze the implicit bias of gradient-descent learning of GNNs and prove that when the ground truth function does not use the graphs, GNNs are not guaranteed to learn a solution that ignores the graph, even with infinite data. We examine this phenomenon with respect to different graph distributions and find that regular graphs are more robust to this overfitting. We also prove that within the family of regular graphs, GNNs are guaranteed to extrapolate when learning with gradient descent. Finally, based on our empirical and theoretical findings, we demonstrate on real data how regular graphs can be leveraged to reduce graph overfitting and enhance performance.

Paper Structure

This paper contains 47 sections, 4 theorems, 35 equations, 11 figures, and 4 tables.

Key Result

Lemma 3.1

Let $S$ be a set of linearly separable $r$-regular graph examples. A GNN trained with GD that fits $S$ perfectly converges to a solution such that $\mathbf{w}_2 = r\mathbf{w}_1$. Specifically, the root weights $\mathbf{w}_1$ and topological weights $\mathbf{w}_2$ are aligned.
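
The alignment can be illustrated numerically with a minimal sketch. It assumes a simplified setting that is not spelled out in this summary: a one-layer linear GNN with a sum readout, $f(X, A) = \sum_i \big(\mathbf{w}_1 \cdot x_i + \mathbf{w}_2 \cdot \sum_{j \in N(i)} x_j\big)$, trained with gradient descent on the logistic loss from zero initialization. All function names, sizes, and hyperparameters below are hypothetical. Under these assumptions, on $r$-regular graphs the summed neighbor aggregate equals $r$ times the summed node features, so every gradient step on $\mathbf{w}_2$ is $r$ times the step on $\mathbf{w}_1$, and the relation $\mathbf{w}_2 = r\mathbf{w}_1$ holds throughout training.

```python
import numpy as np

def regular_circulant_adjacency(n, r):
    """Adjacency matrix of an r-regular circulant graph (r even, r < n)."""
    A = np.zeros((n, n))
    for i in range(n):
        for k in range(1, r // 2 + 1):
            A[i, (i + k) % n] = 1.0
            A[i, (i - k) % n] = 1.0
    return A

def train_linear_gnn(graphs, labels, steps=500, lr=0.01):
    """GD on logistic loss for the assumed one-layer linear GNN, zero init."""
    d = graphs[0][0].shape[1]
    w1 = np.zeros(d)  # root weights
    w2 = np.zeros(d)  # topological weights
    # Sum readout: per-graph sum of node features and of neighbor aggregates.
    S = np.stack([X.sum(axis=0) for X, A in graphs])
    T = np.stack([(A @ X).sum(axis=0) for X, A in graphs])
    for _ in range(steps):
        margins = labels * (S @ w1 + T @ w2)
        # Derivative of log(1 + exp(-m)) w.r.t. the score, in a stable form:
        # sigmoid(-m) = 0.5 * (1 - tanh(m / 2)).
        coeff = -labels * 0.5 * (1.0 - np.tanh(margins / 2.0))
        w1 -= lr * (coeff @ S) / len(labels)
        w2 -= lr * (coeff @ T) / len(labels)
    return w1, w2

rng = np.random.default_rng(0)
n, r, d, m = 8, 4, 5, 200  # nodes, regularity degree, feature dim, graphs
A = regular_circulant_adjacency(n, r)  # same graph reused for simplicity
w_star = rng.normal(size=d)  # hypothetical ground-truth direction
graphs = [(rng.normal(size=(n, d)), A) for _ in range(m)]
# Labels depend only on the node features, never on the graph.
labels = np.array([np.sign(X.sum(axis=0) @ w_star) for X, _ in graphs])

w1, w2 = train_linear_gnn(graphs, labels)
ratio = np.linalg.norm(w2) / np.linalg.norm(w1)
print(f"norm ratio ||w2|| / ||w1|| = {ratio:.4f}, regularity degree r = {r}")
```

In this sketch the labels never depend on the graph, yet the topological weights end up with $r$ times the norm of the root weights, which is the sense in which the trained model still uses the graph.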

Figures (11)

  • Figure 1: (a) The learning curves of the same GNN model trained on graphs that have the same node features and only differ in their graph-structure, which is sampled from different distributions. The label is computed from the node features without the use of any graph-structure. If GNNs were to ignore the non-informative graph-structure they were given, similar performance should have been observed for all graph distributions. Among the different distributions, regular graphs exhibit the best performance. (b) The norm ratio between the topological and the root weights along the same runs. Except for the empty graphs, the ratio is always greater than $1$, which indicates that more norm is given to the topological weights.
  • Figure 2: Histogram of the ratios for test examples that are correctly classified in the extrapolation evaluation presented in Table \ref{table:regular_extrapolation}. The condition in Theorem \ref{thm:extrapolation} is met for all the correctly classified examples.
  • Figure 3: Accuracy and error bars on the Proteins dataset as the COV decreases. Performance improves monotonically.
  • Figure 4: An empirical validation of Lemma 3.1. The ratio between the topological and root weights is equal to the regularity degree of the graphs. $V$ is the number of nodes in each graph, and $r$ is the regularity degree.
  • Figure 5: Evaluation of the GIN model on the Sum task, where the graph should be ignored, as described in Section 2.2 of the main paper.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Lemma 3.1: Weight alignment
  • Theorem 3.2: Extrapolation may fail
  • Theorem 3.3: Extrapolation within regular distributions
  • Theorem 3.4: Sufficient condition for extrapolation