Graph neural networks extrapolate out-of-distribution for shortest paths

Robert R. Nerem, Samantha Chen, Sanjoy Dasgupta, Yusu Wang

TL;DR

The paper tackles the challenge of graph neural networks extrapolating to out-of-distribution inputs by enforcing neural algorithmic alignment with the Bellman-Ford dynamic-programming paradigm through sparsity-regularized training. It proves that a MinAgg GNN trained on a small set of BF instances implements BF exactly (or with proportional error) and thus generalizes to arbitrarily large graphs. Theoretical results (including bounds on the extrapolation error) are complemented by empirical evidence showing that gradient descent can find BF-aligned solutions when L1 regularization is applied. This work provides a principled route to robust size generalization in GNN-based shortest-path computations and suggests broader applicability to other algorithmic tasks and architectures.
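The sparsity-regularized objective mentioned above can be sketched as a mean-squared error plus an L1 penalty on the parameters. This is a minimal illustration only: the function names and the weight `eta` are assumptions made here, not values or notation from the paper.

```python
def mse(preds, targets):
    """Mean squared error between predicted and true shortest-path distances."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def l1_penalty(params):
    """Sum of absolute parameter values; pushes unneeded weights toward zero."""
    return sum(abs(p) for p in params)


def regularized_loss(preds, targets, params, eta=0.01):
    """MSE plus eta * L1 penalty. `eta` is a hypothetical regularization weight."""
    return mse(preds, targets) + eta * l1_penalty(params)


# Example with made-up numbers: MSE = 2.0, L1 penalty = 1.0, eta = 0.1.
example_loss = regularized_loss([1.0, 2.0], [1.0, 4.0], [0.5, -0.5, 0.0], eta=0.1)
```

With `eta = 0` this reduces to plain MSE, which corresponds to the unregularized baseline the summary contrasts against.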

Abstract

Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We rigorously analyze the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that GNNs, trained to minimize a sparsity-regularized loss over a small set of shortest path instances, exactly implement the Bellman-Ford (BF) algorithm for shortest paths. In fact, if a GNN minimizes this loss within an error of $ε$, it implements the BF algorithm with an error of $O(ε)$. Consequently, despite limited training data, these GNNs are guaranteed to extrapolate to arbitrary shortest-path problems, including instances of any size. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.
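For reference, the Bellman-Ford update that the abstract refers to can be written as a min-aggregation over incoming edges, which is the structure a min-aggregation GNN layer mirrors. The sketch below is the textbook synchronous form in plain Python; the helper names and the example graph are illustrative choices, not taken from the paper.

```python
import math


def bellman_ford_step(dist, edges):
    """One synchronous Bellman-Ford update.

    dist:  list of current distance estimates, one per node
    edges: list of directed edges (u, v, w) with weight w
    Computes new[v] = min(dist[v], min over edges (u, v, w) of dist[u] + w).
    """
    new = list(dist)
    for u, v, w in edges:
        cand = dist[u] + w  # always read from the old estimates
        if cand < new[v]:
            new[v] = cand
    return new


def bellman_ford(n, edges, source, steps):
    """Run `steps` Bellman-Ford updates from `source` on an n-node graph."""
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(steps):
        dist = bellman_ford_step(dist, edges)
    return dist


def undirected(pairs):
    """Expand undirected weighted edges into both directions."""
    out = []
    for u, v, w in pairs:
        out.append((u, v, w))
        out.append((v, u, w))
    return out


# Path 0-1-2-3 with weights 1, 2, 3, plus a costlier shortcut 0-2 of weight 5.
edges = undirected([(0, 1, 1.0), (1, 2, 2.0), (2, 3, 3.0), (0, 2, 5.0)])
after_1 = bellman_ford(4, edges, source=0, steps=1)
after_2 = bellman_ford(4, edges, source=0, steps=2)
after_3 = bellman_ford(4, edges, source=0, steps=3)
```

After k steps each estimate is the length of the shortest path using at most k edges, so node 3 is first reached at distance 8 (via the shortcut) and only settles at 6 after the third step.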

Paper Structure

This paper contains 31 sections, 14 theorems, 135 equations, 8 figures, 2 tables.

Key Result

Theorem 2.2

Let $0 < \epsilon < 1$. If $\forall G \in \mathscr{H}_{\mathrm{small}}$ and $\forall u \in V(G)$, a 1-layer GNN $\mathcal{A}_{\theta}$ with update given by eq:simple gnn computes a node feature satisfying $|h_u^{(1)}(G) - x_u(\Gamma(G))| < \frac{\epsilon}{20}$, then for any $G' \in \mathscr G$ and …

Figures (8)

  • Figure 1: Overview of GNN feature propagation and the MinAgg GNN layer architecture
  • Figure 2: Graphs used in the training sets $\mathscr H_{\mathrm{small}}$ and $\mathscr G_K$.
  • Figure 3: Performance metrics and parameter updates for a two-layer MinAgg GNN trained on two steps of the BF algorithm. The dotted line in (a) and (b) is the global minimum of \ref{eq:l0-loss}. In (a) and (b), we track the change in the train loss, test loss, and $\mathcal{L}_{\mathrm{reg}}$ over each optimization step for the models trained with $\mathcal{L}_{\mathrm{MSE}, L_1}$ and $\mathcal{L}_{\mathrm{MSE}}$. The final test loss for the model trained with $\mathcal{L}_{\mathrm{MSE}, L_1}$ is 0.006, while the final test loss for the model trained with $\mathcal{L}_{\mathrm{MSE}}$ is 0.288. (b) and (c) show changes in model parameters over each optimization step with and without $L_1$ regularization, respectively. Each curve has been smoothed with a truncated Gaussian filter with $\sigma=20$.
  • Figure 4: A diagram showing an example of a MinAgg GNN with the sparsity structure given by \ref{lem:dependencies}. Bold black connections in the neural network indicate nonzero parameters, while grey lines indicate zero parameters.
  • Figure 5: Performance and parameter weights for the small MinAgg GNN instance (trained with MSE). Recall that, by \ref{thm:main-small}, we expect $w_2b_1 + b_2 = 0$ and $w_2W_{11} = w_2W_{12} = 1.0$. We indicate these values by the black dotted lines in (a) and show that the parameters of our small MinAgg GNN converge to the expected values. Additionally, (b) verifies that convergence to this parameter configuration corresponds to low test error: $\mathcal{E}_{\mathrm{test}}$ converges to 0.0018.
  • ...and 3 more figures
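
The parameter conditions quoted in the Figure 5 caption can be checked numerically on a toy layer. The sketch below assumes a one-layer update of the form $h_v = \mathrm{ReLU}(w_2 \min_{u} \mathrm{ReLU}(W_{11} h_u + W_{12} w_{uv} + b_1) + b_2)$; this functional form, the zero-weight self-loops, the stand-in value 50 for unreached nodes, and the specific parameter values are all assumptions made here for illustration, since the source page does not reproduce the layer equation.

```python
def relu(z):
    return max(0.0, z)


def minagg_layer(h, edges, W11, W12, b1, w2, b2):
    """Assumed one-layer min-aggregation update (hypothetical form):

    h'[v] = relu(w2 * min over edges (u, v, w) of relu(W11*h[u] + W12*w + b1) + b2)

    Every node needs at least one incoming edge (self-loops are used below).
    """
    best = {}
    for u, v, w in edges:
        msg = relu(W11 * h[u] + W12 * w + b1)
        best[v] = min(best.get(v, float("inf")), msg)
    return [relu(w2 * best[v] + b2) for v in range(len(h))]


# Illustrative parameters with w2*W11 = w2*W12 = 1 and w2*b1 + b2 = 0.
params = dict(W11=0.5, W12=0.5, b1=1.0, w2=2.0, b2=-2.0)

# Undirected path 0-1-2 with weights 1 and 2, plus zero-weight self-loops.
edges = [(0, 1, 1.0), (1, 0, 1.0), (1, 2, 2.0), (2, 1, 2.0),
         (0, 0, 0.0), (1, 1, 0.0), (2, 2, 0.0)]

h0 = [0.0, 50.0, 50.0]  # source at 0; 50 is a stand-in for "not yet reached"
h1 = minagg_layer(h0, edges, **params)
h2 = minagg_layer(h1, edges, **params)
```

Under these assumptions the two ReLUs never clip for nonnegative inputs, so each layer reduces to $\min_u (h_u + w_{uv})$, which is one Bellman-Ford step: `h1` is `[0, 1, 50]` and `h2` is `[0, 1, 3]`.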

Theorems & Definitions (41)

  • Definition 2.1
  • Theorem 2.2
  • proof : Proof Sketch
  • Theorem 2.3
  • proof : Proof Sketch
  • Definition A.1
  • Definition A.2
  • Theorem B.1
  • proof
  • Corollary B.2
  • ...and 31 more