Graph neural networks extrapolate out-of-distribution for shortest paths
Robert R. Nerem, Samantha Chen, Sanjoy Dasgupta, Yusu Wang
TL;DR
The paper tackles the challenge of graph neural networks extrapolating to out-of-distribution inputs by enforcing neural algorithmic alignment with the Bellman-Ford dynamic-programming paradigm through sparsity-regularized training. It proves that a MinAgg GNN trained on a small set of BF instances implements BF exactly (or with proportional error) and thus generalizes to arbitrarily large graphs. Theoretical results (including bounds on the extrapolation error) are complemented by empirical evidence showing that gradient descent can find BF-aligned solutions when L1 regularization is applied. This work provides a principled route to robust size generalization in GNN-based shortest-path computations and suggests broader applicability to other algorithmic tasks and architectures.
Abstract
Neural networks (NNs), despite their success and wide adoption, still struggle to extrapolate out-of-distribution (OOD), i.e., to inputs that are not well-represented by their training dataset. Addressing the OOD generalization gap is crucial when models are deployed in environments significantly different from the training set, such as applying Graph Neural Networks (GNNs) trained on small graphs to large, real-world graphs. One promising approach for achieving robust OOD generalization is the framework of neural algorithmic alignment, which incorporates ideas from classical algorithms by designing neural architectures that resemble specific algorithmic paradigms (e.g. dynamic programming). The hope is that trained models of this form would have superior OOD capabilities, in much the same way that classical algorithms work for all instances. We rigorously analyze the role of algorithmic alignment in achieving OOD generalization, focusing on graph neural networks (GNNs) applied to the canonical shortest path problem. We prove that GNNs, trained to minimize a sparsity-regularized loss over a small set of shortest path instances, exactly implement the Bellman-Ford (BF) algorithm for shortest paths. In fact, if a GNN minimizes this loss within an error of $ε$, it implements the BF algorithm with an error of $O(ε)$. Consequently, despite limited training data, these GNNs are guaranteed to extrapolate to arbitrary shortest-path problems, including instances of any size. Our empirical results support our theory by showing that NNs trained by gradient descent are able to minimize this loss and extrapolate in practice.
