Covered Forest: Fine-grained generalization analysis of graph neural networks
Antonis Vasileiou, Ben Finkelshtein, Floris Geerts, Ron Levie, Christopher Morris
TL;DR
This work studies generalization in message-passing graph neural networks (MPNNs) by introducing refined graph similarity measures that align with MPNN computations. Using data-dependent coverings together with the Tree distance, the Forest distance, and the Tree Mover's distance (TMD), the authors develop a robustness framework and prove generalization bounds that hold for a range of loss functions, not just the 0-1 loss, and for several aggregation schemes, including sum and mean. A key contribution is showing that graph structure and the choice of metric (e.g., Tree distance versus WL-based metrics) strongly affect covering numbers and hence the resulting generalization bounds; experiments on synthetic graph families and real-world datasets support the theory. The paper also introduces the mean-1-WL (1-MWL) perspective and connects it to mean-aggregation MPNNs, giving a fine-grained analysis that links expressivity, robustness, and generalization and offers practical guidance on choosing architectures and metrics.
Abstract
The expressive power of message-passing graph neural networks (MPNNs) is reasonably well understood, primarily through combinatorial techniques from graph isomorphism testing. However, MPNNs' generalization abilities -- making meaningful predictions beyond the training set -- remain less explored. Current generalization analyses often overlook graph structure, limit the focus to specific aggregation functions, and assume the impractical, hard-to-optimize $0$-$1$ loss function. Here, we extend recent advances in graph similarity theory to assess the influence of graph structure, aggregation, and loss functions on MPNNs' generalization abilities. Our empirical study supports our theoretical insights, improving our understanding of MPNNs' generalization properties.
