Future Directions in the Theory of Graph Machine Learning

Christopher Morris; Fabrizio Frasca; Nadav Dym; Haggai Maron; İsmail İlkan Ceylan; Ron Levie; Derek Lim; Michael Bronstein; Martin Grohe; Stefanie Jegelka

TL;DR

The paper argues for a balanced theory of graph machine learning that goes beyond coarse combinatorial expressivity (e.g., $1$-WL) to incorporate geometry, generalization, and optimization, all aligned with practical applications. It outlines a comprehensive program of challenges across four pillars: expressive power (II.1–II.6), generalization (III.1–III.4), optimization dynamics (IV.1–IV.5), and practice alignment (V.1–V.5), advocating geometry-based expressivity, uniform bounds, graph-class awareness, and domain-adaptive architectures. A core idea is to develop a metric-based relationship between graph space and GNN feature space (e.g., a bi-Lipschitz correspondence between $d_G$ and $d_F$) to enable finer analyses and universal approximation statements, while also studying data augmentation, extrapolation, and optimization under realistic conditions. The authors propose concrete action items, including a Theo-practical Dojo, a library of theoretically guided implementations, domain-adapted architectures, and principled integration of LLMs, to translate theory into practice and accelerate impact in domains such as molecular design and combinatorial optimization.
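One way to make the bi-Lipschitz correspondence mentioned above concrete is sketched below. This is an illustrative formalization, not a statement taken from the paper: the symbol $f$ (the map from graphs to GNN features), the graph class $\mathcal{G}$, and the constants $c$ and $C$ are assumptions introduced here, while $d_G$ and $d_F$ are the graph-space and feature-space metrics named in the summary.

$$
c \, d_G(G, H) \;\le\; d_F\bigl(f(G), f(H)\bigr) \;\le\; C \, d_G(G, H)
\qquad \text{for all } G, H \in \mathcal{G}, \quad 0 < c \le C < \infty .
$$

Under such a condition, the upper bound says that graphs close in $d_G$ receive nearby features (stability), and the lower bound says that graphs far apart in $d_G$ are not collapsed together (separation). A purely combinatorial statement such as "$f$ distinguishes $G$ and $H$" records only whether $d_F(f(G), f(H))$ is nonzero, which is the coarseness the summary refers to.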

Abstract

Machine learning on graphs, especially using graph neural networks (GNNs), has seen a surge in interest due to the wide availability of graph data across a broad spectrum of disciplines, from life to social and engineering sciences. Despite their practical success, our theoretical understanding of the properties of GNNs remains highly incomplete. Recent theoretical advancements primarily focus on elucidating the coarse-grained expressive power of GNNs, predominantly employing combinatorial techniques. However, these studies do not perfectly align with practice, particularly in understanding the generalization behavior of GNNs when trained with stochastic first-order optimization techniques. In this position paper, we argue that the graph machine learning community needs to shift its attention to developing a balanced theory of graph machine learning, focusing on a more thorough understanding of the interplay of expressive power, generalization, and optimization.
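The coarse-grained, combinatorial view of expressive power mentioned above is commonly phrased in terms of the $1$-dimensional Weisfeiler-Leman ($1$-WL) color refinement test, which standard message-passing GNNs are known not to exceed in distinguishing power. The following is a minimal sketch of that test in Python, not code from the paper; the function name `wl_indistinguishable` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import Counter


def wl_indistinguishable(adj_a, adj_b):
    """Return True if 1-WL color refinement cannot tell the two graphs apart.

    Each graph is a dict mapping a node to a list of its neighbors
    (hypothetical input format chosen for this sketch).
    """
    # Refine colors on the disjoint union so both graphs share one palette.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})

    colors = {v: 0 for v in union}  # uniform initial coloring
    for _ in range(len(union)):  # at most |V| rounds are needed
        # A node's signature: its own color plus the multiset of neighbor colors.
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in union[v])))
                for v in union}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        # Refinement only splits color classes, so an unchanged class count
        # means the partition is stable.
        stable = len(palette) == len(set(colors.values()))
        colors = {v: palette[sigs[v]] for v in union}
        if stable:
            break

    hist_a = Counter(c for (tag, _), c in colors.items() if tag == "a")
    hist_b = Counter(c for (tag, _), c in colors.items() if tag == "b")
    return hist_a == hist_b


# A 6-cycle and two disjoint triangles are both 2-regular on 6 nodes.
hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
print(wl_indistinguishable(hexagon, two_triangles))  # → True
```

The test returns a yes/no answer: the two graphs above are non-isomorphic yet receive identical color histograms, and the result says nothing about how similar or dissimilar they are. That binary character is what motivates the finer-grained, metric-based analyses the paper calls for.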

Paper Structure (30 sections, 2 figures)

This paper contains 30 sections, 2 figures.

Introduction
Expressive Power of GNNs
Challenges
Challenge II.1: From combinatorial to geometric expressiveness results.
Challenge II.2: Towards understanding expressiveness for all practical architectures.
Challenge II.3: Towards uniform expressiveness results.
Challenge II.4: Towards expressiveness on relevant classes of graphs.
Challenge II.5: Towards a formal trade-off between expressive power and computational cost.
Challenge II.6: Towards linking architecture, task, and graph structure.
Generalization Properties of GNNs
Challenges
Challenge III.1: Understanding the influence of expressiveness and architectural choices on generalization.
Challenge III.2: Understanding the impact of graph structure on generalization and its interplay with geometry.
Challenge III.3: Develop a theory of data augmentation.
Challenge III.4: Understanding and improving extrapolation, especially to larger graphs.
...and 15 more sections

Figures (2)

Figure 1: Interactions of the four challenges within graph machine learning: fine-grained expressivity, generalization, optimization, and applications. The green boxes, namely architectural choices (hyperparameters and other design choices such as normalization layers), model parameters, and graph classes (different types of graphs), represent aspects of all four challenges.
Figure 2: Proposal for a better alignment of theoretical and practical research within the graph machine learning community. We propose the tight interaction and iterative refinement of mathematical models and architectural choices via rigorous experimental evaluations supported by state-of-the-art baseline implementations, benchmarks, evaluation pipelines, and visual exploration tools.
