Weisfeiler-Leman at the margin: When more expressivity matters

Billy J. Franks, Christopher Morris, Ameya Velingker, Floris Geerts

TL;DR

The paper interrogates whether increasing expressivity in WL-based graph kernels and MPNNs translates into improved generalization, revealing that expressivity alone offers limited predictive guidance when framed purely in terms of graph isomorphism. By leveraging margin theory and VC-dimension analyses, it shows how subgraph-informed WL variants (e.g., $1$-WL$_{\mathcal{F}}$, $1$-WLOA$_{\mathcal{F}}$) can produce larger margins and lower VC dimension under certain data distributions, thereby enhancing generalization, while also exhibiting cases where additional expressive power shrinks the margin. It further proves that gradient flow in linear MPNNs aligns the weights toward maximum-margin solutions, and demonstrates the existence of WL-based kernel and MPNN variants with provable generalization properties. Empirical studies on standard graph benchmarks corroborate the theoretical claims, illustrating when increased expressivity yields tangible predictive benefits and when the margin, not expressivity per se, governs performance.
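The margin and gradient-flow claims can be made concrete on a toy problem. The sketch below is a plain linear classifier on fixed two-dimensional feature vectors (they could, for instance, stand in for WL color histograms), not the paper's linear MPNN; the data, step size, and function names are hypothetical choices for illustration. It shows the classical implicit-bias effect: gradient descent on the logistic loss over linearly separable data is known to drive the normalized margin $\min_i y_i \langle \mathbf{w}, \mathbf{x}_i \rangle / \lVert \mathbf{w} \rVert$ toward its maximum.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def normalized_margin(w, data):
    """Geometric margin min_i y_i <w, x_i> / ||w|| of a linear classifier through the origin."""
    return min(y * (w[0] * x[0] + w[1] * x[1]) for x, y in data) / math.hypot(w[0], w[1])

def gradient_descent(data, w, lr=0.1, steps=2000):
    """Gradient descent on the logistic loss sum_i log(1 + exp(-y_i <w, x_i>))."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0, 0.0]
        for x, y in data:
            # d/dw log(1 + exp(-m)) = -sigmoid(-m) * y * x, with m = y <w, x>.
            s = sigmoid(-y * (w[0] * x[0] + w[1] * x[1]))
            grad[0] -= s * y * x[0]
            grad[1] -= s * y * x[1]
        w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    return w

# Hypothetical separable toy data: (feature vector, label in {-1, +1}).
data = [((1, 1), 1), ((1, -1), 1), ((-1, 1), -1), ((-1, -1), -1)]
w0 = (0.0, 1.0)
w = gradient_descent(data, w0)
print(normalized_margin(w0, data), normalized_margin(w, data))
```

Starting from $\mathbf{w} = (0, 1)$, whose normalized margin is $-1$, the normalized margin approaches $1$, the value attained by the max-margin direction $(1, 0)$ on these four points. Convergence in direction is known to be slow in general, so the step count here is only tuned for this toy data.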

Abstract

The Weisfeiler-Leman algorithm ($1$-WL) is a well-studied heuristic for the graph isomorphism problem. Recently, the algorithm has played a prominent role in understanding the expressive power of message-passing graph neural networks (MPNNs) and has proven effective as a graph kernel. Despite its success, $1$-WL cannot distinguish all pairs of non-isomorphic graphs, which has led to the development of more expressive MPNN and kernel architectures. However, the relationship between enhanced expressivity and improved generalization performance remains unclear. Here, we show that an architecture's expressivity offers limited insight into its generalization performance when viewed through graph isomorphism. Moreover, we focus on augmenting $1$-WL and MPNNs with subgraph information and employ classical margin theory to investigate the conditions under which an architecture's increased expressivity aligns with improved generalization performance. In addition, we show that gradient flow pushes the MPNN's weights toward the maximum-margin solution. Further, we introduce variations of expressive $1$-WL-based kernel and MPNN architectures with provable generalization properties. Our empirical study confirms the validity of our theoretical findings.
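The abstract uses $1$-WL both as an isomorphism heuristic and as the basis of a graph kernel. Below is a minimal sketch of $1$-WL color refinement on unlabeled graphs, with a kernel in the style of the WL subtree kernel (the dot product of color histograms, summed over rounds). The adjacency-dict representation and the function names are assumptions for illustration, not the authors' implementation. The example pair, a 6-cycle versus two disjoint triangles, is a standard case that $1$-WL cannot tell apart.

```python
from collections import Counter

def wl_colors(adj, rounds):
    """1-WL colorings of one unlabeled graph for rounds 0..rounds.

    adj maps each vertex to a list of its neighbors. Colors are kept as
    nested tuples (no compression), so equal colors in different graphs
    compare equal without a shared palette (fine for small examples).
    """
    colors = {v: 0 for v in adj}  # uniform initial color
    history = [colors]
    for _ in range(rounds):
        # New color = (old color, sorted multiset of neighbor colors).
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        history.append(colors)
    return history

def wl_subtree_kernel(adj_g, adj_h, rounds):
    """Sum over rounds of the dot product of the two graphs' color histograms."""
    total = 0
    for cg, ch in zip(wl_colors(adj_g, rounds), wl_colors(adj_h, rounds)):
        hist_g, hist_h = Counter(cg.values()), Counter(ch.values())
        total += sum(n * hist_h[c] for c, n in hist_g.items())
    return total

def cycle(n, offset=0):
    """Cycle graph on vertices offset, ..., offset + n - 1."""
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}

c6 = cycle(6)
two_triangles = {**cycle(3), **cycle(3, offset=3)}
print(wl_subtree_kernel(c6, c6, 2), wl_subtree_kernel(c6, two_triangles, 2))  # 108 108
```

Both kernel values are 108 for two rounds: every vertex in both graphs has degree 2, so the color histograms coincide in every round, and the kernel cannot separate these non-isomorphic graphs.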

Paper Structure (64 sections, 62 theorems, 145 equations, 1 figure, 7 tables)

Key Result

Proposition 1

Let $G$ be a graph and $\mathcal{F}$ be a set of graphs. Then, in every round, the $1$-WL$_{\mathcal{F}}$ distinguishes every pair of vertices that the $1$-WL distinguishes. ∎
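Proposition 1 can be checked on a small example. The sketch below assumes that the $1$-WL$_{\mathcal{F}}$ initial color of a vertex pairs its $1$-WL initial color with the number of copies of each pattern in $\mathcal{F}$ that contain it, here only triangles, $\mathcal{F} = \{K_3\}$. This is a hypothetical simplification, and the paper's precise definition may differ; the function names are also illustrative. One way to see why the inclusion holds: the initial coloring of $1$-WL$_{\mathcal{F}}$ refines that of $1$-WL, and a color-refinement round applied to a finer coloring again yields a finer coloring.

```python
from itertools import combinations

def refine(adj, init, rounds):
    """Run `rounds` rounds of 1-WL color refinement starting from the coloring `init`."""
    colors = dict(init)
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
    return colors

def triangle_counts(adj):
    """Number of triangles containing each vertex (the feature for F = {K3})."""
    nbrs = {v: set(ns) for v, ns in adj.items()}
    return {v: sum(1 for a, b in combinations(sorted(nbrs[v]), 2) if b in nbrs[a]) for v in adj}

def distinguished_pairs(colors):
    """Unordered vertex pairs that receive different colors."""
    return {(u, v) for u, v in combinations(sorted(colors), 2) if colors[u] != colors[v]}

def cycle(n, offset=0):
    return {offset + i: [offset + (i - 1) % n, offset + (i + 1) % n] for i in range(n)}

# A triangle plus a disjoint 6-cycle: every vertex has degree 2, so 1-WL gives
# all nine vertices the same color in every round.
g = {**cycle(3), **cycle(6, offset=3)}
wl_init = {v: 0 for v in g}
tri = triangle_counts(g)
# Hypothetical 1-WL_F initial color: (1-WL initial color, triangle count).
f_init = {v: (wl_init[v], tri[v]) for v in g}

for t in range(4):
    plain = distinguished_pairs(refine(g, wl_init, t))
    with_f = distinguished_pairs(refine(g, f_init, t))
    assert plain <= with_f  # the inclusion stated in Proposition 1, on this example
    print(t, len(plain), len(with_f))  # prints "t 0 18" for each t
```

Here the triangle feature separates the 3 triangle vertices from the 6 cycle vertices ($3 \cdot 6 = 18$ pairs), which plain $1$-WL never does.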

Figures (1)

  • Figure 1: Plot illustrating the relation between the margin and generalization error for the ER graphs for different choices of $\mathcal{F}$.
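Figure 1 relates the margin to the generalization error. For reference, a classical result connecting the two is Vapnik's bound for margin hyperplanes: if all feature vectors lie in a ball of radius $R$ in $\mathbb{R}^d$, the class of hyperplanes separating them with margin at least $\gamma$ has VC dimension at most

$$\mathrm{VC} \le \min\left(\left\lceil R^2/\gamma^2 \right\rceil,\, d\right) + 1.$$

At a fixed radius, a larger margin therefore gives a smaller capacity bound, one that does not grow with the ambient dimension once $R^2/\gamma^2 < d$. This is the textbook form; the paper's own margin-based bounds for WL feature maps may be stated differently.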

Theorems & Definitions (97)

  • Proposition 1
  • Proposition 2
  • Lemma 3
  • Theorem 4
  • Corollary 5
  • Proposition 6
  • Corollary 7
  • Proposition 8
  • Corollary 9
  • Proposition 10
  • ...and 87 more