Homomorphism Counts for Graph Neural Networks: All About That Basis

Emily Jin, Michael Bronstein, İsmail İlkan Ceylan, Matthias Lanzinger

TL;DR

The paper addresses the expressivity gap in message-passing GNNs, which are bounded by the $1$-WL test, by introducing the homomorphism basis $\mathbb{B}_\Gamma$ for graph motif parameters. It proves that injecting the homomorphism counts for all graphs in the basis yields strictly more expressive models than counting a single motif or fixed subgraphs, without increasing asymptotic computational cost. The authors develop a practical two-step workflow: compute the basis and coefficients once, then count $\mathsf{Hom}(F, G)$ for each basis graph $F$, with node-level and graph-level variants, and validate the approach on ZINC, QM9, COLLAB, and BREC, achieving notable performance gains. This work provides a principled, scalable route to targeted expressivity in GNNs and broadens the scope of graph motif parameters beyond simple pattern counting by linking them to homomorphism counts.
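The two-step workflow can be illustrated on a very small pattern. The sketch below is not the authors' implementation: it uses brute-force enumeration, and the function names `hom_count` and `sub_p3_via_basis` are hypothetical. It relies on a standard identity for the 3-vertex path $P_3$: its only proper quotient is a single edge $K_2$, so $\mathsf{Sub}(P_3, G) = \tfrac{1}{2}\big(\mathsf{Hom}(P_3, G) - \mathsf{Hom}(K_2, G)\big)$. Step one (basis $\{P_3, K_2\}$ with coefficients $\tfrac{1}{2}$ and $-\tfrac{1}{2}$) is hard-coded here; step two counts homomorphisms for each basis graph.

```python
from itertools import product


def hom_count(pattern_edges, pattern_size, graph_adj):
    """Count homomorphisms from a pattern F into a graph G by brute force.

    pattern_edges: list of (u, v) pairs over vertices 0..pattern_size-1.
    graph_adj: dict mapping each vertex of G to the set of its neighbours
               (assumed symmetric and without self-loops).
    """
    nodes = list(graph_adj)
    count = 0
    # Try every map V(F) -> V(G) and keep those that send edges to edges.
    for image in product(nodes, repeat=pattern_size):
        if all(image[v] in graph_adj[image[u]] for u, v in pattern_edges):
            count += 1
    return count


def sub_p3_via_basis(graph_adj):
    """Subgraph count of the 3-vertex path, assembled from its basis.

    Step 1 (done once, hard-coded): basis {P3, K2}, coefficients 1/2 and -1/2.
    Step 2: count homomorphisms from each basis graph into G.
    """
    p3_edges = [(0, 1), (1, 2)]
    k2_edges = [(0, 1)]
    hom_p3 = hom_count(p3_edges, 3, graph_adj)
    hom_k2 = hom_count(k2_edges, 2, graph_adj)
    # Hom(P3) - Hom(K2) counts injective maps; P3 has 2 automorphisms.
    return (hom_p3 - hom_k2) // 2


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}

print(sub_p3_via_basis(triangle))  # 3 paths of length two in a triangle
print(sub_p3_via_basis(star))      # 3 paths of length two through the hub
```

Brute force is exponential in the pattern size and serves only to make the identity concrete; the individual counts $\mathsf{Hom}(P_3, G)$ and $\mathsf{Hom}(K_2, G)$ are the features that the basis approach would expose to the model instead of the single aggregated number.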

Abstract

A large body of work has investigated the properties of graph neural networks and identified several limitations, particularly pertaining to their expressive power. Their inability to count certain patterns (e.g., cycles) in a graph lies at the heart of such limitations, since many functions to be learned rely on the ability of counting such patterns. Two prominent paradigms aim to address this limitation by enriching the graph features with subgraph or homomorphism pattern counts. In this work, we show that both of these approaches are sub-optimal in a certain sense and argue for a more fine-grained approach, which incorporates the homomorphism counts of all structures in the "basis" of the target pattern. This yields strictly more expressive architectures without incurring any additional overhead in terms of computational complexity compared to existing approaches. We prove a series of theoretical results on node-level and graph-level motif parameters and empirically validate them on standard benchmark datasets.

Paper Structure (37 sections, 9 theorems, 26 equations, 7 figures, 12 tables)

Key Result

Theorem 4.1

Let $\Gamma$ be a connected graph motif parameter and $k \geq 1$. Then $k$-WL with $\mathbb{B}_\Gamma$ is at least as expressive as $k$-WL with $\{\Gamma\}$. Moreover, if at least two functions in $\mathsf{Hom}(\mathsf{Supp}(\Gamma))$ cannot be expressed by $k$-WL with $\{\Gamma\}$, then …
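
For orientation, the notion of a graph motif parameter that the theorem appears to build on can be written as a finite linear combination of homomorphism counts. This is a sketch assuming the paper follows the standard definition; the symbols $F_i$, $\alpha_i$, and $\ell$ are introduced here for illustration and do not appear in the excerpt above.

$$
\Gamma(G) \;=\; \sum_{i=1}^{\ell} \alpha_i \,\mathsf{Hom}(F_i, G),
\qquad \alpha_i \neq 0,
\qquad \mathsf{Supp}(\Gamma) \;=\; \{F_1, \dots, F_\ell\}.
$$

Under this reading, $\mathsf{Hom}(\mathsf{Supp}(\Gamma))$ would denote the set of functions $\mathsf{Hom}(F_i, \cdot)$ for $F_i \in \mathsf{Supp}(\Gamma)$, and the theorem compares injecting all of these functions separately against injecting only their weighted sum $\Gamma$.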

Figures (7)

  • Figure 1: Two 1-WL indistinguishable graphs $G_1$ and $G_2$. These graphs have the same number of $5$-cycles, but they can be distinguished by the homomorphism counts of $5$-cycles. $\mathsf{Sub}(F,G)$ is the number of times $F$ occurs as a subgraph in $G$, and $\mathsf{Hom}(F,G)$ is the number of homomorphisms from $F$ to $G$.
  • Figure 2: Expressiveness gain from injecting parameters. All inclusions are proper. All three features require (asymptotically) equivalent effort to calculate; they all can be computed in quadratic time (and no better).
  • Figure 3: We report the effect of using incrementally more of $\mathsf{Spasm}(C_8)$ as node features for each model. The performance of GIN with $\mathsf{Sub}(C_8, \cdot)$ at node-level is also included for reference.
  • Figure 4: We report the effect of using different encoding methods and dimensions for the $\mathsf{Spasm}(P_6)$ homomorphism count features with GAT on COLLAB. The dashed red line indicates the performance of GAT without additional features. Using a positional encoding (PE) boosts performance of the baseline model, whereas a standard MLP encoder degrades performance.
  • Figure 5:
  • ...and 2 more figures
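
The distinction between $\mathsf{Sub}$ and $\mathsf{Hom}$ in the Figure 1 caption can be made concrete for cycles. A homomorphism from the cycle $C_k$ into $G$ is exactly a closed walk of length $k$ in $G$, so $\mathsf{Hom}(C_k, G) = \operatorname{tr}(A^k)$ for the adjacency matrix $A$ of $G$. The sketch below is a minimal pure-Python illustration of that identity; the graphs $G_1$ and $G_2$ from the figure are not reproduced here, so small stand-in graphs are used, and the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square integer matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def hom_cycle(k, adj):
    """Hom(C_k, G) for k >= 3: closed walks of length k, i.e. trace(A^k)."""
    power = adj
    for _ in range(k - 1):
        power = mat_mul(power, adj)
    return sum(power[i][i] for i in range(len(adj)))


def cycle_adj(n):
    """Adjacency matrix of the cycle C_n (n >= 3)."""
    return [[1 if (i - j) % n in (1, n - 1) else 0 for j in range(n)]
            for i in range(n)]


def complete_adj(n):
    """Adjacency matrix of the complete graph K_n."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


# C_5 contains exactly one 5-cycle as a subgraph, yet has 10 homomorphisms
# from C_5 (5 starting vertices times 2 directions).
print(hom_cycle(5, cycle_adj(5)))     # 10
# Bipartite graphs admit no closed walks of odd length.
print(hom_cycle(5, cycle_adj(6)))     # 0
print(hom_cycle(5, complete_adj(4)))  # 240
```

Because closed walks may revisit vertices, $\mathsf{Hom}(C_5, G)$ also picks up contributions from smaller structures such as triangles, which is why two graphs can agree on $\mathsf{Sub}(C_5, \cdot)$ while differing on $\mathsf{Hom}(C_5, \cdot)$.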

Theorems & Definitions (25)

  • Example 1.1
  • Example 1.2
  • Example 1.3
  • Example 1.4
  • Example 1.5
  • Theorem 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Example 4.4
  • Theorem 4.5
  • ...and 15 more