Counting Substructures with Higher-Order Graph Neural Networks: Possibility and Impossibility Results

Behrooz Tahmasebi, Derek Lim, Stefanie Jegelka

TL;DR

This work addresses the gap between traditional MPNNs and costly higher-order GNNs for substructure counting by introducing Recursive Neighborhood Pooling (RNP-GNN). RNP-GNN recursively pools encodings from derived subgraphs, guided by covering sequences, to count subgraphs of size $k$ while exploiting graph sparsity to achieve reduced computational cost. The authors prove that RNP-GNNs can count any specified set of substructures, establish a universal-approximation result for local graph functions, and derive information-theoretic and ETH-based time-complexity lower bounds that contextualize the approach. Experiments on counting induced triangles, non-induced $3$-stars, and a satisfiability task demonstrate competitive or superior performance against baselines, highlighting the practical impact of sparsity-aware, recursion-based expressivity improvements.
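To make the two counting targets named above concrete, here is a minimal brute-force sketch of how reference counts for triangles and non-induced 3-stars could be computed on a small graph. It is not the RNP-GNN and not the authors' experimental code; the helper names (`from_edges`, `count_triangles`, `count_3_stars`) and the adjacency-dict representation are assumptions made for illustration. In a simple graph every triangle is automatically an induced subgraph, so the plain triangle count is used for the induced count.

```python
from itertools import combinations
from math import comb


def from_edges(edges):
    """Build an undirected simple graph as {node: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles(adj):
    """Count triangles by checking every unordered vertex triple (cubic time, small graphs only)."""
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


def count_3_stars(adj):
    """Count non-induced 3-stars: a center together with any 3 of its neighbors."""
    return sum(comb(len(nbrs), 3) for nbrs in adj.values())


# Complete graph on 4 vertices: 4 triangles, and each vertex centers exactly one 3-star.
k4 = from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
print(count_triangles(k4), count_3_stars(k4))  # 4 4
```

The 3-star count depends only on the degree sequence, whereas the triangle count depends on how neighborhoods overlap.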

Abstract

While message passing Graph Neural Networks (GNNs) have become increasingly popular architectures for learning with graphs, recent works have revealed important shortcomings in their expressive power. In response, several higher-order GNNs have been proposed that substantially increase the expressive power, albeit at a large computational cost. Motivated by this gap, we explore alternative strategies and lower bounds. In particular, we analyze a new recursive pooling technique of local neighborhoods that allows different tradeoffs of computational cost and expressive power. First, we prove that this model can count subgraphs of size $k$, and thereby overcomes a known limitation of low-order GNNs. Second, we show how recursive pooling can exploit sparsity to reduce the computational complexity compared to the existing higher-order GNNs. More generally, we provide a (near) matching information-theoretic lower bound for counting subgraphs with graph representations that pool over representations of derived (sub-)graphs. We also discuss lower bounds on time complexity.

Paper Structure

This paper contains 27 sections, 14 theorems, 50 equations, 3 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

Consider a set of (possibly attributed) graphs $\mathcal{H}$ on $\tau+1$ vertices, such that any $H \in \mathcal{H}$ admits the covering sequence $(r_1,r_2,\ldots,r_\tau)$. Then, there is an RNP-GNN $f( \cdot ;\theta)$ with recursion parameters $(r_1,r_2,\ldots,r_\tau)$ that can count any $H \in \ma

Figures (3)

  • Figure 1: MPNNs cannot count substructures with three or more nodes [chen2020can]. For example, the graph with the black center vertex on the left cannot be counted, since the two graphs on the left result in the same node representations as the graph on the right.
  • Figure 2: Example of a covering sequence computed for the graph on the left. For this graph, $(v_6,v_1, v_4, v_5,v_3,v_2)$ is a vertex covering sequence with respect to the covering sequence $(3, 3, 3, 2,1)$. The first two computations to obtain this covering sequence are depicted in the middle and on the right.
  • Figure 3: For the above graph, $(v_1,v_2,\ldots,v_6)$ is a vertex covering sequence. The corresponding covering sequence $(1,4,3,2,1)$ is not decreasing.
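The limitation described in the Figure 1 caption can be illustrated with a small sketch. It relies on the standard fact that MPNNs are at most as discriminative as 1-WL color refinement, and it uses a textbook pair of graphs (two disjoint triangles versus a 6-cycle), which are an assumption here and not necessarily the graphs drawn in Figure 1. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def from_edges(edges):
    """Build an undirected simple graph as {node: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def wl_histogram(adj, rounds=3):
    """Run 1-WL color refinement and return the multiset of final node colors.

    Colors are kept as nested tuples instead of being relabeled, so the
    histograms of two different graphs can be compared directly.
    """
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())


def count_triangles(adj):
    """Brute-force triangle count over all unordered vertex triples."""
    return sum(
        1
        for u, v, w in combinations(sorted(adj), 3)
        if v in adj[u] and w in adj[u] and w in adj[v]
    )


two_triangles = from_edges([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)])
hexagon = from_edges([(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)])

# Every node has degree 2 in both graphs, so refinement never separates them.
print(wl_histogram(two_triangles) == wl_histogram(hexagon))  # True
print(count_triangles(two_triangles), count_triangles(hexagon))  # 2 0
```

Because the color histograms coincide while the triangle counts differ, any readout computed from these node colors assigns both graphs the same value and therefore cannot return the triangle count.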

Theorems & Definitions (36)

  • Definition 1
  • Definition 2
  • Definition 3
  • Theorem 1
  • Definition 4
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Corollary 1
  • Corollary 2 [gishboliner2020counting, bera2019linear, bera2020nearlinear]
  • ...and 26 more