An Efficient Subgraph GNN with Provable Substructure Counting Power
Zuoyu Yan, Junru Zhou, Liangcai Gao, Zhi Tang, Muhan Zhang
TL;DR
This work targets efficient, provable substructure counting with graph neural networks. It introduces ESC-GNN, which augments backbones with precomputed distance-based structural embeddings over rooted $k$-tuples, enabling subgraph counting without repeatedly applying GNNs to all subgraphs. Theoretical results (Theorems 3.2–3.4, 4.2–4.5) characterize counting and distinguishing power relative to the Weisfeiler-Leman hierarchy and show ESC-GNN can count a range of connected and induced substructures while remaining more scalable than full subgraph GNNs. Empirically, ESC-GNN achieves strong real-world performance on molecular and TU benchmarks, with favorable space/time trade-offs and clear ablations illustrating the benefits of global message passing and all components of the structural encoding. Overall, the approach provides a practical, provably expressive model for substructure-aware graph learning with reduced computational cost.
Abstract
We investigate how to enhance the representation power of graph neural networks (GNNs) through their ability to count substructures. Recent work has adopted subgraph GNNs, which partition an input graph into many subgraphs and apply a GNN to each one to enrich the representation of the whole graph. Although they can identify a variety of substructures, subgraph GNNs incur significant computational and memory costs. In this paper, we address a critical question: can GNNs count substructures both \textbf{efficiently} and \textbf{provably}? We first show theoretically that the distance to the rooted nodes within each subgraph is key to boosting the counting power of subgraph GNNs. To avoid repeatedly applying a GNN to every subgraph, we then introduce precomputed structural embeddings that encode this distance information. Experiments validate that the proposed model retains the counting power of subgraph GNNs while running significantly faster.
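To make the idea of a precomputed, distance-based structural embedding concrete, the following is a minimal sketch and not the paper's actual encoding. It assumes a single root node rather than a rooted $k$-tuple, an adjacency-list graph with integer node ids, and two hypothetical helper names, `bfs_distances` and `edge_distance_histogram`. For each root it labels every node in the root's `max_hops`-hop subgraph with its shortest-path distance to the root, then counts the subgraph's edges by the distance pair of their endpoints.

```python
from collections import Counter, deque


def bfs_distances(adj, root, max_hops):
    """Shortest-path distances from root, keeping only nodes within max_hops."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the subgraph radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def edge_distance_histogram(adj, root, max_hops):
    """Count edges of the rooted subgraph by the sorted pair
    (distance of one endpoint to root, distance of the other endpoint to root).

    Illustrative stand-in for a precomputed structural embedding.
    """
    dist = bfs_distances(adj, root, max_hops)
    hist = Counter()
    for u in dist:
        for v in adj[u]:
            # u < v visits each undirected edge exactly once
            if v in dist and u < v:
                hist[tuple(sorted((dist[u], dist[v])))] += 1
    return hist


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Computed once per root, before any message passing takes place.
embeddings = {r: edge_distance_histogram(adj, r, max_hops=1) for r in adj}
```

In a 1-hop subgraph, an edge whose endpoints are both at distance 1 from the root joins two neighbors of the root, so the `(1, 1)` entry equals the number of triangles containing that root. On the toy graph it is 1 for nodes 0, 1, and 2, and 0 for node 3. This shows, in the simplest case, how distance information that is computed once can expose a substructure count without running a GNN on every subgraph.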
