An Empirical Study of Realized GNN Expressiveness
Yanbo Wang, Muhan Zhang
TL;DR
This paper introduces BREC, a large and diverse dataset for testing GNN expressiveness beyond the 1-WL bound, including graph pairs that are indistinguishable even by 4-WL. It addresses the limitations of prior datasets in difficulty, granularity, and scale. BREC is paired with RPC, a robust evaluation framework that uses a Siamese GNN and Hotelling's T-squared tests to quantify a model's realized discriminative power while accounting for numerical fluctuations. Extensive experiments on 23 models show that realized expressiveness largely tracks theoretical expectations but also deviates from them in notable ways, and that distance encoding and a well-chosen subgraph radius are crucial for performance. The work provides practical tools and insights to guide the development of more expressive GNN architectures, and the dataset and code are publicly released to support reproducible benchmarking.
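The paired-comparison idea behind RPC can be illustrated with a one-sample Hotelling's T-squared test on embedding differences. The sketch below is not BREC's implementation: it assumes the Siamese model's outputs on q random permutations of each graph are already stacked into (q, d) NumPy arrays, and the function name `hotelling_t2_paired` is hypothetical. It computes T² = q · d̄ᵀ S⁻¹ d̄, where d̄ is the mean difference and S its sample covariance, together with the equivalent F statistic, which follows F(d, q − d) when the true mean difference is zero.

```python
import numpy as np


def hotelling_t2_paired(emb_g: np.ndarray, emb_h: np.ndarray) -> tuple[float, float]:
    """One-sample Hotelling's T^2 test on paired embedding differences.

    emb_g, emb_h: arrays of shape (q, d). Row i is assumed to hold the model's
    output on the i-th random permutation of graph G and graph H, respectively.
    Returns (T^2, F), where F follows F(d, q - d) if the mean difference is zero.
    """
    diffs = np.asarray(emb_g, dtype=float) - np.asarray(emb_h, dtype=float)
    q, d = diffs.shape
    if q <= d:
        raise ValueError("need more samples than embedding dimensions (q > d)")
    mean = diffs.mean(axis=0)                          # sample mean difference, shape (d,)
    cov = np.atleast_2d(np.cov(diffs, rowvar=False))   # unbiased covariance, shape (d, d)
    # Solve cov @ x = mean rather than forming an explicit inverse.
    t2 = float(q * mean @ np.linalg.solve(cov, mean))
    f_stat = (q - d) / (d * (q - 1)) * t2              # rescale T^2 to an F statistic
    return t2, f_stat


if __name__ == "__main__":
    # Toy 1-D case: T^2 reduces to the squared one-sample t statistic.
    t2, f_stat = hotelling_t2_paired(np.array([[1.0], [2.0], [3.0], [4.0]]), np.zeros((4, 1)))
    print(t2, f_stat)  # both approximately 15.0
```

To turn the F statistic into a decision, compare it with the upper α quantile of F(d, q − d), for example via `scipy.stats.f.ppf(1 - alpha, d, q - d)`. If the differences have a singular covariance (for instance, exactly identical outputs), `np.linalg.solve` raises `LinAlgError`; adding a small multiple of the identity to S is one common workaround, and is a choice of this sketch rather than something stated in the source.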
Abstract
Research on the theoretical expressiveness of Graph Neural Networks (GNNs) has developed rapidly, and many methods have been proposed to enhance it. However, apart from a few that strictly follow the $k$-dimensional Weisfeiler-Lehman ($k$-WL) test hierarchy, most methods lack a uniform expressiveness measure, which makes their expressiveness difficult to compare quantitatively. Previous research has attempted to use datasets for this measurement, but these datasets suffer from problems of difficulty (any model surpassing 1-WL achieves nearly 100% accuracy), granularity (models tend to be either 100% correct or close to random guessing), and scale (only a few essentially different graphs are involved). To address these limitations, we study the realized expressive power that a practical model instance can achieve, using a novel expressiveness dataset, BREC, which offers greater difficulty (with graphs that are up to 4-WL-indistinguishable), finer granularity (enabling comparison of models between 1-WL and 3-WL), and a larger scale (800 1-WL-indistinguishable graphs that are pairwise non-isomorphic). We systematically test 23 models with beyond-1-WL expressiveness on BREC. Our experiments provide the first thorough measurement of the realized expressiveness of these state-of-the-art beyond-1-WL GNN models and reveal the gap between theoretical and realized expressiveness. Dataset and evaluation code are released at: https://github.com/GraphPKU/BREC.
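To make "1-WL-indistinguishable but non-isomorphic" concrete, here is a minimal pure-Python sketch of 1-WL color refinement run jointly on two graphs. The function name `wl_indistinguishable` and the adjacency-dict format are illustrative choices, and the example pair (a 6-cycle versus two disjoint triangles) is a standard textbook case rather than a pair taken from BREC.

```python
from collections import Counter

# Illustrative adjacency-list format: node -> list of neighbours (undirected).
Graph = dict[int, list[int]]


def wl_indistinguishable(g1: Graph, g2: Graph) -> bool:
    """Return True if 1-WL colour refinement cannot tell g1 and g2 apart."""
    graphs = [g1, g2]
    colors = [{v: 0 for v in g} for g in graphs]  # every node starts with the same colour
    # Refinement on the disjoint union stabilises within |V1| + |V2| rounds.
    for _ in range(len(g1) + len(g2)):
        palette: dict[tuple, int] = {}  # shared by both graphs so colour ids are comparable
        new_colors = []
        for g, col in zip(graphs, colors):
            refined = {}
            for v in g:
                # New colour = (own colour, sorted multiset of neighbour colours).
                signature = (col[v], tuple(sorted(col[u] for u in g[v])))
                refined[v] = palette.setdefault(signature, len(palette))
            new_colors.append(refined)
        colors = new_colors
        # Different colour histograms certify that the graphs are non-isomorphic.
        if Counter(colors[0].values()) != Counter(colors[1].values()):
            return False
    return True


if __name__ == "__main__":
    hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
    # Non-isomorphic (one is connected, one is not), yet every node always sees
    # two neighbours of the same colour, so 1-WL never separates them.
    print(wl_indistinguishable(hexagon, two_triangles))  # True
```

Any model that can count triangles, for example, separates this particular pair, since the 6-cycle has none and the other graph has two.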
