Hyperbolic Benchmarking Unveils Network Topology-Feature Relationship in GNN Performance
Roya Aliakbarisani, Robert Jankowski, M. Ángeles Serrano, Marián Boguñá
TL;DR
The paper tackles the challenge of fairly benchmarking GNNs across graphs whose topology, node features, and topology-feature coupling vary widely. It proposes HypNF, a benchmarking framework built on the $S^1/H^2$ hyperbolic soft configuration model and its bipartite extension to generate synthetic graphs with tunable degree distributions, clustering, homophily, and topology-feature alignment. Empirically, stronger topology-feature coupling and hyperbolic embeddings yield advantages, especially for link prediction, while simple feature-based baselines can compete in node classification under certain conditions. By providing an open-source, controllable data generator, the work enables standardized model comparisons and practical guidance for model selection on real-world datasets.
Abstract
Graph Neural Networks (GNNs) have excelled in predicting graph properties in various applications ranging from identifying trends in social networks to drug discovery and malware detection. With the abundance of new architectures and their increasing complexity, GNNs are becoming highly specialized to the few well-known datasets on which they are tested. However, how the performance of GNNs depends on the topological and feature properties of graphs is still an open question. In this work, we introduce a comprehensive benchmarking framework for graph machine learning, focusing on the performance of GNNs across varied network structures. Utilizing the geometric soft configuration model in hyperbolic space, we generate synthetic networks with realistic topological properties and node feature vectors. This approach enables us to assess the impact of network properties, such as topology-feature correlation, degree distributions, local density of triangles (or clustering), and homophily, on the effectiveness of different GNN architectures. Our results highlight the dependency of model performance on the interplay between network structure and node features, providing insights for model selection in various scenarios. This study contributes to the field by offering a versatile tool for evaluating GNNs, thereby assisting in developing and selecting suitable models based on specific data characteristics.
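
To make the geometric soft configuration model concrete, the following is a minimal sketch of the unipartite $S^1$ model in its commonly stated form for $\beta > 1$. It is not the HypNF implementation. Nodes receive a hidden degree $\kappa$ and an angle $\theta$ on a circle of radius $R = N/(2\pi)$, and each pair connects with probability $1/(1 + \chi^\beta)$, where $\chi = R\,\Delta\theta / (\mu \kappa_i \kappa_j)$ and $\mu = \beta \sin(\pi/\beta) / (2\pi \langle k \rangle)$. The function names, parameter names, and the choice of power-law hidden degrees with exponent $\gamma > 2$ are assumptions made for illustration. The bipartite node-feature layer and the label assignment used by the framework are omitted.

```python
import math
import random


def s1_connection_probability(dtheta, kappa_i, kappa_j, n, beta, mean_degree):
    """Connection probability of the S^1 model (assumes beta > 1)."""
    radius = n / (2.0 * math.pi)  # circle radius that gives unit node density
    mu = beta * math.sin(math.pi / beta) / (2.0 * math.pi * mean_degree)
    chi = radius * dtheta / (mu * kappa_i * kappa_j)
    return 1.0 / (1.0 + chi ** beta)


def generate_s1_graph(n, gamma, beta, mean_degree, seed=0):
    """Sample hidden degrees, angles, and edges (hypothetical helper, O(n^2))."""
    rng = random.Random(seed)
    # Power-law hidden degrees by inverse-transform sampling (assumes gamma > 2);
    # kappa_0 is chosen so that the expected hidden degree equals mean_degree.
    kappa_0 = mean_degree * (gamma - 2.0) / (gamma - 1.0)
    kappas = [
        kappa_0 * (1.0 - rng.random()) ** (-1.0 / (gamma - 1.0)) for _ in range(n)
    ]
    # Angular coordinates drawn uniformly on the circle.
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = abs(thetas[i] - thetas[j])
            dtheta = min(d, 2.0 * math.pi - d)  # shortest angular separation
            p = s1_connection_probability(
                dtheta, kappas[i], kappas[j], n, beta, mean_degree
            )
            if rng.random() < p:
                edges.append((i, j))
    return kappas, thetas, edges
```

In this sketch, `beta` controls clustering (larger values favor connections between angularly close nodes) and `gamma` controls the heterogeneity of the degree distribution. A call such as `generate_s1_graph(200, 2.7, 2.5, 6.0)` returns the hidden degrees, the angles, and an edge list.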
