Leveraging advances in machine learning for the robust classification and interpretation of networks
Raima Carol Appaw, Nicholas Fountain-Jones, Michael A. Charleston
TL;DR
This work addresses the problem of identifying which generative model best explains an observed network, using interpretable machine learning. It combines large-scale synthetic data from Erdős-Rényi (ER), small-world (SW), spatial, scale-free (SF), and stochastic block model (SBM) generators with empirical networks, and uses SHAP and Friedman-Hastie statistics to uncover main effects and feature interactions among a rich set of graph metrics, including spectral properties. The study reports near-perfect classification accuracy and clarifies how spectral measures and centrality-related features drive model discrimination, identifying thresholds at which interactions become decisive. The authors also provide a practical toolkit, including an open-source pipeline and an interactive Shiny app, that lets researchers classify new networks and interpret the feature interactions driving each classification in real-world contexts.
Abstract
The ability to simulate realistic networks based on empirical data is important across scientific disciplines, from epidemiology to computer science. Simulation approaches often involve selecting a suitable network generative model, such as Erdős-Rényi or small-world. However, few tools are available to quantify whether a particular generative model is suitable for capturing a given network's structure or organization. We use advances in interpretable machine learning to classify simulated networks by the generative model that produced them, based on a range of network attributes and using both primary features and their interactions. Our study underscores the importance of specific network features and their interactions for distinguishing generative models, understanding complex network structures, and explaining how real-world networks form.
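
As a rough illustration of the classification idea, the following Python sketch simulates networks from two generative models (Erdős-Rényi and small-world), computes a single network attribute (the average clustering coefficient), and assigns each network to a model with a fixed threshold. It is not the authors' pipeline: the threshold rule stands in for a machine-learning classifier, and the function names, network sizes, and the cutoff of 0.25 are all illustrative assumptions.

```python
import random
from itertools import combinations


def erdos_renyi(n, p, rng):
    """G(n, p): each possible edge is present independently with probability p."""
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def small_world(n, k, beta, rng):
    """Watts-Strogatz style: ring lattice (k even), each edge rewired with prob. beta."""
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if rng.random() < beta:
                # Rewire (u, v) to a new endpoint, avoiding self-loops and duplicates.
                choices = [w for w in range(n) if w != u and w not in adj[u]]
                if choices:
                    w = rng.choice(choices)
                    adj[u].discard(v)
                    adj[v].discard(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def avg_clustering(adj):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def classify(adj, threshold=0.25):
    """Hypothetical one-feature rule; the threshold is an assumption, not a fitted value."""
    return "small-world" if avg_clustering(adj) > threshold else "Erdos-Renyi"


# Simulate labelled networks with matched expected density, then score the rule.
rng = random.Random(0)
n, k = 100, 6
samples = [("Erdos-Renyi", erdos_renyi(n, k / (n - 1), rng)) for _ in range(20)]
samples += [("small-world", small_world(n, k, 0.05, rng)) for _ in range(20)]
accuracy = sum(classify(g) == label for label, g in samples) / len(samples)
print(f"accuracy of the single-feature rule: {accuracy:.2f}")
```

A single feature separates these two models only because lightly rewired lattices retain far more clustering than random graphs of the same density. Distinguishing more models, or models with overlapping feature ranges, requires the richer feature sets and interaction effects that the abstract describes.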
