Learning the mechanisms of network growth
Lourens Touwen, Doina Bucur, Remco van der Hofstad, Alessandro Garavaglia, Nelly Litvak
TL;DR
This work tackles the problem of identifying the underlying growth mechanisms of dynamic networks by reframing model selection as a classification task. It introduces a collapsed continuous-time branching process (CTBP) framework to generate nine growth models from fitness, aging, and preferential attachment mechanisms, and constructs a novel Dynamic Feature Matrix $D$ that captures temporal edge accrual across vertex cohorts. A large synthetic dataset is used to train classifiers on static, dynamic, and combined features, with dynamic features achieving near-perfect accuracy on synthetic networks (≈98%), substantially surpassing static features (≈93%). Applying the method to Web of Science citation networks yields results consistent with the literature, favoring models that incorporate fitness, aging, and/or preferential attachment, while underscoring how sensitive the conclusions are to feature design. The paper highlights both the potential of feature-informed ML for model selection in dynamic networks and the need for careful validation and extension to other models and domains.
Abstract
We propose a novel model-selection method for dynamic networks. Our approach involves training a classifier on a large body of synthetic network data. The data is generated by simulating nine state-of-the-art random graph models for dynamic networks, with parameter ranges chosen to ensure exponential growth of the network size in time. We design a conceptually novel type of dynamic feature that counts the new links received by a group of vertices in a particular time interval. The proposed features are easy to compute, analytically tractable, and interpretable. Our approach achieves near-perfect classification of synthetic networks, exceeding the state of the art by a large margin. Applying our classification method to real-world citation networks lends credibility to claims in the literature that models with preferential attachment, fitness, and aging fit real-world citation networks best, although the predicted model sometimes does not involve vertex fitness.
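
To make the idea of the dynamic features concrete, the following is a minimal sketch of counting the links received by vertex cohorts per time interval. It is not the paper's exact construction of $D$: the function name `dynamic_feature_matrix`, the choice to define cohorts by arrival interval, the half-open intervals, and the absence of any normalization are all assumptions made for illustration.

```python
from bisect import bisect_right


def dynamic_feature_matrix(birth_times, edges, boundaries):
    """Count links received by each vertex cohort in each time interval.

    Assumed (hypothetical) conventions, not taken from the paper:
      birth_times: dict mapping vertex -> arrival time
      edges:       iterable of (time, target_vertex) pairs
      boundaries:  sorted list [t_0, ..., t_k]; interval j is [t_j, t_{j+1})
    Returns a k x k list of lists D, where D[i][j] is the number of links
    received during interval j by vertices that arrived in interval i.
    """
    k = len(boundaries) - 1
    D = [[0] * k for _ in range(k)]

    def interval(t):
        # Index of the half-open interval containing t, or None if outside.
        idx = bisect_right(boundaries, t) - 1
        return idx if 0 <= idx < k else None

    for t, target in edges:
        i = interval(birth_times[target])  # cohort of the receiving vertex
        j = interval(t)                    # interval in which the link arrives
        if i is not None and j is not None:
            D[i][j] += 1
    return D


# Toy example: four vertices, three unit-length intervals.
birth_times = {0: 0.0, 1: 0.5, 2: 1.2, 3: 2.1}
edges = [(0.5, 0), (1.2, 0), (1.2, 1), (2.1, 2), (2.1, 0), (2.5, 1)]
D_example = dynamic_feature_matrix(birth_times, edges, [0, 1, 2, 3])
print(D_example)  # [[1, 2, 2], [0, 0, 1], [0, 0, 0]]
```

In this toy case the matrix is upper triangular, because a vertex cannot receive links before it arrives. Under these assumptions, the rows show how each cohort keeps or loses attractiveness over time, which is the kind of signal one would expect to separate aging, fitness, and preferential attachment.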
