Learning the mechanisms of network growth

Lourens Touwen; Doina Bucur; Remco van der Hofstad; Alessandro Garavaglia; Nelly Litvak

TL;DR

This work tackles the problem of identifying the underlying growth mechanisms of dynamic networks by reframing model selection as a classification task. It introduces a collapsed continuous-time branching process (CTBP) framework to generate nine growth models from fitness, aging, and preferential attachment mechanisms, and constructs a novel Dynamic Feature Matrix $D$ that captures temporal edge accrual across vertex cohorts. Classifiers are trained on a large synthetic dataset using static, dynamic, and combined features; dynamic features achieve near-perfect accuracy on synthetic networks (≈98%), substantially surpassing static features (≈93%). Applying the method to Web of Science citation networks yields results consistent with the literature, favoring models that incorporate fitness, aging, and/or preferential attachment, while showing that the conclusions are sensitive to feature design. The paper highlights both the potential of feature-informed ML for model selection in dynamic networks and the need for careful validation and extension to other models and domains.

Abstract

We propose a novel model-selection method for dynamic networks. Our approach involves training a classifier on a large body of synthetic network data. The data is generated by simulating nine state-of-the-art random graph models for dynamic networks, with parameter ranges chosen to ensure exponential growth of the network size in time. We design a conceptually novel type of dynamic feature that counts the new links received by a group of vertices in a particular time interval. The proposed features are easy to compute, analytically tractable, and interpretable. Our approach achieves near-perfect classification of synthetic networks, exceeding the state of the art by a large margin. Applying our classification method to real-world citation networks lends credibility to claims in the literature that models with preferential attachment, fitness, and aging fit real-world citation networks best, although the predicted model sometimes does not involve vertex fitness.
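To make the idea of the dynamic features concrete, here is a minimal Python sketch that counts the links received by each vertex cohort in each time interval. It is not the authors' implementation: the function name `dynamic_feature_matrix`, the use of equal-width time bins for both cohorts and intervals, the row/column orientation, and the absence of any normalization are all assumptions made for illustration.

```python
import numpy as np

def dynamic_feature_matrix(birth_times, edge_targets, edge_times, n_bins=10):
    """Count links received by each vertex cohort in each time interval.

    Rows index the time interval in which an edge arrives; columns index the
    cohort (by birth time) of the vertex receiving it. Using the same
    equal-width bins for cohorts and intervals is an assumption of this sketch.
    """
    birth_times = np.asarray(birth_times, dtype=float)
    edge_targets = np.asarray(edge_targets, dtype=int)
    edge_times = np.asarray(edge_times, dtype=float)

    t_min = birth_times.min()
    t_max = max(birth_times.max(), edge_times.max())
    bin_edges = np.linspace(t_min, t_max, n_bins + 1)

    # Digitizing against the interior edges yields bin indices 0..n_bins-1.
    interior = bin_edges[1:-1]
    cohort_of_vertex = np.digitize(birth_times, interior)
    interval_of_edge = np.digitize(edge_times, interior)
    cohort_of_target = cohort_of_vertex[edge_targets]

    # Accumulate one count per edge at (arrival interval, target cohort).
    D = np.zeros((n_bins, n_bins), dtype=int)
    np.add.at(D, (interval_of_edge, cohort_of_target), 1)
    return D

# Toy network (hypothetical data): four vertices born at times 0..3,
# four edges given by their target vertex and arrival time.
D_example = dynamic_feature_matrix(
    birth_times=[0, 1, 2, 3],
    edge_targets=[0, 0, 1, 2],
    edge_times=[1, 2, 3, 3],
    n_bins=2,
)
print(D_example)
```

In this toy example the entry above the diagonal is zero, since a cohort cannot receive links in an interval before its vertices are born.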

Paper Structure (54 sections, 12 equations, 15 figures, 9 tables)

Introduction
Model selection for complex networks
Dynamic network mechanisms
Modeling networks in calendar time
Results
Dynamic network models
Classification of dynamic networks
Generating the training data.
Static features.
Dynamic features.
Near-perfect classification of synthetic dynamic networks
Classification results with static features.
Classification results with dynamic features.
Classification results with static and dynamic features.
Classification applied to citation networks
...and 39 more sections

Figures (15)

Figure 1: An example of the eight combinations of growth mechanisms. F stands for fitness, A for aging, and P for preferential attachment. U stands for uniform attachment, in which a new vertex connects to existing ones uniformly at random. The circles are the vertices, arranged horizontally from left to right in the order of their arrival. The vertical positioning is chosen to make all directed edges visible; it has no further meaning. In the models with fitness, the size of each circle is proportional to the fitness of the vertex. The red circle on the right is the new vertex that will connect to one of the existing vertices. The color of the other vertices, from dark blue to yellow, corresponds to the probability, from low to high, that the red vertex connects to that older vertex.
Figure 2: Confusion matrix using: (a) static features, (b) dynamic features with time cohorts, (c) both static features and dynamic features with time cohorts.
Figure 3: The values of the dynamic features with time cohorts for the nine models. The matrix has a similar shape for all models, but the values decrease with distance from $D_{10,1}$ in different ways.
Figure 4: The relative difference, $\delta_{ij}=(D_{ij}-\bar{D}_{ij})/\bar{D}_{ij}$, with time cohorts. For each pair $(i,j)$, the values $\delta_{ij}$ sum to zero over all classes. The value $-1$ appears often because it corresponds to $D_{ij}=0$.
Figure A.5: Number of publications per year in the fields (logarithmic vertical axis).
...and 10 more figures
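
The relative difference in the Figure 4 caption can be illustrated with a short Python sketch. It assumes that $\bar{D}_{ij}$ is the average of $D_{ij}$ over the model classes, which is what the zero-sum property in the caption suggests; the function name `relative_difference` and the toy matrices are hypothetical.

```python
import numpy as np

def relative_difference(D_by_class):
    """Return delta = (D - D_bar) / D_bar for a stack of per-class matrices.

    D_by_class has shape (n_classes, rows, cols). D_bar is taken to be the
    mean over classes (an assumption of this sketch). Entries where D_bar
    is zero are left as NaN, since the ratio is undefined there.
    """
    D_by_class = np.asarray(D_by_class, dtype=float)
    D_bar = D_by_class.mean(axis=0)
    delta = np.full_like(D_by_class, np.nan)
    np.divide(D_by_class - D_bar, D_bar, out=delta, where=D_bar != 0)
    return delta

# Two hypothetical classes, each with a 1x2 feature matrix.
delta = relative_difference([[[3.0, 0.0]],
                             [[1.0, 4.0]]])
print(delta)
```

Here the class with $D_{ij}=0$ gets $\delta_{ij}=-1$, and each entry sums to zero across the two classes, matching the two properties stated in the caption.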
