Causal Models for Growing Networks
Gecia Bravo-Hermsdorff, Lee M. Gunderson, Kayvan Sadeghi
TL;DR
The paper addresses causal modeling of growing networks by shifting the focus from node-exchangeable distributions over edges to invariances of the causal mechanisms that generate dyadic edges. It constructs a taxonomy of 96 deletion-invariant causal meta-DAGs over dyad variables and reduces them to 21 transitively closed classes, several of which are amenable to distributed and asynchronous evaluation. As a canonical example, it introduces the Distributed Affine Preferential Attachment (DAPA) model, where $x_{ij}\sim\text{Bernoulli}(p_{ij})$ with $p_{ij}=\frac{\alpha+\theta_{in}d_i^{in}+\theta_{out}d_i^{out}}{j-2+\alpha+\beta}$. This model exhibits three sparsity regimes and a flexible power-law degree distribution whose exponents are determined by $\theta_{in}$ and $\theta_{out}$. The framework yields natural baseline models for causal inference in relational data and supports generalization, interventions, and counterfactual analyses in distributed settings.
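To make the DAPA edge rule concrete, here is a minimal Python sketch of one way it could be simulated. It is an illustration under stated assumptions, not the authors' implementation. The summary does not define the degree terms or the edge direction, so the sketch assumes that nodes arrive in order $1,\dots,n$, that $x_{ij}$ with $i<j$ is an edge created when node $j$ arrives, that $d_i^{in}$ counts edges node $i$ has received from later arrivals, and that $d_i^{out}$ counts edges node $i$ created on its own arrival. Clipping $p_{ij}$ to $[0,1]$ is also an assumption, and the names `edge_probability` and `simulate_dapa` are hypothetical.

```python
import random


def edge_probability(j, d_in, d_out, alpha, beta, theta_in, theta_out):
    """p_ij from the formula above, clipped to [0, 1] (clipping is an assumption)."""
    p = (alpha + theta_in * d_in + theta_out * d_out) / (j - 2 + alpha + beta)
    return min(1.0, max(0.0, p))


def simulate_dapa(n, alpha, beta, theta_in, theta_out, seed=0):
    """Grow a network on nodes 1..n and return (edges, d_in, d_out).

    Assumed conventions (not stated in the summary):
      - edge (i, j) with i < j is created when node j arrives;
      - d_in[i] counts edges received by i from later nodes;
      - d_out[j] counts edges created by j toward earlier nodes.
    """
    if alpha + beta <= 0:
        raise ValueError("alpha + beta must be positive so that p_12 is defined")
    rng = random.Random(seed)
    d_in = [0] * (n + 1)   # index 0 unused; nodes are 1-indexed
    d_out = [0] * (n + 1)
    edges = []
    for j in range(2, n + 1):
        for i in range(1, j):
            # Each dyad (i, j) depends only on the current degrees of node i,
            # so the dyads of node j can be sampled independently of each other.
            p = edge_probability(j, d_in[i], d_out[i],
                                 alpha, beta, theta_in, theta_out)
            if rng.random() < p:
                edges.append((i, j))
                d_in[i] += 1
                d_out[j] += 1
    return edges, d_in[1:], d_out[1:]


if __name__ == "__main__":
    edges, d_in, d_out = simulate_dapa(200, alpha=1.0, beta=1.0,
                                       theta_in=0.5, theta_out=0.5, seed=42)
    print(len(edges), "edges; max in-degree:", max(d_in))
```

Because each Bernoulli draw for node $j$ reads only the degrees of the older node $i$, the inner loop could in principle be evaluated in any order or in parallel, which is the kind of distributed evaluation the summary refers to.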
Abstract
Real-world networks grow over time; statistical models based on node exchangeability are not appropriate. Instead of constraining the structure of the \textit{distribution} of edges, we propose that the relevant symmetries refer to the \textit{causal structure} between them. We first enumerate the 96 causal directed acyclic graph (DAG) models over pairs of nodes (dyad variables) in a growing network with finite ancestral sets that are invariant to node deletion. We then partition them into 21 classes with ancestral sets that are closed under node marginalization. Several of these classes are remarkably amenable to distributed and asynchronous evaluation. As an example, we highlight a simple model that exhibits flexible power-law degree distributions and emergent phase transitions in sparsity, which we characterize analytically. With few parameters and much conditional independence, our proposed framework provides natural baseline models for causal inference in relational data.
