A Network Formation Model Based on Subgraphs

Arun G. Chandrasekhar, Matthew O. Jackson

TL;DR

It is shown that a simple four-parameter SUGM matches basic patterns in empirical networks more closely than four standard models (with many more dimensions): stochastic block models; models with node-level unobserved heterogeneity; latent space models; and exponential random graphs.

Abstract

We develop a new class of random graph models for the statistical estimation of network formation -- subgraph generated models (SUGMs). Various subgraphs -- e.g., links, triangles, cliques, stars -- are generated and their union results in a network. We show that SUGMs are identified and establish the consistency and asymptotic distribution of parameter estimators in empirically relevant cases. We show that a simple four-parameter SUGM matches basic patterns in empirical networks more closely than four standard models (with many more dimensions): (i) stochastic block models; (ii) models with node-level unobserved heterogeneity; (iii) latent space models; (iv) exponential random graphs. We illustrate the framework's value via several applications using networks from rural India. We study whether network structure helps enforce risk-sharing and whether cross-caste interactions are more likely to be private. We also develop a new central limit theorem for correlated random variables, which is required to prove our results and is of independent interest.
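The generative idea in the abstract (subgraphs form independently and the observed network is their union) can be illustrated with a minimal sketch. It assumes the simplest case of only two subgraph types, links and triangles, with no node characteristics. The function name `simulate_sugm` and its parameters are hypothetical and chosen for illustration; they are not taken from the paper.

```python
import itertools
import random


def simulate_sugm(n, beta_link, beta_triangle, seed=0):
    """Return the edge set of a links-and-triangles SUGM on n nodes.

    Assumption: every possible triangle forms independently with
    probability beta_triangle, every possible link forms independently
    with probability beta_link, and the network is the union of both.
    """
    rng = random.Random(seed)
    edges = set()

    # Each of the C(n, 3) possible triangles forms independently.
    for i, j, k in itertools.combinations(range(n), 3):
        if rng.random() < beta_triangle:
            edges.update({(i, j), (i, k), (j, k)})

    # Each of the C(n, 2) possible links forms independently.
    # A link that coincides with a triangle edge is absorbed by the set,
    # so the observed network does not reveal how an edge was generated.
    for i, j in itertools.combinations(range(n), 2):
        if rng.random() < beta_link:
            edges.add((i, j))

    return edges


if __name__ == "__main__":
    g = simulate_sugm(n=41, beta_link=0.03, beta_triangle=0.001, seed=1)
    print(len(g), "observed edges")
```

Because the edge set is a union, the generating subgraphs cannot be read off the output directly, which is the inference problem the paper's estimators address.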

Paper Structure

This paper contains 23 sections, 22 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Examples of subgraphs. Links could be directed or undirected or even multiplexed (take on multiple edge types) and nodes can have different characteristic combinations (denoted by node colors and labels).
  • Figure 2: Panel (A) shows all possible links and Panel (B) shows all possible triangles when a node has characteristic $X_i\in \{red,\ blue\}$.
  • Figure 3: The network that is formed and eventually observed is shown in panel D. The process forms triangles independently with probability $\beta_{T}$, shown in red in (B), and links independently with probability $\beta_{L}$, shown in grey in (C). New links are dashed, while links that overlap with a link also formed in a triangle are solid and bold. There is both (i) overlap, as some links coincide with links already in triangles, and (ii) extra triangles that were generated "incidentally." Because we only observe the resulting network in panel D, we must infer the formation of the different subgraphs carefully rather than by directly counting observed links and triangles.
  • Figure 4: Two different configurations of two triangles; one has a count of 6 total links and the other has a count of 5 links. (A) is relatively more likely to come directly from the formation of two triangles, and (B) is relatively more likely to come from a combination of links and triangles. The likelihoods of links and triangles can thus be inferred from the combination of the counts of links and triangles.
  • Figure 5: A network is formed on 41 nodes and is shown in panel D. The process can be thought of as first forming triangles as in (B), and links as in (C). Note that two links form on triangles, and a third link incidentally generates an extra triangle. In this network we would count $\widetilde{S}^n_T(g)=10$, and $\widetilde{S}^n_L(g)=22$ from (D), while the true process generated 9 triangles and 25 links directly. The estimated parameters are $\widehat{\beta}^{DC}_{n,T}=\frac{10}{10660}$, and $\widehat{\beta}^{DC}_{n,L}=\frac{22}{820}$, while the true frequencies were $\frac{9}{10660}$ and $\frac{25}{820}$.
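
The counting in the Figure 5 caption can be reproduced with a short sketch. It assumes, following the caption, that $\widetilde{S}^n_T(g)$ is the number of triangles in the observed network, that $\widetilde{S}^n_L(g)$ is the number of links not contained in any triangle, and that each count is divided by the number of possible triangles or links on $n$ nodes. The helper name `direct_counts` and the toy network are hypothetical.

```python
import itertools
from math import comb


def direct_counts(n, edges):
    """Count triangles and links outside any triangle in an observed network.

    Assumption: nodes are labelled 0..n-1 and edges is an iterable of pairs.
    """
    canonical = {tuple(sorted(e)) for e in edges}

    # A triple is a triangle when all three of its links are present.
    triangles = [
        (i, j, k)
        for i, j, k in itertools.combinations(range(n), 3)
        if {(i, j), (i, k), (j, k)} <= canonical
    ]

    # Collect every link that lies on at least one triangle.
    on_triangle = set()
    for i, j, k in triangles:
        on_triangle.update({(i, j), (i, k), (j, k)})

    return len(triangles), len(canonical - on_triangle)


# The denominators in the caption are the possible triangles and links
# on 41 nodes.
possible_triangles = comb(41, 3)  # 10660
possible_links = comb(41, 2)      # 820

# Ratios reported in the caption, from the counts observed in panel D.
beta_T_hat = 10 / possible_triangles
beta_L_hat = 22 / possible_links

# Toy check: one triangle (0, 1, 2) plus one separate link (2, 3).
toy_counts = direct_counts(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
```

With the caption's numbers, the ratios are roughly 0.00094 for triangles and 0.027 for links. The gap from the true frequencies comes from the overlap and incidental triangle that the caption describes.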