Estimating Causal Peer Influence in Homophilous Social Networks by Inferring Latent Locations

Edward McFowland, Cosma Rohilla Shalizi

TL;DR

This work tackles the challenge of estimating social-influence effects from observational network data by addressing latent homophily through latent-location controls. It establishes conditions under two standard network-generating families—the latent community (SBM) model with GMZZ conditions and the continuous latent-space model with Asta conditions—that yield asymptotically unbiased estimates of the social-influence coefficient $\beta$ when latent locations are consistently inferred. In the SBM case, estimation bias decays exponentially with network size, while in continuous latent space, bias decays polynomially given concentration of the latent-location estimates. The authors corroborate theory with simulations comparing oracle, algorithmic, and naive estimators, showing that leveraging latent locations via an OLS regression in the effective model delivers consistent inference and robust performance under reasonable deviations from assumptions.

Abstract

Social influence cannot be identified from purely observational data on social networks, because such influence is generically confounded with latent homophily, i.e., with a node's network partners being informative about the node's attributes and therefore its behavior. If the network grows according to either a latent community (stochastic block) model, or a continuous latent space model, then latent homophilous attributes can be consistently estimated from the global pattern of social ties. We show that, for common versions of those two network models, these estimates are so informative that controlling for estimated attributes allows for asymptotically unbiased and consistent estimation of social-influence effects in linear models. In particular, the bias shrinks at a rate which directly reflects how much information the network provides about the latent attributes. These are the first results on the consistent non-experimental estimation of social-influence effects in the presence of latent homophily, and we discuss the prospects for generalizing them.

Paper Structure

This paper contains 14 sections, 13 theorems, 53 equations, 4 figures.

Key Result

Lemma 1

Under the assumptions of the setting section (sec:setting), if $\mathrm{Pr}\left( C \neq \hat{C} \right) = 0$, then the ordinary least squares estimate of $\beta$ in the effective model (eqn:effective-model) is unbiased and consistent.

Figures (4)

  • Figure 1: The graphical causal model for our setting. Boxes indicate observables, and circles latent variables; solid lines indicate causal relations between observables (either autoregressive or peer-influence), while dotted lines indicate the influence of latent homophilous variables, and dashed lines indicate the influence of other covariates. For simplicity, we omit $Y(j, t+1)$ and $Y(l, t+1)$, as well as their associated arrows.
  • Figure 2: Comparison of the expected properties of the estimators, where expectations are computed over $50$ random samples, which also allows $95$% confidence intervals to be formed. The parameter of interest is the sample size $n$, which varies, while the latent community model parameters remain fixed ($k = 4, p_{\text{within}} = 0.75, p_{\text{between}} = 0.25$).
  • Figure 3: Comparison of the expected properties of the estimators, where expectations are computed over $50$ random samples, which also allows $95$% confidence intervals to be formed. The parameter of interest is $p_{\text{between}}$, which varies, while the sample size and other latent community model parameters remain fixed ($n = 500, k = 4, p_{\text{within}} = 0.75$).
  • Figure 4: Comparison of the expected properties of the estimators, where expectations are computed over $50$ random samples, which also allows $95$% confidence intervals to be formed. The parameter of interest is $p_{\text{within}}$, which varies, while the sample size and other latent community model parameters remain fixed ($n = 500, k = 4, p_{\text{between}} = 0.25$).
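
The comparison of oracle, algorithmic, and naive estimators described in these captions can be illustrated with a minimal sketch. This is not the authors' simulation code. It reuses only the block-model parameters quoted in the captions ($n = 500$, $k = 4$, $p_{\text{within}} = 0.75$, $p_{\text{between}} = 0.25$). Everything else is an assumption made for illustration: the outcome equation, the coefficients `alpha` and `beta`, the community effects `2.0 * c` and `3.0 * c`, the noise scales, the basic spectral clustering step used to estimate communities, and all function names.

```python
import numpy as np

def simulate(n=500, k=4, p_within=0.75, p_between=0.25, alpha=0.5, beta=1.0, seed=0):
    """Draw a block-model network and a linear outcome (illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    c = rng.integers(0, k, size=n)                   # latent community of each node
    probs = np.where(c[:, None] == c[None, :], p_within, p_between)
    upper = np.triu(rng.random((n, n)) < probs, 1)   # sample each pair once
    adj = (upper | upper.T).astype(float)            # symmetric, no self-loops
    y0 = 2.0 * c + rng.normal(size=n)                # behaviour at time t (assumed form)
    nbr = adj @ y0 / adj.sum(axis=1)                 # mean behaviour of neighbours
    # Assumed outcome: own lag + peer influence + latent community effect + noise.
    y1 = alpha * y0 + beta * nbr + 3.0 * c + rng.normal(scale=0.5, size=n)
    return adj, c, y0, nbr, y1

def spectral_communities(adj, k, iters=50):
    """Estimate communities by k-means on the top-k eigenvectors (a stand-in method)."""
    _, vecs = np.linalg.eigh(adj)                    # eigenvalues in ascending order
    x = vecs[:, -k:]                                 # eigenvectors of the k largest
    centers = [x[0]]
    for _ in range(k - 1):                           # farthest-point initialisation
        d = np.min([((x - m) ** 2).sum(axis=1) for m in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):                           # Lloyd iterations
        dist = ((x[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def beta_hat(y1, y0, nbr, labels=None):
    """OLS coefficient on the neighbour mean, optionally with community dummies."""
    cols = [np.ones_like(y0), y0, nbr]
    if labels is not None:
        for g in np.unique(labels)[1:]:              # drop one group as the baseline
            cols.append((labels == g).astype(float))
    coef, *_ = np.linalg.lstsq(np.column_stack(cols), y1, rcond=None)
    return coef[2]

adj, c, y0, nbr, y1 = simulate()
naive = beta_hat(y1, y0, nbr)                                  # no latent control
oracle = beta_hat(y1, y0, nbr, c)                              # true communities
algorithmic = beta_hat(y1, y0, nbr, spectral_communities(adj, 4))  # estimated ones
print(f"naive={naive:.2f} oracle={oracle:.2f} algorithmic={algorithmic:.2f}")
```

Under these assumed parameters the naive regression should overstate the influence coefficient, because the neighbour mean acts as a proxy for the omitted community effect, while the oracle and algorithmic regressions should land near the true `beta = 1.0`. Community dummies are invariant to relabelling, so the estimated labels need not match the true ones in name, only in the partition they induce.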

Theorems & Definitions (20)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Theorem 2
  • Lemma 4
  • proof
  • Lemma 4
  • proof
  • ...and 10 more