Model-Based Inference and Experimental Design for Interference Using Partial Network Data

Steven Wilkins Reeves, Shane Lubold, Arun G. Chandrasekhar, Tyler H. McCormick

Abstract

The stable unit treatment value assumption states that the outcome of an individual is not affected by the treatment statuses of others. In many real-world applications, however, treatments can affect many others beyond those immediately treated. Interference can generically be thought of as mediated through some network structure. In many empirically relevant situations, however, complete network data (required to adjust for these spillover effects) are too costly or logistically infeasible to collect. Partially or indirectly observed network data (e.g., subsamples, aggregated relational data (ARD), egocentric sampling, or respondent-driven sampling) reduce the logistical and financial burden of collecting network data, but the statistical properties of treatment effect adjustments from these design strategies are only beginning to be explored. In this paper, we present a framework for the estimation and inference of treatment effect adjustments using partial network data through the lens of structural causal models. We also illustrate procedures to assign treatments using only partial network data, with the goal of either minimizing estimator variance or optimally seeding. We derive single-network asymptotic results applicable to a variety of choices for an underlying graph model. We validate our approach using simulated experiments on observed graphs with applications to information diffusion in India and Malawi.
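To make the interference problem concrete, the following is a minimal simulation sketch and not the paper's method. It assumes a hypothetical Erdős–Rényi graph, a fully observed adjacency matrix, and a linear outcome model with illustrative coefficients `direct` and `spillover`; all names and values are assumptions chosen for illustration. It shows that a difference in means recovers only the direct effect, while a regression that includes the fraction of treated neighbors recovers the global effect of treating everyone versus no one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical Erdos-Renyi graph with expected degree about 10 (illustration only).
p = 10 / n
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)

# Bernoulli(1/2) treatment assignment.
a = rng.binomial(1, 0.5, size=n).astype(float)

# Exposure: fraction of treated neighbors (set to 0 for isolated nodes).
deg = A.sum(axis=1)
exposure = np.divide(A @ a, deg, out=np.zeros(n), where=deg > 0)

# Assumed linear outcome model: direct effect 1.0, spillover effect 2.0.
direct, spillover = 1.0, 2.0
y = 0.5 + direct * a + spillover * exposure + rng.normal(0.0, 1.0, size=n)

# Difference in means ignores spillovers, so it targets only the direct effect.
dm = y[a == 1].mean() - y[a == 0].mean()

# Regression on own treatment and exposure. Under the assumed model, the
# global effect (everyone treated vs. no one treated) is direct + spillover.
X = np.column_stack([np.ones(n), a, exposure])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
gate_hat = coef[1] + coef[2]

print(f"difference in means: {dm:.2f}")
print(f"regression-based global effect: {gate_hat:.2f}")
```

The regression here uses the full adjacency matrix `A` to compute exposure. The setting studied in the paper is the one where `A` is not fully observed, so exposure must be inferred from partial network data through a graph model.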

Paper Structure (52 sections, 10 theorems, 79 equations, 9 figures, 2 tables, 7 algorithms)

Key Result

Theorem 3.2

Suppose that $\widehat{\beta}$ is computed as per the missing-data Z-estimation algorithm, the Z-estimator regularity conditions hold, and $s(n) = o(r(n))$. Then: where $\mathbb{E}[\nabla_{\beta} g_n(\mathbf{Z}; \beta) \mid \mathbf{a}, \mathbf{X}, G^*, \theta_0] |_{\beta = \beta_0} = D(\beta_0)$.

Figures (9)

  • Figure 1: Contagion process in which a single node, shown in blue, is seeded at time $T = 0$; infected nodes are shown in orange at times $T = 1$ and $T = 2$.
  • Figure 2: Comparison of GATE estimators. ARD denotes our method using aggregated relational data. The "Full Network" method uses a regression approach with the full data available. DM is the difference in means and HT is the Horvitz-Thompson estimator.
  • Figure 3: Estimation of parameter $\alpha_1$ and all model parameters $\beta$ using the naive and optimized seeding. We observe that the potential gain from using a more efficient design is much greater than the gain from simply collecting complete network data.
  • Figure 4: Comparison of different seeding methods under complex contagion. Model-based targeting of optimal blocks generally outperforms degree seeding, especially when targeting the highest degree nodes within those blocks.
  • Figure 5: The distributions of the potential outcomes of nodes $i$ and $j$ are equivalent under the given treatment assignment, as all of the rooted networks are equivalent.
  • ...and 4 more figures

Theorems & Definitions (30)

  • Example 2.1: Local Interaction Effects
  • Example 2.2: Risk-Sharing Networks [ambrus2014consumption]
  • Example 2.3: Hearing Information [banerjee2019gossip]
  • Example 2.4: Induced subgraph
  • Example 2.5: Respondent Driven Sampling
  • Example 2.6: Aggregated Relational Data
  • Definition 2.1: Exposure Weak Ignorability
  • Definition 2.2: Exposure Consistency
  • Definition 2.3: Conditional Independence of the Graph and Outcome
  • Theorem 3.2: Single Network Z-estimator Asymptotics
  • ...and 20 more