Model-Based Inference and Experimental Design for Interference Using Partial Network Data

Steven Wilkins Reeves; Shane Lubold; Arun G. Chandrasekhar; Tyler H. McCormick

Abstract

The stable unit treatment value assumption states that an individual's outcome is not affected by the treatment statuses of others. In many real-world applications, however, treatments can affect many others beyond the immediately treated. Interference can generically be thought of as mediated through some network structure. In many empirically relevant situations, however, complete network data (required to adjust for these spillover effects) are too costly or logistically infeasible to collect. Partially or indirectly observed network data (e.g., subsamples, aggregated relational data (ARD), egocentric sampling, or respondent-driven sampling) reduce the logistical and financial burden of collecting network data, but the statistical properties of treatment effect adjustments from these design strategies are only beginning to be explored. In this paper, we present a framework for the estimation and inference of treatment effect adjustments using partial network data through the lens of structural causal models. We also illustrate procedures to assign treatments using only partial network data, with the goal of either minimizing estimator variance or optimally seeding. We derive single-network asymptotic results applicable to a variety of choices for an underlying graph model. We validate our approach using simulated experiments on observed graphs with applications to information diffusion in India and Malawi.

Paper Structure (52 sections, 10 theorems, 79 equations, 9 figures, 2 tables, 7 algorithms)

Introduction
Related Work
Environment
A structural causal model
Example: Contagion as a structural causal model
Examples of Exposure Maps
Examples of Partially Measured Network Data
Nonparametric Identification of Causal Effects
Inference
Estimation in Linear Models
Z estimators
Inference with partially measured data.
Asymptotic Results
Network Model Estimation
SBM Estimation with ARD
...and 37 more sections

Key Result

Theorem 3.2

Suppose that $\widehat{\beta}$ is computed as per the missing-data Z-estimation algorithm, the Z-estimator regularity conditions hold, and $s(n) = o(r(n))$. Then: Here $D(\beta_0) = \mathbb{E}[\nabla_{\beta} g_n(\mathbf{Z}; \beta) \mid \mathbf{a}, \mathbf{X}, G^*, \theta_0] \big|_{\beta = \beta_0}$.
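
For readers unfamiliar with Z-estimation, the following is a minimal sketch of a generic textbook Z-estimator. It is not the paper's missing-data, single-network algorithm. It assumes i.i.d. observations, a scalar parameter, and a hypothetical estimating function `g` for a no-intercept linear model; the names `z_estimate` and `sandwich_variance` are illustrative only. The averaged derivative `d_hat` plays a role analogous to $D(\beta_0)$ in the theorem statement.

```python
import random

def g(z, beta):
    """Estimating function for a no-intercept linear model: x * (y - beta * x)."""
    x, y = z
    return x * (y - beta * x)

def dg(z, beta):
    """Derivative of g with respect to beta (does not depend on beta here)."""
    x, _ = z
    return -x * x

def z_estimate(data, beta=0.0, tol=1e-10, max_iter=50):
    """Solve (1/n) * sum_i g(z_i; beta) = 0 for beta by Newton's method."""
    n = len(data)
    for _ in range(max_iter):
        g_bar = sum(g(z, beta) for z in data) / n
        d_bar = sum(dg(z, beta) for z in data) / n
        step = g_bar / d_bar
        beta -= step
        if abs(step) < tol:
            break
    return beta

def sandwich_variance(data, beta):
    """Plug-in estimate of D^{-1} V D^{-1} / n (assumes i.i.d. observations)."""
    n = len(data)
    d_hat = sum(dg(z, beta) for z in data) / n        # estimate of the expected gradient
    v_hat = sum(g(z, beta) ** 2 for z in data) / n    # estimate of the variance of g
    return v_hat / (d_hat ** 2) / n

# Simulated data with true beta = 2.0 (an assumption of this illustration).
rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(2000)]
data = [(x, 2.0 * x + rng.gauss(0, 1)) for x in xs]

beta_hat = z_estimate(data)
se_hat = sandwich_variance(data, beta_hat) ** 0.5
print(beta_hat, se_hat)
```

In this linear case the estimating equation has a closed-form root, so Newton's method converges in one step; the iterative form is kept only because it carries over to nonlinear choices of `g`.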

Figures (9)

Figure 1: Contagion process in which a single node (blue) is seeded at time $T = 0$; infected nodes are shown in orange at times $T = 1$ and $T = 2$.
Figure 2: Comparison of GATE estimators. ARD denotes our method using aggregated relational data. The "Full Network" method uses a regression approach with the full data available. DM is the difference in means and HT is the Horvitz-Thompson estimator.
Figure 3: Estimation of parameter $\alpha_1$ and all model parameters $\beta$ using the naive and optimized seeding. The potential gain from a more efficient design is much greater than the gain from simply collecting complete network data.
Figure 4: Comparison of different seeding methods under complex contagion. Model-based targeting of optimal blocks generally outperforms degree seeding, especially when targeting the highest degree nodes within those blocks.
Figure 5: The distributions of the potential outcomes of nodes $i$ and $j$ are equivalent under the given treatment assignment, as all of the rooted networks are equivalent.
...and 4 more figures
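
The two baseline estimators named in the Figure 2 caption, difference in means (DM) and Horvitz-Thompson (HT), can be sketched in their standard textbook forms as follows. This sketch assumes independent Bernoulli($p$) treatment assignment with known $p$ and, as a simplification, simulates outcomes with no interference at all; it does not reproduce the paper's ARD-based or full-network regression estimators. The function names and the simulated data are hypothetical.

```python
import random

def difference_in_means(a, y):
    """Mean outcome among treated units minus mean outcome among controls."""
    treated = [yi for ai, yi in zip(a, y) if ai == 1]
    control = [yi for ai, yi in zip(a, y) if ai == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def horvitz_thompson(a, y, p):
    """Inverse-probability-weighted contrast under Bernoulli(p) assignment."""
    n = len(y)
    total = sum(ai * yi / p - (1 - ai) * yi / (1 - p) for ai, yi in zip(a, y))
    return total / n

# Simulated experiment without spillovers; tau is the true effect (assumed).
rng = random.Random(1)
n, p, tau = 5000, 0.5, 1.5
a = [1 if rng.random() < p else 0 for _ in range(n)]
y = [0.5 + tau * ai + rng.gauss(0, 1) for ai in a]

dm = difference_in_means(a, y)
ht = horvitz_thompson(a, y, p)
print(dm, ht)
```

Both estimators use only each unit's own treatment status, which is why they serve as baselines: when outcomes also depend on neighbours' treatments, neither accounts for the spillover.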

Theorems & Definitions (30)

Example 2.1: Local Interaction Effects
Example 2.2: Risk-Sharing Networks (Ambrus et al., 2014)
Example 2.3: Hearing Information (Banerjee et al., 2019)
Example 2.4: Induced subgraph
Example 2.5: Respondent Driven Sampling
Example 2.6: Aggregated Relational Data
Definition 2.1: Exposure Weak Ignorability
Definition 2.2: Exposure Consistency
Definition 2.3: Conditional Independence of the Graph and Outcome
Theorem 3.2: Single Network Z-estimator Asymptotics
...and 20 more
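
To make the aggregated relational data (ARD) of Example 2.6 concrete, here is a small sketch under the usual reading of ARD: each respondent reports how many of their contacts have a given trait, so the count for respondent $i$ and trait $k$ is $\sum_j G_{ij} \mathbf{1}\{j \text{ has trait } k\}$. The toy graph, the trait matrix, and the function name are hypothetical and are not taken from the paper.

```python
def aggregated_relational_data(adjacency, traits):
    """Return y with y[i][k] = number of neighbours of node i that have trait k.

    adjacency: n x n 0/1 matrix (list of lists), adjacency[i][j] = 1 if i and j are linked.
    traits:    n x K 0/1 matrix, traits[j][k] = 1 if node j has trait k.
    """
    n = len(adjacency)
    n_traits = len(traits[0])
    return [
        [sum(adjacency[i][j] * traits[j][k] for j in range(n)) for k in range(n_traits)]
        for i in range(n)
    ]

# Toy undirected graph on 4 nodes (hypothetical).
adjacency = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
# Two binary traits per node (hypothetical).
traits = [[1, 0], [0, 1], [1, 1], [0, 0]]

ard = aggregated_relational_data(adjacency, traits)
print(ard)  # [[1, 2], [2, 1], [1, 1], [1, 1]]
```

The researcher observes only counts such as `ard`, never `adjacency` itself, which is what makes this a form of partial network data.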
