Treatment Effect Estimation with Observational Network Data using Machine Learning

Corinne Emmenegger; Meta-Lina Spohn; Timon Elmer; Peter Bühlmann

TL;DR

The paper addresses causal inference for treatment effects when units interact on a known network, where spillovers violate independence. It develops a semiparametric network AIPW estimator for the expected average treatment effect ($EATE$) under a structural equation model, using cross-fitting to accommodate flexible ML nuisance estimation and a dependency-graph–based CLT to handle network dependence. The estimator attains $\sqrt{N}$-consistency and asymptotic normality with a bootstrap-consistent variance estimator, enabling valid confidence intervals and p-values for a single network. Empirical validation includes simulations under various network topologies and an application to the Swiss StudentLife data, showing that accounting for spillovers alters estimated effects and improves inference. The approach provides a practical, theoretically justified framework for unit-level causal effects in networks, with extensions to global effects (GATE) discussed.
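To make the cross-fitting idea concrete, here is a minimal Python sketch of a cross-fitted AIPW point estimate on toy ring-network data. It is not the authors' implementation, and several pieces are assumptions chosen for illustration:

- `fit_cell_means` is a hypothetical stand-in for the flexible ML regressions. It only averages within discrete cells.
- The conditioning variables are the unit's own covariate and its number of treated neighbors.
- Folds are formed by plain random splitting.
- The bootstrap variance estimator is omitted.

```python
import random
from statistics import mean


def fit_cell_means(rows, key):
    """Hypothetical stand-in learner: average of key(row) within each discrete
    'cell', falling back to the pooled average for cells unseen in training."""
    if not rows:
        raise ValueError("empty training set for a nuisance function")
    groups = {}
    for r in rows:
        groups.setdefault(r["cell"], []).append(key(r))
    pooled = mean(key(r) for r in rows)
    table = {c: mean(v) for c, v in groups.items()}
    return lambda r: table.get(r["cell"], pooled)


def cross_fit_aipw(units, n_folds=2, clip=0.01, seed=0):
    """Cross-fitted AIPW: nuisances (g1, g0, h) are fit on the other folds and
    the AIPW score is evaluated on the held-out fold; returns (mean, scores)."""
    idx = list(range(len(units)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    scores = [0.0] * len(units)
    for k, test in enumerate(folds):
        train = [units[i] for j, f in enumerate(folds) if j != k for i in f]
        g1 = fit_cell_means([r for r in train if r["w"] == 1], lambda r: r["y"])
        g0 = fit_cell_means([r for r in train if r["w"] == 0], lambda r: r["y"])
        h = fit_cell_means(train, lambda r: r["w"])
        for i in test:
            r = units[i]
            p = min(max(h(r), clip), 1 - clip)  # keep propensities away from 0 and 1
            m1, m0 = g1(r), g0(r)
            scores[i] = (m1 - m0
                         + r["w"] * (r["y"] - m1) / p
                         - (1 - r["w"]) * (r["y"] - m0) / (1 - p))
    return mean(scores), scores


def simulate_ring(n=2000, seed=1):
    """Toy data on a ring lattice (2 neighbors per side). The own-treatment
    effect is 1.0 and each treated neighbor adds a spillover of 0.3."""
    rng = random.Random(seed)
    x = [rng.random() < 0.5 for _ in range(n)]
    w = [int(rng.random() < 0.3 + 0.4 * xi) for xi in x]
    units = []
    for i in range(n):
        nbrs = [(i + d) % n for d in (-2, -1, 1, 2)]
        k = sum(w[j] for j in nbrs)  # number of treated neighbors
        y = 1.0 * w[i] + 0.5 * x[i] + 0.3 * k + rng.gauss(0, 1)
        units.append({"w": w[i], "y": y, "cell": (x[i], k)})
    return units


theta_hat, scores = cross_fit_aipw(simulate_ring())
print(round(theta_hat, 2))  # should be close to the own-treatment effect 1.0
```

In this toy model, conditioning on the number of treated neighbors separates the unit's own treatment effect from the spillover. The averaged score therefore targets the own-treatment effect, not the total effect of treating everyone.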

Abstract

Causal inference methods for treatment effect estimation usually assume independent units. However, this assumption is often questionable because units may interact, resulting in spillover effects between them. We develop augmented inverse probability weighting (AIPW) for estimation and inference of the expected average treatment effect (EATE) with observational data from a single (social) network with spillover effects. In contrast to overall effects such as the global average treatment effect (GATE), the EATE measures, in expectation and on average over all units, how the outcome of a unit is causally affected by its own treatment, marginalizing over the spillover effects from other units. We develop cross-fitting theory with plug-in machine learning to obtain a semiparametric treatment effect estimator that converges at the parametric rate and asymptotically follows a Gaussian distribution. The asymptotics are developed using the dependency graph rather than the network graph, which makes explicit that we allow for spillover effects beyond immediate neighbors in the network. We apply our AIPW method to the Swiss StudentLife Study data to investigate the effect of hours spent studying on exam performance, accounting for the students' social network.
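A short Python sketch can show the difference between the network graph and the dependency graph. It assumes, as in the four-unit illustration of Figure 2, that spillovers come only from direct neighbors' treatments. Under that assumption, two units can be dependent whenever they are within network distance two.

`erdos_renyi` and `dependency_graph` are hypothetical helpers written for this illustration. The connection probability $3/N$ with $N = 200$ mirrors the Erdős–Rényi setting of Figure 3, which the paper generated with the R package igraph, not with this code.

```python
import random


def erdos_renyi(n, p, seed=0):
    """Adjacency sets of a G(n, p) graph: each pair is linked with probability p."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def dependency_graph(adj, radius=2):
    """Link units i != j whenever their network distance is at most `radius`
    (breadth-first expansion from each unit)."""
    dep = []
    for i in range(len(adj)):
        seen, frontier = {i}, {i}
        for _ in range(radius):
            frontier = {k for j in frontier for k in adj[j]} - seen
            seen |= frontier
        dep.append(seen - {i})
    return dep


n = 200
g = erdos_renyi(n, 3 / n, seed=42)
g_d = dependency_graph(g, radius=2)
# Maximum degree in the network versus in the (denser) dependency graph
print(max(len(s) for s in g), max(len(s) for s in g_d))
```

The dependency graph is always a supergraph of the network. Its degrees, not the network degrees, govern how much dependence a central limit theorem has to tolerate.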

Paper Structure (21 sections, 16 theorems, 98 equations, 7 figures, 1 table, 1 algorithm)

Introduction
  Our Contribution and Comparison to Literature
Framework and Our Network AIPW Estimator
  Model Formulation
  Treatment Effect and Identification
  Dependency Graph
  Estimation Procedure and Asymptotics
  Bootstrap Variance Estimator
Empirical Validation
  Simulation Study
  Empirical Analysis: Swiss StudentLife Study Data
Conclusion
Assumptions and Additional Definitions
Network Effects in the Social Sciences
Additional Simulation Results
(plus 6 further sections not listed)

Key Result

Lemma 2.2

Let $i\in[N]$. Let $S_i$ be the concatenation of the observed variables for unit $i$. For concatenations $\eta = (g_1, g_0, h)$ of general nuisance functions $g_1$, $g_0$, and $h$, consider the score $\varphi(S_i, \eta)$ including the above-mentioned correction term. For the true nuisance functions $\eta^0=(g_1^0, g_0^0, h^0)$, we have $\mathbb{E}[\varphi(S_i, \eta^0)] = \theta_i^0$ and can co…
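For orientation, the textbook AIPW score and the reason its correction terms vanish at the true nuisance functions are shown below. This is a sketch under assumptions, and the paper's exact definitions may differ:

- $C_i$ is hypothetical notation for whatever variables the nuisance functions condition on. In the network setting, these are the unit's own covariates together with features of other units.
- $g_w^0(c) = \mathbb{E}[Y_i \mid W_i = w, C_i = c]$.
- $h^0(c) = \mathbb{P}(W_i = 1 \mid C_i = c)$.

```latex
\begin{align*}
\varphi(S_i, \eta) &= g_1(C_i) - g_0(C_i)
  + \frac{W_i\,\{Y_i - g_1(C_i)\}}{h(C_i)}
  - \frac{(1 - W_i)\,\{Y_i - g_0(C_i)\}}{1 - h(C_i)}, \\
\mathbb{E}\bigl[W_i\,\{Y_i - g_1^0(C_i)\} \,\big|\, C_i\bigr]
  &= h^0(C_i)\,\bigl\{\mathbb{E}[Y_i \mid W_i = 1, C_i] - g_1^0(C_i)\bigr\} = 0, \\
\mathbb{E}\bigl[(1 - W_i)\,\{Y_i - g_0^0(C_i)\} \,\big|\, C_i\bigr]
  &= \{1 - h^0(C_i)\}\,\bigl\{\mathbb{E}[Y_i \mid W_i = 0, C_i] - g_0^0(C_i)\bigr\} = 0, \\
\Rightarrow\quad \mathbb{E}[\varphi(S_i, \eta^0)] &= \mathbb{E}\bigl[g_1^0(C_i) - g_0^0(C_i)\bigr].
\end{align*}
```

The two inverse-weighted terms are the correction terms. Under this form of the score, the lemma's statement $\mathbb{E}[\varphi(S_i, \eta^0)] = \theta_i^0$ amounts to identifying $\theta_i^0$ with $\mathbb{E}[g_1^0(C_i) - g_0^0(C_i)]$.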

Figures (7)

Figure 1: A network on nine units where the node label represents the number of a unit. Gray nodes receive the treatment, corresponding to $W_i = 1$, and white ones do not, corresponding to $W_i = 0$.
Figure 2: A network $G$ on four units (left). The spillover effects come from the treatments of the direct neighbors, which results in distance-two dependence; this is displayed in the corresponding dependency graph $G_D$ (middle). The underlying causal DAG is displayed on the right, where arrows due to $X$-spillover effects are gray.
Figure 3: Different network structures on $N = 200$ units. Left: Erdős–Rényi network where two nodes are connected with probability $3/N$ (every node is connected to $3$ other nodes in expectation). Right: Watts–Strogatz network with a rewiring probability of $0.05$, a $1$-dimensional ring-shaped starting lattice where each node is connected to $2$ neighbors on both sides (so that, before rewiring, every node is connected to $4$ other nodes), no loops, and no multiple edges. The graphs are generated using the R package igraph.
Figure 4: Coverage (fraction of runs in which the true, and in general unknown, $\theta_{N}^0$ lies inside the confidence interval), log mean length of two-sided $95\%$ confidence intervals for $\theta_{N}^0$, and mean bias over $1000$ simulation runs for Erdős–Rényi and Watts–Strogatz networks of different complexities. For Erdős–Rényi, the expected degree is $3$ for "const" and $3N^{1/15}$ for "N^(1/15)". For Watts–Strogatz, nodes have degree $4$ before rewiring for "const" and $4N^{1/15}$ for "N^(1/15)", and the rewiring probability is $0.05$. We compare the performance of our method, netAIPW, with a Hájek estimator and an IPW estimator, indicated by color. The variances of the competitors are empirical variances over the $1000$ repetitions, whereas we computed confidence intervals for netAIPW according to the paper's confidence interval formula with $B=1$ and $300$ bootstrap samples. The shaded regions in the coverage plot represent $95\%$ confidence bands with respect to the $1000$ simulation runs.
Figure 5: Friendship networks per cohort. Black nodes represent $W_i=1$ (a weekly study time of at least $8$ hours), white nodes represent $W_i=0$ (a weekly study time of less than $8$ hours), and a bigger node size represents a higher GPA.
(plus 2 further figures not listed)
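The coverage metric and the shaded Monte Carlo bands in Figure 4 can be illustrated with a few lines of Python. This is a sketch under assumptions:

- `wald_interval` is a generic normal-approximation interval $\hat\theta \pm z_{0.975}\,\widehat{\mathrm{se}}$. It stands in for the paper's bootstrap-based interval.
- The band is assumed to be the binomial interval around the nominal level, $0.95 \pm z_{0.975}\sqrt{0.95 \cdot 0.05 / 1000} \approx 0.95 \pm 0.0135$. The paper may construct its bands differently.

All function names here are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def wald_interval(theta_hat, se, level=0.95):
    """Two-sided normal-approximation interval theta_hat +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return theta_hat - z * se, theta_hat + z * se


def coverage(intervals, theta_true):
    """Fraction of intervals (lo, hi) that contain theta_true."""
    hits = sum(lo <= theta_true <= hi for lo, hi in intervals)
    return hits / len(intervals)


def monte_carlo_band(level=0.95, runs=1000, band=0.95):
    """Binomial band for an empirical coverage over `runs` repetitions
    when the true coverage equals the nominal `level`."""
    z = NormalDist().inv_cdf(0.5 + band / 2)
    half = z * sqrt(level * (1 - level) / runs)
    return level - half, level + half


# Toy check: normally distributed estimates with a correctly known standard error
rng = random.Random(0)
theta0, se = 1.0, 0.1
cis = [wald_interval(rng.gauss(theta0, se), se) for _ in range(1000)]
print(coverage(cis, theta0), monte_carlo_band())
```

An empirical coverage outside the band signals that the intervals are too narrow or too wide, beyond what the Monte Carlo error of $1000$ runs would explain.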

Theorems & Definitions (34)

Example 2.1
Lemma 2.2
Definition 2.3
Example 2.4
Theorem 2.5: Asymptotic distribution of $\hat{\theta}$
Theorem 2.6
Lemma D.1
Proof of Lemma D.1
Lemma D.2
Proof of Lemma D.2
(plus 24 further items not listed)
