Low-order outcomes and clustered designs: combining design and analysis for causal inference under network interference
Matthew Eichhorn, Samir Khan, Johan Ugander, Christina Lee Yu
TL;DR
The paper addresses causal inference under network interference by combining low-order (β-order) outcome models with graph cluster randomized designs. It introduces a pseudoinverse estimator of the total treatment effect that applies under general experimental designs, and bounds its bias and variance in terms of properties of the design, with specialized results for Bernoulli graph cluster randomized (GCR) designs. Under these designs, the estimator's variance scales like the smaller of the variance obtained from low-order modeling and the variance obtained from cluster randomization, so combining the two strategies is preferable to using either alone. Empirical evidence shows that the bounds can be used to select a clustering from a set of candidates across diverse graphs and response models, with Monte Carlo methods enabling application to complex designs. Overall, the framework offers a design-aware approach to causal inference in interference settings and suggests directions for further theoretical and methodological development.
Abstract
Variance reduction for causal inference in the presence of network interference is often achieved through either outcome modeling, typically analyzed under unit-randomized Bernoulli designs, or clustered experimental designs, typically analyzed without strong parametric assumptions. In this work, we study the intersection of these two approaches and make three contributions. First, we present an estimator of the total treatment effect (or global average treatment effect) in low-order outcome models when the data are collected under general experimental designs, generalizing previous results for Bernoulli designs. We refer to this estimator as the pseudoinverse estimator and give bounds on its bias and variance in terms of properties of the experimental design. Second, we evaluate these bounds for the case of Bernoulli graph cluster randomized (GCR) designs. Under these designs, the variance of the pseudoinverse estimator scales like the smaller of the variance obtained by the estimator derived under a low-order assumption and the variance obtained from cluster randomization, showing that combining these variance reduction strategies is preferable to using either individually. When the order of the potential outcomes model is correctly specified, our estimator is always unbiased; under a misspecified model, we upper bound the bias by the closeness of the ground truth model to a low-order model. Third, we give empirical evidence that our variance bounds can be used to select, from a set of candidate clusterings, a good clustering that minimizes the worst-case variance under a cluster randomized design. Across a range of graphs and clustering algorithms, our method consistently selects clusterings that perform well on a range of response models, suggesting the practical use of our bounds.
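To make the two main objects in the abstract concrete, the following minimal Python sketch draws a Bernoulli GCR assignment and computes the total treatment effect as the average difference between every unit's outcome under full treatment and under full control. It is an illustration under stated assumptions, not the paper's pseudoinverse estimator. The function names, the four-node path graph, the clustering, and the order-1 (linear in neighbors' treatments) outcome model with its coefficients are all hypothetical choices made for this example.

```python
import random

def bernoulli_gcr_assignment(clusters, p, rng):
    """Treat each cluster independently with probability p.

    clusters: dict mapping unit -> cluster id (hypothetical input format).
    Every unit inherits the treatment of its cluster.
    """
    cluster_ids = sorted(set(clusters.values()))
    cluster_treated = {c: 1 if rng.random() < p else 0 for c in cluster_ids}
    return {unit: cluster_treated[c] for unit, c in clusters.items()}

def total_treatment_effect(outcome, units):
    """Average of Y_i(all treated) - Y_i(all control) over units."""
    all_treated = {u: 1 for u in units}
    all_control = {u: 0 for u in units}
    diffs = [outcome(u, all_treated) - outcome(u, all_control) for u in units]
    return sum(diffs) / len(diffs)

# Hypothetical example: path graph 0 - 1 - 2 - 3 split into two clusters.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clusters = {0: 0, 1: 0, 2: 1, 3: 1}

def linear_outcome(i, z):
    """Assumed order-1 outcome model: baseline + direct effect + spillover."""
    return 1.0 + 2.0 * z[i] + 0.5 * sum(z[j] for j in neighbors[i])

z = bernoulli_gcr_assignment(clusters, p=0.5, rng=random.Random(0))
print(z)  # units in the same cluster always share a treatment
print(total_treatment_effect(linear_outcome, list(clusters)))  # 2.75
```

In this toy model each unit's effect is 2 plus 0.5 times its degree, so the total treatment effect is (2.5 + 3 + 3 + 2.5) / 4 = 2.75. In an actual experiment only the outcomes under the single realized assignment `z` are observed, which is why an estimator and variance bounds of the kind described in the abstract are needed.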
