Causal clustering: design of cluster experiments under network interference

Davide Viviano; Lihua Lei; Guido Imbens; Brian Karrer; Okke Schrijvers; Liang Shi

TL;DR

The paper addresses estimating the global average treatment effect under network interference, where spillovers complicate cluster design. It introduces Causal Clustering, an algorithm that minimizes a worst-case MSE by solving penalized min-cut problems via SDP relaxations, yielding a Pareto frontier between bias and variance. The authors derive closed-form worst-case bias and variance expressions, establish a practical Bernoulli-vs-cluster design rule, and validate the method with Facebook network data and field data, showing how clustering choices and the number of clusters affect inference. The approach provides a principled, scalable guide for cluster design in network settings, applicable to online experiments and field trials.
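As a rough illustration of the two designs being compared, the sketch below contrasts individual-level (Bernoulli) randomization with cluster-level randomization. The function names, the treatment probability of 1/2, the seeds, and the six-unit clustering are assumptions made for this example and are not taken from the paper.

```python
import random

def bernoulli_assignment(n_units, p=0.5, seed=0):
    """Individual-level design: each unit is treated independently with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n_units)]

def cluster_assignment(cluster_of, p=0.5, seed=0):
    """Cluster-level design: draw one treatment per cluster; units inherit their cluster's draw."""
    rng = random.Random(seed)
    labels = sorted(set(cluster_of))
    draw = {c: 1 if rng.random() < p else 0 for c in labels}
    return [draw[c] for c in cluster_of]

# Hypothetical clustering of six units into three clusters.
cluster_of = [0, 0, 0, 1, 1, 2]
d_cluster = cluster_assignment(cluster_of)
d_bernoulli = bernoulli_assignment(len(cluster_of))
```

Under the cluster design, units in the same cluster always share a treatment status, so a unit whose neighbors sit in its own cluster is exposed to a consistent treatment environment. The price is fewer independent draws, which is the bias-variance tension the summary describes.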

Abstract

This paper studies the design of cluster experiments to estimate the global treatment effect in the presence of network spillovers. We provide a framework to choose the clustering that minimizes the worst-case mean-squared error of the estimated global effect. We show that optimal clustering solves a novel penalized min-cut optimization problem computed via off-the-shelf semi-definite programming algorithms. Our analysis also characterizes simple conditions to choose between any two cluster designs, including choosing between a cluster or individual-level randomization. We illustrate the method's properties using unique network data from the universe of Facebook's users and existing data from a field experiment.
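The flavor of a penalized min-cut criterion can be sketched with a stylized objective: the number of edges cut by a clustering plus a weight times the sum of squared cluster sizes. This is an assumed stand-in, not the paper's exact objective, and it compares three hand-picked candidate clusterings by enumeration instead of solving a semi-definite relaxation. The weight `xi`, the helper names, and the toy graph (two triangles joined by one edge) are all hypothetical.

```python
def cut_edges(edges, cluster_of):
    """Number of edges whose endpoints fall in different clusters."""
    return sum(1 for i, j in edges if cluster_of[i] != cluster_of[j])

def size_penalty(cluster_of):
    """Sum of squared cluster sizes; it grows as clusters become fewer and larger."""
    sizes = {}
    for c in cluster_of:
        sizes[c] = sizes.get(c, 0) + 1
    return sum(s * s for s in sizes.values())

def penalized_cut(edges, cluster_of, xi):
    """Stylized objective (an assumption): cut edges plus xi times the size penalty."""
    return cut_edges(edges, cluster_of) + xi * size_penalty(cluster_of)

# Toy graph: two triangles (0-1-2 and 3-4-5) joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
candidates = {
    "one_cluster": [0, 0, 0, 0, 0, 0],
    "two_clusters": [0, 0, 0, 1, 1, 1],
    "singletons": [0, 1, 2, 3, 4, 5],
}

def best_candidate(xi):
    """Return the name of the candidate clustering with the smallest objective."""
    return min(candidates, key=lambda name: penalized_cut(edges, candidates[name], xi))
```

In this toy setting, `best_candidate(0.0)` selects the single cluster (nothing is cut), `best_candidate(0.1)` selects the two triangles, and `best_candidate(1.0)` selects singletons, which is the individual-level design. Varying the weight therefore traces out a choice between cluster and individual-level randomization.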

Paper Structure (82 sections, 20 theorems, 220 equations, 4 figures, 4 tables, 2 algorithms)

Introduction
Setup
Potential outcomes and spillover effects
Experimental design and estimation
Bias, Variance and MSE of a given clustering
Worst-case bias, variance, and MSE
Comparison between a given clustering and a Bernoulli design
Optimization over the choice of the clustering
Empirical illustration and numerical studies
Comparisons between existing clusterings at Facebook
Clustering in the field
Main results
Extensions
Experimental design for bias-aware inference
Additional extensions
...and 67 more sections

Key Result

Lemma 3.1

Let Assumptions ass:first_order, ass:exposure_restriction, ass:clusters hold. Then $\sup_{\mu \in \mathcal{M} } |\tau_{n, \mu} - \mathbb{E}_\mu[\hat{\tau}_n(\mathcal{C}_{ n})]| = \bar{\phi}_n b_n(\mathcal{C}_n).$
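The lemma factors the worst-case bias into a spillover magnitude $\bar{\phi}_n$ and a term $b_n(\mathcal{C}_n)$ that depends only on the clustering. Since $b_n$ is not defined in this summary, the sketch below substitutes an assumed proxy: the average share of each unit's neighbors that lie in a different cluster. The proxy, the function names, the value of `phi_bar`, and the toy network are illustrative assumptions, not the paper's definitions.

```python
def cross_cluster_share(neighbors, cluster_of):
    """Assumed proxy for b_n: mean share of a unit's neighbors outside its own cluster."""
    shares = []
    for i, nbrs in neighbors.items():
        if not nbrs:
            shares.append(0.0)  # isolated units contribute no cross-cluster exposure
            continue
        outside = sum(1 for j in nbrs if cluster_of[j] != cluster_of[i])
        shares.append(outside / len(nbrs))
    return sum(shares) / len(shares)

def worst_case_bias(phi_bar, b):
    """Product form from the lemma: spillover magnitude times the clustering term."""
    return phi_bar * b

# Toy network: two triangles joined by the edge between units 2 and 3.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
cluster_of = [0, 0, 0, 1, 1, 1]

b = cross_cluster_share(neighbors, cluster_of)
bias_bound = worst_case_bias(phi_bar=0.5, b=b)  # phi_bar = 0.5 is a made-up value
```

Because the clustering enters only through `b`, two clusterings can be ranked on worst-case bias without knowing the spillover magnitude, as long as it is common to both.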

Figures (4)

Figure 1: Cluster comparisons for Louvain clustering. Different Types correspond to different numbers of clusters (with Type 1 having the largest number of clusters). Different panels correspond to different graphs, where two individuals are not connected if the connection (measured with a continuous variable) is below the $5^{th}, 10^{th}, 50^{th}$ percentiles (dense, moderate, and sparse graph, respectively). The two graphs in the panels are Facebook friendship and Facebook messaging.
Figure 2: Example of the number of clusters as a function of $\xi$ (left panel, with the dotted line corresponding to 3.2) and the objective function in Theorem \ref{thm:opt1} for different clusterings. Algorithm \ref{alg:1} corresponds to causal clustering. Data from cai2015social, where we report the average result across $47$ regions in the dataset. Here log indicates the natural logarithm.
Figure 3: Mean-squared error in calibrated simulations (in natural logs) through comparisons of $M(\lambda)$ in Equation \ref{eqn:M_lambda} using a surrogate objective with $\xi = \lambda^{-1}$. Simulations are calibrated to data from cai2015social, averaged over 47 regions. The first three plots vary the variance of the residuals in the outcome model, $\sigma^2 \in \{1/4, 1/2, 1\}$, and calibrate the remaining parameters to the model in cai2015social, Table 2, Column 4, where outcomes are functions of neighbors' treatments. The last plot calibrates the simulations to settings where the outcomes are functions of the neighbors' outcomes, violating Assumption \ref{ass:first_order}, with $\sigma^2 = 1/4$.
Figure 4: Objective function in Theorem \ref{thm:opt1} (in log scale) as a function of $\xi$ over $100$ replications for different network formation models. $N$ denotes the size of the network.

Theorems & Definitions (52)

Example 2.1: Linear exogenous peer effects
Remark 1: Direct and overall effect
Remark 2: Higher order interference and endogenous peer effects
Remark 3: Alternative estimators
Remark 4: Covariate adjustment
Remark 5: Saturation designs
Lemma 3.1: Worst-case bias
proof
Lemma 3.2: Worst-case variance
proof
...and 42 more
