Semiparametric causal mediation analysis of cluster-randomized trials for indirect and spillover effects
Chao Cheng, Fan Li
TL;DR
This work develops a comprehensive semiparametric framework for causal mediation analysis in cluster-randomized trials with informative cluster size and within-cluster interference. It derives efficient influence functions for cluster- and individual-level mediation estimands ($NIE$, $NDE$, $SME$, and $IME$) and proposes doubly robust estimators that integrate parametric and machine-learning nuisance models with cross-fitting and stabilization. Identification is established under standard CRT assumptions, extended to capture cross-world and interference aspects, enabling estimation of all mediation functionals via $\theta_V(a,a^*)$ and $\tau_V$. Through simulations and an application to the Red de Protección Social trial, the approach demonstrates potential mediation through household dietary diversity while accommodating spillovers, though it notes cautions related to small samples and unmeasured confounding.
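As a rough sketch of how the four estimands could be built from the two functionals named above, the display below uses the standard natural-effect decomposition. The summary does not define either functional, so the readings given in the comments are assumptions, in particular the interpretation of $\tau_V$ as an intermediate quantity that switches only an individual's own mediator.

```latex
% Assumed notation (not spelled out in the summary; illustrative only):
%   \theta_V(a,a^*): mean potential outcome under treatment a, with every
%                    mediator in the cluster set to its value under a^*
%   \tau_V:          mean potential outcome under treatment, with one's own
%                    mediator at its treated value and cluster-mates'
%                    mediators at their control values
%   V:               index for the cluster-level or individual-level estimand
\begin{align*}
\mathrm{NIE}_V &= \theta_V(1,1) - \theta_V(1,0), \\
\mathrm{NDE}_V &= \theta_V(1,0) - \theta_V(0,0), \\
\mathrm{SME}_V &= \theta_V(1,1) - \tau_V, \\
\mathrm{IME}_V &= \tau_V - \theta_V(1,0).
\end{align*}
% Under these assumed definitions, NIE_V = SME_V + IME_V,
% and the total effect is NIE_V + NDE_V = \theta_V(1,1) - \theta_V(0,0).
```

Under this reading, estimating $\theta_V(1,1)$, $\theta_V(1,0)$, $\theta_V(0,0)$, and $\tau_V$ would be enough to recover all four effects by simple differences.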
Abstract
In cluster-randomized trials (CRTs), there is emerging interest in exploring the causal mechanism by which a cluster-level treatment affects the outcome through an intermediate outcome. The majority of existing causal mediation methods are applicable to independent data, and only a few exceptions have considered assessing causal mediation in CRTs, all of which depend heavily on parametric assumptions. In this article, we develop a formal semiparametric efficiency theory to motivate new doubly robust methods for addressing different mediation effect estimands: the natural indirect effect, individual mediation effect, and spillover mediation effect (the extent to which one's outcome is influenced by others' mediators). We derive the efficient influence function for each estimand, and carefully parameterize each efficient influence function to motivate practical estimators. We consider both parametric working models and data-adaptive machine learners to estimate the nuisance functions, and obtain the semiparametric efficient estimators in the latter case. We conduct simulation studies to demonstrate the finite-sample performance of our new estimators and illustrate our proposed methods by reanalyzing a real-world CRT.
