ClusterSC: Advancing Synthetic Control with Donor Selection
Saeyoung Rho, Andrew Tang, Noah Bergam, Rachel Cummings, Vishal Misra
TL;DR
This work extends synthetic control (SC) to disaggregate-level data, where a large donor pool brings the curse of dimensionality. It introduces ClusterSC, a two-stage method that clusters donors using right-singular-vector embeddings and applies SC only to the cluster most relevant to the target, with hard singular value thresholding (HSVT) denoising integrated into the Learn step. The authors prove guarantees for accurate subgroup identification and show improved pre- and post-intervention prediction bounds under Gaussian, sub-Gaussian, and heavy-tailed noise, complemented by empirical results on synthetic data and real housing data. In these experiments ClusterSC predicts more accurately and more stably than classical SC, which makes it a practical option for individual-level causal inference and policy evaluation.
Abstract
In causal inference with observational studies, synthetic control (SC) has emerged as a prominent tool. SC has traditionally been applied to aggregate-level datasets, but more recent work has extended its use to individual-level data. Because individual-level datasets contain many more observed units, this shift introduces the curse of dimensionality to SC. To address this, we propose Cluster Synthetic Control (ClusterSC), based on the idea that individuals may form groups whose behavior aligns within a group but diverges between groups. ClusterSC incorporates a clustering step to select only the donors relevant to the target. We provide theoretical guarantees on the improvements induced by ClusterSC, supported by empirical demonstrations on synthetic and real-world datasets. The results indicate that ClusterSC consistently outperforms classical SC approaches.
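The two-stage procedure summarized in the TL;DR can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the donor matrix is laid out as time periods by donors (so each donor corresponds to a row of the right singular vectors), plain k-means with a farthest-point start stands in for the clustering routine, the target is assigned to the nearest cluster centroid after projection into the same embedding, and ordinary least squares on the rank-truncated donor matrix stands in for the HSVT-based Learn step. The function names (`kmeans`, `hsvt_least_squares`, `cluster_sc`) and the parameters `k`, `embed_rank`, and `fit_rank` are hypothetical.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Plain Lloyd's k-means with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        # Distance from every point to its closest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def hsvt_least_squares(X, y, rank):
    """Minimum-norm least squares fit of y on X after keeping only the
    top `rank` singular values of X (hard singular value thresholding)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:rank].T @ ((U[:, :rank].T @ y) / s[:rank])

def cluster_sc(X_pre, X_post, y_pre, k=2, embed_rank=2, fit_rank=1):
    """X_pre, X_post: (time, donors) matrices; y_pre: target's pre-period series."""
    U, s, Vt = np.linalg.svd(X_pre, full_matrices=False)
    donor_emb = Vt[:embed_rank].T                      # one row per donor
    # Assumption: embed the target by projecting it like an extra donor column.
    target_emb = (U[:, :embed_rank].T @ y_pre) / s[:embed_rank]
    labels, centers = kmeans(donor_emb, k)
    target_cluster = np.linalg.norm(centers - target_emb, axis=1).argmin()
    selected = np.flatnonzero(labels == target_cluster)
    # Learn weights on the selected cluster only, then project forward.
    w = hsvt_least_squares(X_pre[:, selected], y_pre, fit_rank)
    return selected, X_post[:, selected] @ w

# Toy data (fabricated for illustration): two donor groups, each driven by
# its own latent factor; the target follows the first group.
rng = np.random.default_rng(0)
T0, T1, n = 40, 10, 10                                 # pre periods, post periods, donors per group
t = np.arange(T0 + T1)
factors = np.stack([np.sin(2 * np.pi * t / 25), np.cos(2 * np.pi * t / 10)], axis=1)
groups = np.repeat([0, 1], n)
loadings = rng.uniform(1.0, 1.5, size=2 * n)
X = factors[:, groups] * loadings + 0.01 * rng.standard_normal((T0 + T1, 2 * n))
y = 0.5 * (X[:, 0] + X[:, 1]) + 0.01 * rng.standard_normal(T0 + T1)

selected, y_hat = cluster_sc(X[:T0], X[T0:], y[:T0])
rmse = np.sqrt(np.mean((y_hat - y[T0:]) ** 2))
print("selected donors:", selected)
print("post-intervention RMSE:", rmse)
```

In this toy setup the target is built from the first donor group, so the clustering step should restrict the fit to those donors and ignore the second group entirely. The ranks and the number of clusters are fixed by hand here; in practice they would need to be chosen from the data.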
