Synthetic Combinations: A Causal Inference Framework for Combinatorial Interventions
Abhineet Agarwal, Anish Agarwal, Suhas Vijaykumar
TL;DR
This work introduces Synthetic Combinations, a causal-inference framework for combinatorial interventions that jointly leverages low-rank structure across units and sparsity across combinations in a Fourier representation. Potential outcomes are modeled as $Y_n^{(\boldsymbol{\tau})} = \langle \boldsymbol{\alpha}_n, \boldsymbol{\chi}^{\boldsymbol{\tau}} \rangle + \epsilon_n^{\boldsymbol{\tau}}$, where the matrix $\mathcal{A}$ of Fourier coefficients has rank $r$ and each unit-specific coefficient vector $\boldsymbol{\alpha}_n$ is $s$-sparse. This structure enables identification under unobserved confounding via a donor-set mechanism. The two-step Synthetic Combinations estimator first uses horizontal regression (Lasso or CART) to learn the outcomes of donor units, then uses vertical regression (principal component regression, PCR) to transport these estimates to non-donor units; it is shown to be finite-sample consistent and asymptotically normal under precise sampling and incoherence conditions. An experimental-design mechanism guarantees that the key assumptions hold with high probability, yielding a data-efficient path to estimating all $N \times 2^p$ potential outcomes, and the analysis extends to interventions that are permutations (rankings). Empirically, the method outperforms baselines on movie-rating data and synthetic simulations, highlighting its potential for factorial designs, recommender systems, and basket therapies where many interventions interact in diverse ways.
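To make the two-step estimator concrete, here is a minimal NumPy sketch on simulated data. Several choices are our assumptions rather than the paper's: the Boolean Fourier convention $\chi^{\boldsymbol{\tau}}_S = \prod_{i \in S} (1 - 2\tau_i)$ for each subset $S \subseteq [p]$, a plain coordinate-descent Lasso with a fixed penalty for the horizontal step (CART is not shown), truncated-SVD PCR with the true rank $r$ for the vertical step, a donor set consisting of the first ten units, and uniformly random observation patterns (so there is no confounding in this toy). All function names (`fourier_features`, `lasso_cd`, `synthetic_combinations_sketch`, `simulate`, `demo`) and problem sizes are hypothetical; this is an illustration, not the authors' implementation.

```python
import itertools

import numpy as np


def fourier_features(tau):
    """Boolean Fourier characteristics chi^tau, one column per subset S of [p].

    Assumed convention: chi_S(tau) = prod_{i in S} (1 - 2 tau_i), so the empty
    subset (first column) is the constant column of ones.
    """
    v = 1 - 2 * np.atleast_2d(tau)  # map {0, 1} to {+1, -1}
    p = v.shape[1]
    cols = [np.prod(v[:, list(S)], axis=1)
            for k in range(p + 1) for S in itertools.combinations(range(p), k)]
    return np.stack(cols, axis=1).astype(float)


def lasso_cd(X, y, lam, n_iter=300):
    """Cyclic coordinate descent for (1 / 2m) ||y - X b||^2 + lam ||b||_1."""
    m, d = X.shape
    b = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / m  # equals 1 for +/-1 Fourier features
    resid = np.array(y, dtype=float)  # invariant: resid = y - X b
    for _ in range(n_iter):
        for j in range(d):
            rho = X[:, j] @ resid / m + col_sq[j] * b[j]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid += X[:, j] * (b[j] - new)
            b[j] = new
    return b


def synthetic_combinations_sketch(chi, Y_obs, mask, donors, r, lam):
    """Step 1: Lasso per donor unit. Step 2: rank-r PCR for every other unit."""
    N = Y_obs.shape[0]
    Y_hat = np.zeros((N, chi.shape[0]))
    for u in donors:  # horizontal regression across combinations
        obs = mask[u]
        Y_hat[u] = chi @ lasso_cd(chi[obs], Y_obs[u, obs], lam)
    D = Y_hat[donors]  # donor estimates for all 2^p combinations
    for n in sorted(set(range(N)) - set(donors)):  # vertical regression across units
        obs = mask[n]
        U, sv, Vt = np.linalg.svd(D[:, obs].T, full_matrices=False)
        w = Vt[:r].T @ (U[:, :r].T @ Y_obs[n, obs] / sv[:r])  # PCR weights on donors
        Y_hat[n] = w @ D
    return Y_hat


def simulate(N, p, r, s, rng):
    """Hypothetical data: rank-r coefficients whose rows share one s-sparse support."""
    combos = np.array(list(itertools.product([0, 1], repeat=p)))
    chi = fourier_features(combos)  # (2^p combinations) x (2^p subsets)
    support = rng.choice(chi.shape[1], size=s, replace=False)
    basis = np.zeros((r, chi.shape[1]))
    basis[:, support] = rng.normal(size=(r, s))
    alpha = rng.normal(size=(N, r)) @ basis  # row n is alpha_n
    return chi, alpha @ chi.T  # Y[n, tau] = <alpha_n, chi^tau>


def demo(seed=0, N=20, p=5, r=2, s=4, noise=0.01):
    rng = np.random.default_rng(seed)
    chi, Y = simulate(N, p, r, s, rng)
    donors = list(range(10))
    mask = np.zeros(Y.shape, dtype=bool)
    for n in range(N):  # with p=5: donors see 24 of 32 combinations, others only 12
        mask[n, rng.choice(Y.shape[1], size=24 if n in donors else 12, replace=False)] = True
    Y_obs = np.where(mask, Y + noise * rng.normal(size=Y.shape), np.nan)
    Y_hat = synthetic_combinations_sketch(chi, Y_obs, mask, donors, r, lam=0.005)
    return np.linalg.norm(Y_hat - Y) / np.linalg.norm(Y)  # relative Frobenius error
```

Calling `demo()` returns the relative error of the full estimated $N \times 2^p$ matrix. The rank-$r$ truncation in the vertical step is what lets a non-donor that observes only a few combinations borrow the donors' completed rows: in this toy its coefficient vector lies in the span of the donors' coefficients, so a weighted combination of donor outcomes reproduces its outcomes on every combination.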
Abstract
Consider a setting with $N$ heterogeneous units and $p$ interventions. Our goal is to learn unit-specific potential outcomes for any combination of these $p$ interventions, i.e., $N \times 2^p$ causal parameters. Choosing a combination of interventions is a problem that arises naturally in a variety of applications, such as factorial design experiments, recommendation engines, combination therapies in medicine, and conjoint analysis. Running $N \times 2^p$ experiments to estimate these parameters is likely to be expensive or infeasible as $N$ and $p$ grow. Further, observational data are likely confounded, i.e., whether or not a unit is seen under a combination is correlated with its potential outcome under that combination. To address these challenges, we propose a novel latent factor model that imposes structure across units (i.e., the matrix of potential outcomes is approximately rank $r$) and across combinations of interventions (i.e., the coefficients in the Fourier expansion of the potential outcomes are approximately $s$-sparse). We establish identification for all $N \times 2^p$ parameters despite unobserved confounding. We propose an estimation procedure, Synthetic Combinations, and establish that it is finite-sample consistent and asymptotically normal under precise conditions on the observation pattern. Our results imply consistent estimation given $\text{poly}(r) \times (N + s^2 p)$ observations, whereas previous methods have sample complexity scaling as $\min(N \times s^2 p,\ \text{poly}(r) \times (N + 2^p))$. We use Synthetic Combinations to propose a data-efficient experimental design. Empirically, Synthetic Combinations outperforms competing approaches on a real-world movie-recommendation dataset. Lastly, we extend our analysis to causal inference where the intervention is a permutation over $p$ items (e.g., rankings).
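The gap between these sample-complexity rates is easiest to see with numbers. The sketch below plugs hypothetical values into the three expressions from the abstract, dropping constants and logarithmic factors and substituting $r^2$ for the unspecified $\text{poly}(r)$; both choices are assumptions made only for illustration, and `sample_counts` is a hypothetical helper.

```python
def sample_counts(N, p, s, r, poly=lambda r: r ** 2):
    """Scaling of required observations, ignoring constants and log factors.

    poly(r) is left unspecified in the abstract; r**2 is an arbitrary stand-in.
    """
    return {
        # Running every unit under every combination
        "all_experiments": N * 2 ** p,
        # Synthetic Combinations: poly(r) * (N + s^2 p)
        "synthetic_combinations": poly(r) * (N + s ** 2 * p),
        # Previous methods: min(N s^2 p, poly(r) * (N + 2^p))
        "previous_methods": min(N * s ** 2 * p, poly(r) * (N + 2 ** p)),
    }


print(sample_counts(N=1000, p=20, s=10, r=3))
# {'all_experiments': 1048576000, 'synthetic_combinations': 27000, 'previous_methods': 2000000}
```

With $N = 1000$, $p = 20$, $s = 10$, and $r = 3$, this gives roughly $1.05 \times 10^9$ for running every experiment, $2 \times 10^6$ for the better of the previous bounds, and $2.7 \times 10^4$ for Synthetic Combinations, because the additive form $N + s^2 p$ avoids both the product $N \times s^2 p$ and the exponential $2^p$.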
