Adaptive Experiment Design with Synthetic Controls

Alihan Hüyük, Zhaozhi Qian, Mihaela van der Schaar

TL;DR

The paper tackles heterogeneous treatment effects by proposing Syntax, an adaptive exploratory trial design that uses online synthetic controls to identify subpopulations with positive effects within a fixed sample budget. It introduces a synthetic-control estimator $\hat{r}_i(\bm{\beta}) = \hat{y}_{iT}^{(1)} - \bm{\beta}^\top \hat{\bm{y}}_{\cdot T}^{(0)}$ with constraints $\bm{x}_i = X\bm{\beta}$, $\bm{z}_i \approx Z\bm{\beta}$, and $\mathbf{1}^\top\bm{\beta}=1$, proving unbiasedness and a variance bound that includes a representation-error term $\lambda\|\bm{\beta}-\bm{1}_i\|_{N^{-1}}^2$. A variance-minimizing selection of $\bm{\beta}_i^*$ feeds into an online algorithm, Syntax, which adaptively recruits subpopulations by minimizing a sensitivity index $S_i = |\hat{r}_i(\bm{\beta}_i^*)|/\sqrt{V_i(\bm{\beta}_i^*)}$ and ultimately reports $\hat{\mathcal{I}}^* = \{i: \hat{r}_i(\bm{\beta}_i^*)>0\}$. The experiments show Syntax outperforms benchmarks in environments with diminishing factor effects, achieving better FPR/TPR with substantially fewer samples, and discuss practical implications such as sample savings and allocation efficiency. The work highlights when synthetic controls are most beneficial—namely, when pre-treatment factors strongly inform latent loadings and post-treatment factors are weaker—and frames a path toward faster, more targeted clinical evaluation of heterogeneous treatment effects.
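
The quantities named in the summary above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the weights $\bm{\beta}$ and the variance values $V_i$ are taken as given instead of being derived from the constraints, and reading "minimizing a sensitivity index" as recruiting from the arg-min subpopulation is an assumption.

```python
import numpy as np

def synthetic_control_effect(y_treated_i, y_control, beta):
    """r_hat_i(beta) = y_iT^(1) - beta^T y_.T^(0).

    y_treated_i: mean treated outcome of subpopulation i.
    y_control:   mean control outcomes of all subpopulations.
    beta:        synthetic-control weights (assumed given, summing to 1).
    """
    y_control = np.asarray(y_control, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return float(y_treated_i - beta @ y_control)

def sensitivity_index(r_hat, var):
    """S_i = |r_hat_i| / sqrt(V_i), computed for every subpopulation."""
    r_hat = np.asarray(r_hat, dtype=float)
    var = np.asarray(var, dtype=float)
    return np.abs(r_hat) / np.sqrt(var)

def next_subpopulation(r_hat, var):
    """Assumed recruitment rule: pick the subpopulation with the smallest S_i,
    i.e. the one whose effect sign is least certain."""
    return int(np.argmin(sensitivity_index(r_hat, var)))

def reported_set(r_hat):
    """I_hat^* = {i : r_hat_i > 0}."""
    return {i for i, r in enumerate(r_hat) if r > 0}

# Toy numbers, purely illustrative.
r_hat = [0.5, -0.1, 0.2]
var = [0.04, 0.01, 0.16]
print(next_subpopulation(r_hat, var))  # index 2 has the smallest S_i
print(reported_set(r_hat))             # subpopulations 0 and 2
```

In this toy example the third subpopulation has a positive estimate but the largest variance, so it is the one an arg-min rule would sample next.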

Abstract

Clinical trials are typically run to understand the effects of a new treatment on a given population of patients. However, patients in large populations rarely respond the same way to the same treatment. This heterogeneity in patient responses necessitates trials that investigate effects on multiple subpopulations, especially when a treatment has marginal or no benefit for the overall population but might have significant benefit for a particular subpopulation. Motivated by this need, we propose Syntax, an exploratory trial design that identifies subpopulations with positive treatment effect among many subpopulations. Syntax is sample efficient as it (i) recruits and allocates patients adaptively and (ii) estimates treatment effects by forming synthetic controls for each subpopulation that combine control samples from other subpopulations. Through experiments, we validate the performance of Syntax and provide insights into when it might have an advantage over conventional trial designs.

Paper Structure

This paper contains 29 sections, 2 theorems, 18 equations, 4 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

Assuming $M_{\neg T}=[\bm{\mu}_1\cdots\bm{\mu}_{T-1}]$ has full rank and $T>D_z$, the estimator is unbiased, $\mathbb{E}[r_i-\hat{r}_i(\bm{\beta})] = 0$, and its variance admits a bound that includes the representation-error term $\lambda\|\bm{\beta}-\bm{1}_i\|_{N^{-1}}^2$ with $\lambda = \|M_{\neg T}^{\mathsf{T}}(M_{\neg T}M_{\neg T}^{\mathsf{T}})^{-1}\bm{\mu}_T\|^2$, whenever $\bm{\beta}$ satisfies $\bm{x}_i=X\bm{\beta}$, $\hat{\bm{y}}_{i\neg T}=\hat{Y}_{\neg T}\bm{\beta}$, and $\bm{1}^{\mathsf{T}}\bm{\beta}=1$.
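
The formula for $\lambda$ in Proposition 1 can be evaluated numerically as in the sketch below. It assumes $M_{\neg T}$ is stored as a $D_z \times (T-1)$ array whose columns are $\bm{\mu}_1,\dots,\bm{\mu}_{T-1}$; the function name and the toy factors are hypothetical. The vector $\bm{w} = M_{\neg T}^{\mathsf{T}}(M_{\neg T}M_{\neg T}^{\mathsf{T}})^{-1}\bm{\mu}_T$ is the minimum-norm solution of $M_{\neg T}\bm{w}=\bm{\mu}_T$, so $\lambda$ is the squared norm of the smallest weights that express the final-period factor in terms of the earlier ones.

```python
import numpy as np

def representation_lambda(M_not_T, mu_T):
    """lambda = || M^T (M M^T)^{-1} mu_T ||^2 for M = [mu_1 ... mu_{T-1}].

    M_not_T: array of shape (D_z, T-1), assumed to have full row rank
             (this needs T > D_z, as the proposition states).
    mu_T:    array of shape (D_z,), the factor at the final time step.
    """
    M = np.asarray(M_not_T, dtype=float)
    mu = np.asarray(mu_T, dtype=float)
    if np.linalg.matrix_rank(M) < M.shape[0]:
        raise ValueError("M_not_T must have full row rank")
    # Solve (M M^T) v = mu_T instead of forming the inverse explicitly.
    v = np.linalg.solve(M @ M.T, mu)
    w = M.T @ v  # minimum-norm w satisfying M w = mu_T
    return float(w @ w)

# Toy factors with D_z = 2 and T = 3, purely illustrative.
M_not_T = np.array([[1.0, 0.0],
                    [0.0, 2.0]])
mu_T = np.array([1.0, 2.0])
lam = representation_lambda(M_not_T, mu_T)  # w = [1, 1], so lambda = 2
```

Under this reading, a final-period factor with small magnitude relative to the pre-treatment factors gives a small $\lambda$, which matches the summary's remark that synthetic controls help most when post-treatment factors are weaker.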

Figures (4)

  • Figure 1: Tradeoff between individual benefit and cost. Consider two clinical trials that are both designed to confirm the effectiveness of a new treatment for one or multiple subpopulations. While Trial A investigates only two candidate subpopulations, Trial B investigates eight. As a result, Trial B has the potential to succeed for two subpopulations (SP2 & SP6) while Trial A is likely to fail for all. However, Trial B needs to allocate fewer samples to each subpopulation, which makes confirming positive effects more challenging. We propose Syntax as an exploratory pilot study that finds good subpopulations to target (such as SP2 & SP6) ahead of a confirmatory trial.
  • Figure 2: Comparison of (a) synthetic algorithms and (b) adaptive algorithms. Switching to a pre-planned sampling strategy from an adaptive one or switching to a naive inference strategy from a synthetic one both cause FPR to increase and TPR to decrease at comparable scales.
  • Figure 3: Proportion of samples allocated to the treatment group over the control group. By sharing information between the control samples of different subpopulations, Syntax is able to allocate more of its samples to the treatment group compared with alternative designs.
  • Figure 4: Sensitivity of Syntax to parameter $\lambda$.

Theorems & Definitions (3)

  • Proposition 1
  • proof
  • Proposition 2