Inference for Synthetic Controls via Refined Placebo Tests

Lihua Lei, Timothy Sudijono

TL;DR

The paper tackles inference in synthetic controls with a single treated unit and small donor pools, where traditional asymptotic methods can falter and existing placebo tests suffer from coarse resolution. It introduces the Leave-Two-Out (LTO) placebo test, a non-randomized approximate p-value built by excluding two units, which preserves finite-sample Type-I control under uniform assignment while yielding finer p-value granularity and improved power for larger effects. The approach extends to non-uniform assignments via a weighted LTO and includes a sensitivity-analysis framework, with additional extensions such as rank-sum and leave-$r$-out variants. Empirical results on semisynthetic California Prop 99, Basque terrorism, and German reunification demonstrate that LTO often achieves lower unconditional Type-I error and higher power in meaningful regimes, offering a practical alternative to exact and approximate placebo tests for small-sample SCM applications. Overall, Leave-Two-Out constitutes a new form of randomization inference tailored to few-cluster settings and broadens the toolkit for robust SCM inference.

Abstract

The synthetic control method is often applied to problems with one treated unit and a small number of control units. A common inferential task in this setting is to test null hypotheses regarding the average treatment effect on the treated. Inference procedures that are justified asymptotically are often unsatisfactory due to (1) small sample sizes that render large-sample approximation fragile and (2) simplification of the estimation procedure that is implemented in practice. An alternative is permutation inference, which is related to a common diagnostic called the placebo test. It has provable Type-I error guarantees in finite samples without simplification of the method, when the treatment is uniformly assigned. Despite this robustness, the placebo test suffers from low resolution since the null distribution is constructed from only $N$ reference estimates, where $N$ is the sample size. This creates a barrier for statistical inference at a common level like $\alpha = 0.05$, especially when $N$ is small. We propose a novel leave-two-out procedure that bypasses this issue, while still maintaining the same finite-sample Type-I error guarantee under uniform assignment for a wide range of $N$. Unlike the placebo test whose Type-I error always equals the theoretical upper bound, our procedure often achieves a lower unconditional Type-I error than theory suggests; this enables useful inference in the challenging regime when $\alpha < 1/N$. Empirically, our procedure achieves a higher power when the effect size is reasonably large and a comparable power otherwise. We generalize our procedure to non-uniform assignments and show how to conduct sensitivity analysis. From a methodological perspective, our procedure can be viewed as a new type of randomization inference different from permutation or rank-based inference, which is particularly effective in small samples.
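The resolution problem described in the abstract can be illustrated with a minimal sketch of the standard placebo (permutation) p-value. This is not the paper's leave-two-out procedure. The function name `placebo_p_value` and the numbers in `stats` are hypothetical, standing in for one test statistic per unit (for example, an RMSPE-type statistic). Because the p-value is a fraction with denominator $N$, it can never fall below $1/N$, so with $N = 15$ a test at level $\alpha = 0.05$ cannot reject even when the treated unit is the most extreme.

```python
def placebo_p_value(stats, treated_idx):
    """Share of units whose statistic is at least as large as the treated unit's.

    The treated unit counts itself, so the smallest attainable value is 1/N.
    """
    treated = stats[treated_idx]
    return sum(s >= treated for s in stats) / len(stats)


# Hypothetical per-unit statistics for N = 15 units (made-up numbers).
stats = [0.8, 1.1, 0.5, 3.9, 0.7, 1.4, 0.9, 1.0, 0.6, 1.2, 0.4, 1.3, 0.3, 1.5, 0.2]
treated_idx = 3  # the treated unit has the largest statistic

p = placebo_p_value(stats, treated_idx)
can_reject = p <= 0.05

print(f"N = {len(stats)}, smallest attainable p-value = {1 / len(stats):.4f}")
print(f"placebo p-value = {p:.4f}, reject at alpha = 0.05: {can_reject}")
```

Here the treated unit is as extreme as the data allow, yet the p-value is $1/15 \approx 0.067$, which is the $\alpha < 1/N$ regime the abstract refers to.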

Paper Structure (39 sections, 11 theorems, 121 equations, 7 figures, 3 tables)

Key Result

Theorem 2.1

Let $N \ge 3$. Then the following results hold.

Figures (7)

  • Figure 1: Power of several different $p$-values versus effect size, on twenty size-30 subsamples of the California Proposition 99 dataset. The $x$-axis is the effect size $\tau$, expressed in multiples of the standard deviation of the dataset's outcome variable. Results are averaged over 20 Monte Carlo runs. The exact placebo, randomized placebo, approximate placebo, and LTO placebo are compared, all constructed using the RMSPE statistic. The rightmost column, with $\tau = 0$, is the Type-I error of each procedure. The red dashed line indicates the level $\alpha$. (Left) $\alpha = 0.02$, which falls in the $\alpha < 1/N$ regime. The exact placebo is omitted because its power is zero in all cases. (Right) $\alpha = 0.05$, which falls in the $\alpha > 1/N$ regime.
  • Figure 2: Output of a sensitivity analysis for the Proposition 99 smoking dataset of Abadie et al. (2011). The red dashed line marks the level $\alpha = 0.05$. The RMSPE statistic was used in the construction of the synthetic control.
  • Figure 3: The red dashed line shows the Type-I error bound $\frac{\lfloor N f(N,\alpha)\rfloor}{N}$ in Theorem \ref{thm:TypeIerror} for the Naive LTO procedure, which is a non-decreasing piecewise-constant function of $\alpha$. Here, $N = 17$. The piecewise-constant solid blue line is the approximate placebo guarantee, $\frac{\lfloor N\alpha\rfloor+1}{N}$.
  • Figure 4: The same simulation as in Figure 1 is carried out. Both versions of placebo and LTO procedures are compared in this experiment. The red dashed line indicates the level $\alpha$.
  • Figure 5: Power of several different $p$-values versus effect size, on size-15 subsamples of the Basque country terrorism dataset. The other details are the same as in Figure 1. (Left) $\alpha = 0.05$, which falls in the $\alpha < 1/N$ regime. The exact placebo is omitted because its power is zero in all cases. (Right) $\alpha = 0.1$, which falls in the $\alpha > 1/N$ regime.
  • ...and 2 more figures
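
As a small worked example of the step function mentioned in the Figure 3 caption, the sketch below evaluates the approximate placebo guarantee $\frac{\lfloor N\alpha\rfloor+1}{N}$ at $N = 17$ for a few levels. The formula is taken from the caption as stated; the function name `approx_placebo_bound` and the chosen levels are illustrative assumptions, and the LTO bound is not computed because $f(N,\alpha)$ is not defined in this summary. Exact fractions are used to avoid floating-point surprises inside the floor.

```python
from fractions import Fraction
from math import floor


def approx_placebo_bound(n, alpha):
    """Evaluate (floor(n * alpha) + 1) / n exactly, as in the Figure 3 caption."""
    return Fraction(floor(n * alpha) + 1, n)


N = 17
# Illustrative levels, written as exact fractions: 0.01, 0.05, 0.10, 0.20.
levels = [Fraction(1, 100), Fraction(1, 20), Fraction(1, 10), Fraction(1, 5)]

bounds = {alpha: approx_placebo_bound(N, alpha) for alpha in levels}
for alpha, bound in bounds.items():
    print(f"alpha = {float(alpha):.2f} -> bound = {bound} (about {float(bound):.4f})")
```

The bound stays at $1/17$ for every $\alpha < 1/17$ and jumps by $1/17$ each time $N\alpha$ crosses an integer, which is why the curve in the figure is piecewise constant.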

Theorems & Definitions (24)

  • Remark 2.1
  • Theorem 2.1
  • Theorem 2.2
  • Theorem 2.3
  • Definition 3.1: Weighted LTO $p$-values
  • Theorem 3.1
  • Remark 3.1
  • Theorem 3.2
  • Theorem 5.1: Rank-sum LTO placebo test
  • Theorem 5.2
  • ...and 14 more