Potential Outcome Modeling and Estimation in DiD Designs with Staggered Treatments
Siddhartha Chib, Kenichi Shimizu
TL;DR
This paper addresses the estimation of group-time ATTs in Difference-in-Differences designs with staggered treatment adoption. It introduces a unified potential-outcomes model that enforces parallel trends and no anticipation directly at the model level. It develops a Bayesian estimation framework with flexible priors (including thick-tailed and hierarchical options) and establishes a Bernstein-von Mises result that justifies frequentist-style inference from Bayesian posterior draws. It also offers an iterated feasible GLS estimator whose steps mirror the updates in the Bayesian posterior sampler. The approach accommodates unobserved heterogeneity through sequence-specific random effects and covariate-dependent random intercepts, and it provides practical tools for assessing pre-treatment trends through marginal likelihood comparisons. In simulations, it performs comparably to existing methods in large samples and improves on them in small samples, an improvement attributed to regularization. The method is illustrated with an application to minimum wage increases and teen employment, and is implemented in the bdid software package.
Abstract
We develop a unified model for both treated and untreated potential outcomes for Difference-in-Differences designs with multiple time periods and staggered treatment adoption that respects parallel trends and no anticipation. The model incorporates unobserved heterogeneity through sequence-specific random effects and covariate-dependent random intercepts, allowing for flexible baseline dynamics while preserving causal identification. The model lends itself to straightforward inference about group-specific, time-varying Average Treatment Effects on the Treated (ATTs). In contrast to existing methods, it is easy to regularize the ATT parameters in our framework. For Bayesian inference, prior information on the ATTs is incorporated through black-box training sample priors and, in small-sample settings, through thick-tailed t-priors that shrink ATTs of small magnitude toward zero. A hierarchical prior can be employed when ATTs are defined at the level of sub-categories. A Bernstein-von Mises result justifies posterior inference for the treatment effects. To show that the model provides a common foundation for Bayesian and frequentist inference, we develop an iterated feasible GLS estimator of the ATTs that is built on the updates in the Bayesian posterior sampling. The model and methodology are illustrated in an empirical study of the effects of minimum wage increases on teen employment in the U.S.
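To make the target quantity concrete, the sketch below computes a group-time ATT by a plain difference-in-differences comparison on simulated data. It is not the paper's Bayesian model or its feasible GLS estimator, and it does not use the bdid package. It only illustrates what a group-specific, time-varying ATT measures when parallel trends and no anticipation hold. The function names `simulate_panel` and `att_gt`, the coding of never-treated units as cohort 0, and the choice of period g-1 as the baseline are all assumptions made for this illustration.

```python
import numpy as np


def simulate_panel(n_per_group=200, periods=5, seed=0):
    """Simulate a balanced panel with staggered adoption (illustrative only).

    Cohort g is first treated in period g; cohort 0 means never treated.
    Returns the cohort label per unit and an (n_units, periods) outcome array.
    """
    rng = np.random.default_rng(seed)
    cohorts = np.repeat([0, 3, 4], n_per_group)
    n = cohorts.size

    # Unit effects may differ by cohort; they cancel in within-unit differences.
    unit_effect = rng.normal(0.0, 1.0, size=n) + 0.5 * (cohorts > 0)
    # A common time trend for all cohorts, so parallel trends holds by design.
    time_effect = 0.3 * np.arange(1, periods + 1)
    noise = rng.normal(0.0, 0.5, size=(n, periods))
    y = unit_effect[:, None] + time_effect[None, :] + noise

    # Treatment effect 1.0 + 0.5 * (t - g) for t >= g, and zero before g
    # (no anticipation).
    for g in (3, 4):
        for t in range(g, periods + 1):
            y[cohorts == g, t - 1] += 1.0 + 0.5 * (t - g)
    return cohorts, y


def att_gt(cohorts, y, g, t):
    """DiD estimate of ATT(g, t) using never-treated units as the comparison
    group and period g-1 as the baseline. Periods are numbered from 1."""
    treated = cohorts == g
    control = cohorts == 0
    base, cur = g - 2, t - 1  # column indices for periods g-1 and t
    change_treated = (y[treated, cur] - y[treated, base]).mean()
    change_control = (y[control, cur] - y[control, base]).mean()
    return change_treated - change_control


if __name__ == "__main__":
    cohorts, y = simulate_panel()
    for g in (3, 4):
        for t in range(1, 6):
            if t != g - 1:  # the baseline period is zero by construction
                print(f"ATT({g},{t}) estimate: {att_gt(cohorts, y, g, t):.3f}")
```

In this simulation, estimates for periods before adoption should be close to zero, which is the kind of pre-treatment check the abstract's assumptions imply. Estimating each ATT separately in this way applies no shrinkage, which is the contrast the abstract draws when it says the ATT parameters are easy to regularize within the model.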
