Triply Robust Panel Estimators

Susan Athey, Guido Imbens, Zhaonan Qu, Davide Viviano

TL;DR

The paper develops the Triply Robust Panel (TROP) estimator for causal effects in panel data, combining a flexible low-rank outcome model with unit-specific and time-specific weights. TROP is triply robust: its bias bound is the product of three terms (unit imbalance, time imbalance, and regression-misspecification bias), so the bias is small whenever any one of the three is small. In semi-synthetic simulations calibrated to real datasets, including CPS and PWT, TROP outperforms TWFE/DID, Synthetic Control, Matrix Completion, and SDID in most settings, and interactive fixed effects are identified as a key driver of relative performance. The authors also provide extensions to multiple treated units and covariates, bootstrap-based inference, and practical guidance on weighting choices and cross-validated tuning.

Abstract

This paper studies estimation of causal effects in a panel data setting. We introduce a new estimator, the Triply RObust Panel (TROP) estimator, that combines (i) a flexible model for the potential outcomes based on a low-rank factor structure on top of a two-way-fixed effect specification, with (ii) unit weights intended to upweight units similar to the treated units and (iii) time weights intended to upweight time periods close to the treated time periods. We study the performance of the estimator in a set of simulations designed to closely match several commonly studied real data sets. We find that there is substantial variation in the performance of the estimators across the settings considered. The proposed estimator outperforms two-way-fixed-effect/difference-in-differences, synthetic control, matrix completion and synthetic-difference-in-differences estimators. We investigate what features of the data generating process lead to this performance, and assess the relative importance of the three components of the proposed estimator. We have two recommendations. Our preferred strategy is that researchers use simulations closely matched to the data they are interested in, along the lines discussed in this paper, to investigate which estimators work well in their particular setting. A simpler approach is to use more robust estimators such as synthetic difference-in-differences or the new triply robust panel estimator which we find to substantially outperform two-way fixed effect estimators in many empirically relevant settings.
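To make components (ii) and (iii) of the abstract concrete, the following is a minimal sketch of a weighted two-way fixed effects fit for a single treated cell. It is not the authors' implementation: the low-rank factor component (i) is omitted entirely, the exponential-decay form of the weights is an assumption, and the names `weighted_twfe_effect`, `lambda_unit`, and `lambda_time` are hypothetical.

```python
import numpy as np


def weighted_twfe_effect(Y, lambda_unit=0.5, lambda_time=0.1):
    """Estimate the effect on the last unit in the last period.

    Y is an (N, T) outcome matrix in which only cell (N-1, T-1) is treated.
    Simplification: no low-rank term, only unit and time fixed effects.
    """
    N, T = Y.shape
    i_tr, t_tr = N - 1, T - 1

    # Unit weights (assumed form): decay in the RMS distance between each
    # unit's pre-treatment outcomes and those of the treated unit.
    pre = Y[:, :t_tr]
    dist_unit = np.sqrt(np.mean((pre - pre[i_tr]) ** 2, axis=1))
    omega = np.exp(-lambda_unit * dist_unit)

    # Time weights (assumed form): decay in the gap to the treated period.
    theta = np.exp(-lambda_time * (t_tr - np.arange(T)))

    # Regress control cells on unit and time dummies: Y_it = alpha_i + beta_t.
    rows, y, w = [], [], []
    for i in range(N):
        for t in range(T):
            if (i, t) == (i_tr, t_tr):
                continue  # the treated cell is held out of the fit
            x = np.zeros(N + T)
            x[i] = 1.0
            x[N + t] = 1.0
            rows.append(x)
            y.append(Y[i, t])
            w.append(omega[i] * theta[t])
    X = np.array(rows)
    y = np.array(y)
    sw = np.sqrt(np.array(w))

    # Weighted least squares via row scaling. The dummy design is rank
    # deficient by one, but alpha_i + beta_t is still uniquely determined.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)

    # Effect = observed treated outcome minus predicted untreated outcome.
    y0_hat = coef[i_tr] + coef[N + t_tr]
    return Y[i_tr, t_tr] - y0_hat


# Usage on a noiseless additive panel with a known effect of 2.0.
rng = np.random.default_rng(0)
alpha, beta = rng.normal(size=6), rng.normal(size=8)
Y_demo = alpha[:, None] + beta[None, :]
Y_demo[-1, -1] += 2.0
tau_hat = weighted_twfe_effect(Y_demo)
```

In this sketch the weights only change which control cells the fit leans on. When the additive model holds exactly, any positive weights recover the effect, which mirrors the idea that a well-specified outcome model alone can suffice.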

Paper Structure

This paper contains 31 sections, 8 theorems, 59 equations, 3 figures, 11 tables, 3 algorithms.

Key Result

Theorem 5.1

Let Assumption ass:factor_model hold. Then, for fixed (not data-dependent) weights $\theta, \omega$ (and conditional on $T_0, N_0$), the bias of the estimator is bounded by the product of a unit-imbalance term, a time-imbalance term, and a regression-misspecification term. In the bound, $\|\cdot\|_2$ denotes the $\ell_2$-norm, $\|\cdot\|_{\star}$ denotes the spectral norm, $\mathbb{I}_K$ denotes the identity matrix, and $B$ is as defined in Assumption ass:class_estimators.

Figures (3)

  • Figure A1: Bias of the DID estimator on CPS data, obtained through 10,000 replications, where the treatment assignment mechanism follows a logistic model as a function of unobserved factor loadings, as described in Equation \ref{eqn:treatment_assignment}.
  • Figure A2: RMSEs of estimators on the PWT dataset with the outcome and treatment generated as described in Section \ref{sec:design}, using log-GDP as the outcome and democracy as the treatment. In the figure, we vary the number of control units (left) or the number of pre-treatment periods (right). In both cases, the number of treated periods and units is ten, with the treatment periods corresponding to the last ten periods in the simulated panel. When we vary the number of control units, we randomly select $N$ units from the full PWT panel while keeping the total number of periods at $T=48$. When we vary the number of pre-treatment periods, we construct shorter panels by selecting the last $T$ periods of the full PWT panel while keeping $N=111$.
  • Figure A3: RMSEs of estimators on the PWT dataset with the outcome and treatment generated as described in Section \ref{sec:design}, using log-GDP as the outcome and democracy as the treatment. Here we vary the number of treated units (left) or the number of treatment periods (right). For the panel on the left, we use $N_{\rm co}=101$ and $T_{\rm co}=38$, while varying $N_{\rm tr}=1,\dots,10$ with $T_{\rm tr}=10$. For the panel on the right, we use $T_{\rm co}=38$ and $N_{\rm co}=111$, while varying $T_{\rm tr}=1,\dots,10$ with $N_{\rm tr}=10$.

Theorems & Definitions (17)

  • Theorem 5.1: Triple robustness
  • proof
  • Corollary 1: Exact unbiasedness
  • Remark 5.1: More general estimators
  • Remark 6.1: Alternative approaches to estimate the effect on multiple treated units
  • Lemma A1
  • proof: Proof of Lemma \ref{lem:rankone}
  • Theorem A1
  • proof: Proof of Theorem \ref{thm:gen-identity}
  • Corollary A1
  • ...and 7 more