Efficient Difference-in-Differences Estimation when Outcomes are Missing at Random

Lorenzo Testa; Edward H. Kennedy; Matthew Reimherr

TL;DR

The paper addresses missing outcomes in Difference-in-Differences (DiD) by establishing identification and semiparametric efficiency bounds under two missing-at-random (MAR) mechanisms. It then constructs cross-fitted, efficient, multiply robust estimators that use efficient influence functions and a nested-regression (DR-Learner) augmentation to attain oracle-like performance when nuisance models are well specified. The estimators are shown to be asymptotically normal and efficient, and extensive simulations support these properties. A real-data demonstration and a discussion of extensions underscore the practical relevance for causal inference with incomplete panel data.

Abstract

The Difference-in-Differences (DiD) method is a fundamental tool for causal inference, yet its application is often complicated by missing data. Although recent work has developed robust DiD estimators for complex settings like staggered treatment adoption, these methods typically assume complete data and fail to address the critical challenge of outcomes that are missing at random (MAR) -- a common problem that invalidates standard estimators. We develop a rigorous framework, rooted in semiparametric theory, for identifying and efficiently estimating the Average Treatment Effect on the Treated (ATT) when either pre- or post-treatment (or both) outcomes are missing at random. We first establish nonparametric identification of the ATT under two minimal sets of sufficient conditions. For each, we derive the semiparametric efficiency bound, which provides a formal benchmark for asymptotic optimality. We then propose novel estimators that are asymptotically efficient, achieving this theoretical bound. A key feature of our estimators is their multiple robustness, which ensures consistency even if some nuisance function models are misspecified. We validate the properties of our estimators and showcase their broad applicability through an extensive simulation study.
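
For orientation, the target parameter named in the abstract can be written out for the textbook complete-data case. This is a sketch of the standard two-period DiD identity under unconditional parallel trends and no anticipation, not the identification result derived in the paper. The symbols $A$ (treatment-group indicator), $Y_t$ (observed outcome in period $t \in \{0, 1\}$), and $Y_t(a)$ (potential outcome under treatment status $a$) are notation assumed here for illustration.

```latex
\begin{align*}
\mathrm{ATT}
  &= \mathbb{E}\left[ Y_1(1) - Y_1(0) \mid A = 1 \right] \\
  &= \mathbb{E}\left[ Y_1 - Y_0 \mid A = 1 \right]
   - \mathbb{E}\left[ Y_1 - Y_0 \mid A = 0 \right]
\end{align*}
```

The second line requires both $Y_0$ and $Y_1$ to be observed for every unit. When either outcome is observed only for some units, the conditional means on the right can no longer be computed directly from the observed data, which is the gap the MAR conditions described in the abstract are meant to close.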
