Fused Extended Two-Way Fixed Effects for Difference-in-Differences With Staggered Adoptions

Gregory Faletto

TL;DR

FETWFE addresses the bias of two-way fixed effects in difference-in-differences with staggered adoptions by using fusion penalties to learn equality restrictions among the parameters of a large extended two-way fixed-effects model. It provides a unified, data-driven framework that preserves asymptotic unbiasedness while improving efficiency, with an oracle property and feasible confidence intervals for several heterogeneous marginal and conditional treatment-effect estimands. The approach is supported by consistency and asymptotic normality results, simulations showing improved accuracy, and an empirical application. The method targets difference-in-differences settings with staggered adoption and heterogeneous effects, such as policy evaluation and program analysis.

Abstract

To address the bias of the canonical two-way fixed effects estimator for difference-in-differences under staggered adoptions, Wooldridge (2021) proposed the extended two-way fixed effects estimator, which adds many parameters. However, this reduces efficiency. Restricting some of these parameters to be equal (for example, subsequent treatment effects within a cohort) helps, but ad hoc restrictions may reintroduce bias. We propose a machine learning estimator with a single tuning parameter, fused extended two-way fixed effects (FETWFE), that enables automatic data-driven selection of these restrictions. We prove that under an appropriate sparsity assumption FETWFE identifies the correct restrictions with probability tending to one, which improves efficiency. We also prove the consistency, oracle property, and asymptotic normality of FETWFE for several classes of heterogeneous marginal treatment effect estimators under either conditional or marginal parallel trends, and we prove the same results for conditional average treatment effects under conditional parallel trends. We provide an R package implementing fused extended two-way fixed effects, and we demonstrate FETWFE in simulation studies and an empirical application.

Paper Structure

This paper contains 57 sections, 20 theorems, 150 equations, 9 figures, and 7 tables.

Key Result

Theorem 6.1

Assume that Assumptions (CNAS), (CCTSB), and (LINS) hold, as well as Assumptions (F1), (F2), S($s_N$), and (R1) - (R3). Let $q > 0$.

Figures (9)

  • Figure 1: Visualization of which of the estimated marginal average treatment effect terms $\hat{\tau}_{rt}$ from regression \ref{wooldridge.6.33.model} (which estimate the average treatment effects $\tau_{\text{ATT}} (r, t)$ from Equation \ref{att.cohort.time}) we penalize towards each other in the FETWFE penalty \ref{fetwfe.penalty}. In this setting, $T = 6$ and $\mathcal{R} = \{2, \ldots, 5\}$. The horizontal axis depicts time and the vertical axis depicts cohorts. FETWFE works well under an assumption that the linked treatment effects tend to be close together, and at least some of them are exactly equal. See further details in Section \ref{sec.meth}.
  • Figure 2: Another example of a possible valid penalty structure that mainly penalizes treatment effects across cohorts towards each other based on time since the start of treatment, as mentioned in Remark \ref{d.remark}. Compare to Figure \ref{fig:fusion-penalties}. All of our theoretical results for FETWFE would work with such a penalty structure up to a change in constants in the results.
  • Figure 3: Boxplots of squared errors for each treatment effect estimate across all 700 simulations. Vertical axis is on a log scale.
  • Figure 4: Boxplot displaying the proportion of treatment effect restriction decisions made correctly by FETWFE in each of the 700 simulations from the first simulation study in Section \ref{synth.exps.sec}.
  • Figure 5: Boxplots of squared errors for each method's estimate of $\tau_{\text{ATT}} (2)$ across all 700 simulations. Vertical axis is on a log scale.
  • ...and 4 more figures
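
The linking idea behind the Figure 1 setting ($T = 6$, $\mathcal{R} = \{2, \ldots, 5\}$) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: it links only consecutive treatment effects within a cohort (the example restriction named in the abstract), it assumes cohort $r$ is treated from time $r$ onward, it uses a generic $\lambda \sum |\cdot|^q$ penalty form, and the function names are hypothetical. It is not the API of the R package and may differ from the exact penalty in the paper.

```python
def treatment_effect_index(T, cohorts):
    """List the (r, t) pairs with t >= r.

    Assumption: cohort r is first treated at time r and stays treated through T.
    """
    return [(r, t) for r in cohorts for t in range(r, T + 1)]


def within_cohort_links(index):
    """Pair each tau_{r,t} with tau_{r,t-1} from the same cohort, when it exists."""
    pos = {rt: i for i, rt in enumerate(index)}
    return [(pos[(r, t)], pos[(r, t - 1)]) for (r, t) in index if (r, t - 1) in pos]


def fusion_penalty(tau, links, lam=1.0, q=1.0):
    """Illustrative fusion penalty: lam * sum of |tau_i - tau_j|^q over linked pairs."""
    return lam * sum(abs(tau[i] - tau[j]) ** q for i, j in links)


# Setting from the Figure 1 caption: T = 6 and cohorts 2, ..., 5.
index = treatment_effect_index(T=6, cohorts=range(2, 6))
links = within_cohort_links(index)

# Effects that are constant within each cohort incur no within-cohort penalty.
tau_constant = [float(r) for (r, t) in index]
penalty_constant = fusion_penalty(tau_constant, links)

# A single unit jump inside cohort 2 (between t = 4 and t = 5) is penalized once.
tau_jump = [1.0 if (r == 2 and t >= 5) else 0.0 for (r, t) in index]
penalty_jump = fusion_penalty(tau_jump, links, lam=2.0)
```

Under these assumptions there are 14 treatment effect terms and 10 within-cohort links. A penalty of this kind is zero exactly when every linked pair is equal, which is how shrinking it toward zero encodes equality restrictions.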

Theorems & Definitions (57)

  • Remark 3.1
  • Remark 5.1
  • Remark 5.2
  • Theorem 6.1: Consistency of FETWFE
  • proof
  • Theorem 6.2: Selection consistency
  • proof
  • Theorem 6.3: Oracle Property of FETWFE
  • proof
  • Theorem 6.4: Asymptotic Confidence Intervals for FETWFE
  • ...and 47 more