Estimation and Inference on Average Treatment Effect in Percentage Points under Heterogeneity

Ying Zeng

TL;DR

The paper shows that interpreting treatment coefficients from semi-log regressions as the ATE in percentage points is biased when treatment effects are heterogeneous, a consequence of Jensen's inequality. It proposes a subgroup-weighted estimand $\rho_b$ that is a valid lower bound for the true ATE in percentage points and equals it when there is no within-group heterogeneity. Its estimators, built from subgroup weights $w_g$ and log-point effects $\tau_g$, are asymptotically normal under standard conditions. Where the overall ATE in percentage points, $\bar{\rho}$, cannot be point identified, the paper derives sharp partial identification bounds using Fréchet–Hoeffding inequalities and uses covariate conditioning to tighten them. Monte Carlo simulations and two empirical applications (the effect of exporting on firm productivity and the effect of infrastructure on child mortality) show that the proposed measures can differ substantially from conventional ones when systematic heterogeneity is large. The results give researchers practical tools for interpreting treatment effects on log-transformed outcomes, including difference-in-differences designs with staggered treatment adoption.
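The partial-identification idea can be illustrated with a minimal sketch. It assumes that the ATE in percentage points is $\bar{\rho} = E[\exp(\tau_i)] - 1$ with $\tau_i = \log Y_i(1) - \log Y_i(0)$, and that only the two marginal distributions of the log potential outcomes are known, each represented by an equal-size sample. Because $\exp(a - b)$ is submodular in $(a, b)$, the comonotone and countermonotone couplings (the upper and lower Fréchet–Hoeffding copula bounds) give the smallest and largest attainable values. The function name `percent_ate_bounds` and the simulated inputs are hypothetical, and this is a standard coupling argument, not necessarily the paper's exact procedure, which also conditions on covariates.

```python
import numpy as np


def percent_ate_bounds(log_y1, log_y0):
    """Bounds on E[exp(log Y(1) - log Y(0))] - 1 when only the marginal
    distributions of log Y(1) and log Y(0) are known.

    Each input is an equal-size sample treated as an empirical distribution
    with equal point masses; the joint distribution is left unrestricted.
    """
    a = np.sort(np.asarray(log_y1, dtype=float))
    b = np.sort(np.asarray(log_y0, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch requires equal-size marginal samples")
    # exp(a - b) has a negative cross-partial (it is submodular), so pairing
    # order statistics (comonotone coupling, upper Frechet-Hoeffding copula)
    # minimizes the mean, while reverse pairing (countermonotone coupling,
    # lower Frechet-Hoeffding copula) maximizes it.
    lower = float(np.mean(np.exp(a - b))) - 1.0
    upper = float(np.mean(np.exp(a - b[::-1]))) - 1.0
    return lower, upper


# Illustrative use on simulated marginals (not data from the paper).
rng = np.random.default_rng(0)
log_y0 = rng.normal(0.0, 0.5, size=1000)
log_y1 = rng.normal(0.1, 0.6, size=1000)
lo, hi = percent_ate_bounds(log_y1, log_y0)
print(f"ATE in percentage points lies in [{100 * lo:.1f}, {100 * hi:.1f}]")
```

With equal-mass empirical marginals every coupling is a mixture of permutations, so the two sorted pairings are the extreme values and the bounds are sharp for those marginals.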

Abstract

In semi-logarithmic regressions, treatment coefficients are often interpreted as approximations of the average treatment effect (ATE) in percentage points. This paper highlights the overlooked bias of this approximation under treatment effect heterogeneity, arising from Jensen's inequality. The issue is particularly relevant for difference-in-differences designs with log-transformed outcomes and staggered treatment adoption, where treatment effects often vary across groups and periods. This paper proposes new estimation and inference methods for an estimand that accounts for heterogeneity across observable subgroups and improves upon conventional measures. The estimand provides a lower bound on the ATE in percentage points, and coincides with it in the absence of within-group heterogeneity. I establish the methods' large-sample properties and study their finite-sample performance through Monte Carlo experiments, which reveal substantial discrepancies between conventional and proposed measures when systematic heterogeneity is large. Two empirical applications further underscore the practical importance of these methods.
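Estimation of a subgroup-weighted estimand of this kind can be sketched as follows. The sketch assumes $\rho_b = \sum_g w_g(\exp(\tau_g) - 1)$, fixed (non-estimated) weights $w_g$ that sum to one, and an available covariance matrix for the estimated group-level log-point effects $\hat{\tau}_g$, for example from a staggered difference-in-differences estimator. The delta method stands in here for whatever inference procedure the paper actually establishes; the function name `rho_b_delta_method` and the numeric inputs are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def rho_b_delta_method(tau_hat, vcov, weights, level=0.95):
    """Plug-in estimate of rho_b = sum_g w_g (exp(tau_g) - 1) with a
    delta-method confidence interval, treating the weights as known."""
    tau_hat = np.asarray(tau_hat, dtype=float)
    w = np.asarray(weights, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    if not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must sum to one")
    # np.expm1(x) computes exp(x) - 1 accurately for small x.
    rho_hat = float(np.sum(w * np.expm1(tau_hat)))
    # Gradient of rho_b with respect to (tau_1, ..., tau_G).
    grad = w * np.exp(tau_hat)
    se = float(np.sqrt(grad @ vcov @ grad))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rho_hat, se, (rho_hat - z * se, rho_hat + z * se)


# Illustrative inputs only: two subgroups with different log-point effects.
tau_hat = [0.05, 0.40]
vcov = np.diag([0.01 ** 2, 0.02 ** 2])
weights = [0.5, 0.5]
rho_hat, se, ci = rho_b_delta_method(tau_hat, vcov, weights)
conventional = float(np.expm1(np.dot(weights, tau_hat)))  # exp(tau_bar) - 1
print(f"rho_b = {rho_hat:.4f} (SE {se:.4f}), exp(tau_bar) - 1 = {conventional:.4f}")
```

Comparing `rho_hat` with `conventional` shows the gap the abstract describes: with heterogeneous $\hat{\tau}_g$, the subgroup-weighted measure exceeds the single transformed coefficient.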

Paper Structure (20 sections, 7 theorems, 64 equations, 4 tables)

Key Result

Proposition 1

The ATE in percentage points satisfies

$$\bar{\rho} \;\geq\; \rho_b \;\geq\; \exp(\bar{\tau}) - 1 \;\geq\; \bar{\tau},$$

where the first inequality holds with equality if and only if $\operatorname{Var}(\tau_{i} \mid D_{i}^{(g)}=1)=0$ for all $g$; the second holds with equality if and only if $\tau_{g}=\bar{\tau}$ for all $g$; and the third holds with equality if and only if $\bar{\tau}=0$.
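Because these comparisons follow from Jensen's inequality applied to the convex map $x \mapsto e^x - 1$, they can be checked numerically. The sketch below computes the four quantities compared in Proposition 1 ($\bar{\rho}$, $\rho_b$, $e^{\bar{\tau}} - 1$ and $\bar{\tau}$) under assumed definitions inferred from the statement: $\bar{\rho} = E[e^{\tau_i}] - 1$, group-share weights $w_g$, $\tau_g = E[\tau_i \mid D_i^{(g)}=1]$, $\rho_b = \sum_g w_g(e^{\tau_g} - 1)$ and $\bar{\tau} = \sum_g w_g \tau_g$. The function name and the simulated effects are hypothetical.

```python
import numpy as np


def proposition1_quantities(tau_i, groups):
    """Return (rho_bar, rho_b, exp(tau_bar) - 1, tau_bar) from unit-level
    log-point effects tau_i and subgroup labels, using group shares as w_g."""
    tau_i = np.asarray(tau_i, dtype=float)
    groups = np.asarray(groups)
    labels, counts = np.unique(groups, return_counts=True)
    w = counts / counts.sum()                              # weights w_g
    tau_g = np.array([tau_i[groups == g].mean() for g in labels])
    rho_bar = float(np.mean(np.expm1(tau_i)))              # E[exp(tau_i)] - 1
    rho_b = float(np.sum(w * np.expm1(tau_g)))             # sum_g w_g (exp(tau_g) - 1)
    tau_bar = float(np.sum(w * tau_g))                     # sum_g w_g tau_g
    return rho_bar, rho_b, float(np.expm1(tau_bar)), tau_bar


# Simulated effects: three subgroups with different mean log-point effects
# plus within-group noise (illustrative values, not from the paper).
rng = np.random.default_rng(1)
groups = np.repeat([0, 1, 2], 1000)
tau_i = np.array([0.0, 0.3, 0.8])[groups] + rng.normal(0.0, 0.2, size=groups.size)
for name, value in zip(["rho_bar", "rho_b", "exp(tau_bar)-1", "tau_bar"],
                       proposition1_quantities(tau_i, groups)):
    print(f"{name:>15}: {100 * value:6.2f}")
```

Setting the within-group noise to zero makes the first two quantities coincide, and giving every subgroup the same mean effect makes the second and third coincide, mirroring the equality conditions.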

Theorems & Definitions (11)

  • Remark 1
  • Proposition 1
  • Theorem 1
  • Remark 2
  • Lemma O.1
  • Remark 3
  • Proof
  • Theorem O.1
  • Corollary 1
  • Lemma O.2
  • ...and 1 more