Estimation and Inference on Average Treatment Effect in Percentage Points under Heterogeneity

Ying Zeng

TL;DR

The paper shows that reading semi-log regression coefficients as the ATE in percentage points is biased under treatment effect heterogeneity, because of Jensen's inequality. It proposes a subgroup-weighted estimand $\rho_b$ that is a valid lower bound on the true ATE in percentage points and equals it when there is no within-group heterogeneity. Its estimators are built from subgroup weights $w_g$ and log-point effects $\tau_g$ and are asymptotically normal under standard conditions. When the overall ATE $\bar{\rho}$ cannot be point-identified, the paper also derives sharp partial identification bounds using Fréchet–Hoeffding inequalities, with covariate conditioning to tighten them. Monte Carlo simulations show substantial discrepancies between the conventional and proposed measures when systematic heterogeneity is large, and two empirical applications (the effect of exporting on firm productivity, and the effect of water and sewerage infrastructure on child mortality) illustrate the practical relevance. The results give researchers tools for interpreting treatment effects in settings with log-transformed outcomes and staggered treatment adoption.

Abstract

In semi-logarithmic regressions, treatment coefficients are often interpreted as approximations of the average treatment effect (ATE) in percentage points. This paper highlights the overlooked bias of this approximation under treatment effect heterogeneity, arising from Jensen's inequality. The issue is particularly relevant for difference-in-differences designs with log-transformed outcomes and staggered treatment adoption, where treatment effects often vary across groups and periods. This paper proposes new estimation and inference methods for an estimand that accounts for heterogeneity across observable subgroups and improves upon conventional measures. The estimand provides a lower bound on the ATE in percentage points, and coincides with it in the absence of within-group heterogeneity. I establish the methods' large-sample properties and study their finite-sample performance through Monte Carlo experiments, which reveal substantial discrepancies between conventional and proposed measures when systematic heterogeneity is large. Two empirical applications further underscore the practical importance of these methods.
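To make the Jensen's inequality bias concrete, consider a purely hypothetical case (the numbers are illustrative assumptions, not taken from the paper): two equally sized treated groups with log-point effects $\tau_1 = 0.1$ and $\tau_2 = 0.9$, and no heterogeneity within either group. The average log-point effect is $\bar{\tau} = 0.5$, and the two candidate percentage-point readings are

$$
\exp(\bar{\tau}) - 1 = e^{0.5} - 1 \approx 0.649,
\qquad
\tfrac{1}{2}\left(e^{0.1} + e^{0.9}\right) - 1 \approx \tfrac{1}{2}(1.105 + 2.460) - 1 \approx 0.782 .
$$

Under these assumptions the conventional reading understates the average percentage effect by roughly 13 percentage points (about 65% versus 78%), even though both figures are computed from the same group-level effects.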

Paper Structure (20 sections, 7 theorems, 64 equations, 4 tables)

Introduction
ATE in Percentage Points: Framework and Identification
Log-Point and Percentage-Point Treatment Effects under Heterogeneity
Estimation and Inference on $\rho_{b}$
Partial Identification of $\bar{\rho}$
Monte Carlo Experiment
Empirical Applications
Exporting and Firm Performance
Impact of Water and Sewerage Infrastructures on Child Mortality
Conclusion
Estimators Satisfying Assumption \ref{assu:delta}
Example 1: Semi-log Regression Model
Example 2: Staggered Difference-in-differences Design
Proof of Lemma \ref{lem:wtauestimation}
Estimation and Inference Under Normally Distributed Heterogeneity
...and 5 more sections

Key Result

Proposition 1

The ATE in percentage points satisfies $\bar{\rho} \ge \rho_{b} \ge \exp(\bar{\tau}) - 1 \ge \bar{\tau}$, where the first inequality holds with equality if and only if $\operatorname{Var}(\tau_{i}|D_{i}^{(g)}=1)=0$ for all $g$; the second holds with equality if and only if $\tau_{g}=\bar{\tau}$ for all $g$; and the third holds with equality if and only if $\bar{\tau}=0$.
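The ordering in Proposition 1 can be checked numerically. The following is a minimal sketch, not the paper's estimator: it assumes $\bar{\rho}$ is the mean of $\exp(\tau_i) - 1$ over treated units, $\rho_b = \sum_g w_g \exp(\tau_g) - 1$ with $w_g$ the treated share of group $g$, and the conventional reading is $\exp(\bar{\tau}) - 1$. The function name, the variable `rho_naive`, and the simulated effects are all hypothetical. Unit-level effects $\tau_i$ are not observable in practice and are simulated here only for illustration.

```python
import numpy as np


def percentage_point_measures(tau, group):
    """Return (rho_bar, rho_b, rho_naive, tau_bar) for treated units.

    tau   : simulated unit-level log-point effects (hypothetical input)
    group : subgroup label of each treated unit
    """
    tau = np.asarray(tau, dtype=float)
    group = np.asarray(group)

    tau_bar = float(tau.mean())                # average effect in log points
    rho_bar = float(np.exp(tau).mean() - 1.0)  # assumed definition of the true ATE

    labels, counts = np.unique(group, return_counts=True)
    w = counts / counts.sum()                  # subgroup weights w_g
    tau_g = np.array([tau[group == g].mean() for g in labels])

    rho_b = float(np.sum(w * np.exp(tau_g)) - 1.0)  # subgroup-weighted measure
    rho_naive = float(np.exp(tau_bar) - 1.0)        # conventional reading
    return rho_bar, rho_b, rho_naive, tau_bar


# Simulated example: three groups with different mean effects plus
# within-group noise (all values are illustrative assumptions).
rng = np.random.default_rng(0)
group = rng.integers(0, 3, size=5000)
group_means = np.array([0.1, 0.5, 0.9])
tau = group_means[group] + rng.normal(0.0, 0.2, size=5000)

rho_bar, rho_b, rho_naive, tau_bar = percentage_point_measures(tau, group)
print(f"rho_bar={rho_bar:.4f}  rho_b={rho_b:.4f}  "
      f"rho_naive={rho_naive:.4f}  tau_bar={tau_bar:.4f}")
```

With both between-group and within-group heterogeneity in the simulated effects, all three inequalities are strict. Setting every $\tau_i$ to the same constant collapses the first two to equalities, which matches the equality conditions stated above.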

Theorems & Definitions (11)

Remark 1
Proposition 1
Theorem 1
Remark 2
Lemma O.1
Remark 3
proof
Theorem O.1
Corollary 1
Lemma O.2
...and 1 more
