Direct Bias-Correction Term Estimation for Average Treatment Effect Estimation
Masahiro Kato
TL;DR
This work tackles debiased ATE estimation by directly estimating the bias-correction term $h_0(D,X)=\frac{1[D=1]}{e_0(X)}-\frac{1[D=0]}{1-e_0(X)}$ within a unified Bregman-divergence minimization framework. Given a differentiable convex function $g$, the author defines $h^*$ as the minimizer of the population divergence $\mathrm{BR}_g(h)$ (equivalently $\mathrm{BR}^\dagger_g(h_0\mid h)$) and estimates it by regularized empirical risk minimization. The framework subsumes Riesz regression and tailored KL losses, automatically achieves covariate balancing under certain combinations of model and divergence, and extends to RKHS and neural-network function classes with accompanying error bounds. The paper illustrates the approach with AIPW-based estimators, establishes asymptotic normality under cross-fitting, and evaluates the method in simulations and on the semi-synthetic IHDP benchmark. The result is a theoretically grounded route to debiased ATE estimation that targets the bias-correction term directly instead of first estimating the propensity score and then inverting it.
Abstract
This study considers direct estimation of the bias-correction term for average treatment effect (ATE) estimation. Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the observations, where $X_i$ denotes $K$-dimensional covariates, $D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i$ denotes an outcome. In ATE estimation, $h_0(D_i, X_i) = \frac{1[D_i = 1]}{e_0(X_i)} - \frac{1[D_i = 0]}{1 - e_0(X_i)}$ is called the bias-correction term, where $e_0(X_i)$ is the propensity score. Depending on the literature, the bias-correction term is also referred to as the Riesz representer or the clever covariates, and it plays an important role in the construction of efficient ATE estimators. In this study, we propose estimating $h_0$ by directly minimizing the Bregman divergence between a model of $h_0$ and $h_0$ itself; this divergence includes the squared error and the Kullback--Leibler divergence as special cases. Our proposed method is inspired by direct density ratio estimation methods and generalizes existing bias-correction term estimation methods, such as covariate balancing weights, Riesz regression, and nearest neighbor matching. Importantly, under specific choices of the bias-correction term model and the Bregman divergence, the covariate balancing property holds automatically. Thus, our study provides a practical modeling and estimation approach through a generalization of existing methods.
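
As an illustration of the squared-error special case only, the following is a minimal sketch and not the paper's implementation. It relies on the standard identity $E[h_0(D,X)h(D,X)] = E[h(1,X) - h(0,X)]$, which makes $E[(h - h_0)^2]$ equal to $E[h(D,X)^2] - 2E[h(1,X) - h(0,X)]$ up to a constant, so the objective can be evaluated without knowing $e_0$. Everything specific here is an assumption made for the example: the linear basis in `features`, the ridge penalty, the simulated data with a true ATE of 2, the deliberately crude arm-mean outcome model, and the omission of cross-fitting.

```python
import numpy as np


def features(d, x):
    """Hypothetical basis phi(D, X): arm-specific intercept and linear terms."""
    d = np.asarray(d, dtype=float)[:, None]
    x1 = np.column_stack([np.ones(len(x)), x])   # (n, K + 1)
    return np.hstack([d * x1, (1.0 - d) * x1])   # (n, 2 * (K + 1))


def fit_bias_correction(d, x, ridge=1e-6):
    """Fit h(D, X) = phi(D, X)^T beta under the squared-error divergence.

    Minimizes mean(h(D,X)^2) - 2 * mean(h(1,X) - h(0,X)) + ridge * ||beta||^2,
    whose first-order condition is (G + ridge * I) beta = mean(phi(1,X) - phi(0,X)).
    """
    n = len(d)
    phi = features(d, x)
    # Counterfactual feature difference phi(1, X) - phi(0, X) for every unit.
    diff = features(np.ones(n), x) - features(np.zeros(n), x)
    gram = phi.T @ phi / n
    return np.linalg.solve(gram + ridge * np.eye(gram.shape[1]), diff.mean(axis=0))


def aipw(d, y, mu1, mu0, h):
    """AIPW-style estimate: outcome-model contrast plus h-weighted residuals."""
    mu_d = np.where(d == 1, mu1, mu0)
    return np.mean(mu1 - mu0 + h * (y - mu_d))


def demo(n=20000, seed=0):
    """Simulated example (assumed design): confounded treatment, true ATE = 2."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, 2))
    e0 = 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.5 * x[:, 1])))
    d = (rng.random(n) < e0).astype(int)
    y = 1.0 + x[:, 0] + 2.0 * x[:, 1] + 2.0 * d + rng.standard_normal(n)

    beta = fit_bias_correction(d, x)
    h = features(d, x) @ beta  # estimated bias-correction term at observed (D, X)

    # Crude outcome model: arm means only, so the naive contrast is confounded.
    mu1 = np.full(n, y[d == 1].mean())
    mu0 = np.full(n, y[d == 0].mean())
    return {
        "naive": mu1[0] - mu0[0],
        "aipw": aipw(d, y, mu1, mu0, h),
        # Covariate means of the treated arm after weighting by h.
        "balanced_treated_mean": (d * h) @ x / n,
        "sample_mean": x.mean(axis=0),
    }


results = demo()
print(results)
```

With this particular basis, the first-order condition forces the $h$-weighted covariate means of the treated arm to match the full-sample means up to the ridge term, which is one concrete instance of the covariate balancing property mentioned above. Other convex choices of $g$, such as a KL-type loss, would change the objective and generally require an iterative solver in place of the linear solve.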
