Direct Bias-Correction Term Estimation for Average Treatment Effect Estimation

Masahiro Kato

TL;DR

This work tackles debiased ATE estimation by directly estimating the bias-correction term $h_0(D,X)=\frac{\mathbb{1}[D=1]}{e_0(X)}-\frac{\mathbb{1}[D=0]}{1-e_0(X)}$ through a unified Bregman-divergence minimization framework. For a chosen differentiable convex function $g$, the author shows that minimizing the population divergence $\mathrm{BR}_g(h)$ (equivalently $\mathrm{BR}^\dagger_g(h_0\mid h)$) yields an estimator $h^*$, which is implemented via regularized empirical risk minimization. The framework subsumes Riesz regression and tailored KL losses, automatically achieves covariate balancing under certain model/divergence choices, and extends to RKHS and neural-network function classes with provable error bounds. The paper illustrates the approach with AIPW-based estimators, provides asymptotic normality results under cross-fitting, and reports strong performance in simulations and semi-synthetic IHDP benchmarks. Overall, it offers a practical, theoretically grounded route to debiased ATE estimation that targets the bias-correction term directly rather than relying on propensity-score estimation alone, and it applies to a broad range of flexible model classes.
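
To make the role of $g$ concrete, the following is a short derivation sketch rather than the paper's own statement. It assumes the standard definition of the Bregman divergence and that $\mathrm{BR}_g(h)$ denotes the part of $\mathrm{BR}^\dagger_g(h_0\mid h)$ that depends on $h$; the paper's exact notation and normalization may differ. The second line uses only $\mathbb{E}[\mathbb{1}[D=1]\mid X]=e_0(X)$ and iterated expectations, which removes the unknown $h_0$ from the objective. The last line takes $g(t)=t^2$ as an illustrative choice and gives the squared-loss (Riesz regression) objective.

```latex
\begin{align*}
\mathrm{BR}^\dagger_g(h_0 \mid h)
  &= \mathbb{E}\big[\, g(h_0(D,X)) - g(h(D,X)) - g'(h(D,X))\,\{h_0(D,X) - h(D,X)\} \,\big], \\
\mathbb{E}\big[\, g'(h(D,X))\, h_0(D,X) \,\big]
  &= \mathbb{E}\big[\, g'(h(1,X)) - g'(h(0,X)) \,\big], \\
\mathrm{BR}_g(h)
  &= \mathbb{E}\big[\, g'(h(D,X))\, h(D,X) - g(h(D,X)) - g'(h(1,X)) + g'(h(0,X)) \,\big], \\
g(t) = t^2 \;\Longrightarrow\; \mathrm{BR}_g(h)
  &= \mathbb{E}\big[\, h(D,X)^2 - 2\,\{h(1,X) - h(0,X)\} \,\big].
\end{align*}
```

Under these assumptions, $\mathrm{BR}^\dagger_g(h_0\mid h)=\mathrm{BR}_g(h)+\mathbb{E}[g(h_0(D,X))]$, so both criteria have the same minimizer over $h$.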

Abstract

This study considers the direct estimation of the bias-correction term for estimating the average treatment effect (ATE). Let $\{(X_i, D_i, Y_i)\}_{i=1}^{n}$ be the observations, where $X_i$ denotes $K$-dimensional covariates, $D_i \in \{0, 1\}$ denotes a binary treatment assignment indicator, and $Y_i$ denotes an outcome. In ATE estimation, $h_0(D_i, X_i) = \frac{\mathbb{1}[D_i = 1]}{e_0(X_i)} - \frac{\mathbb{1}[D_i = 0]}{1 - e_0(X_i)}$ is called the bias-correction term, where $e_0(X_i)$ is the propensity score. The bias-correction term is also referred to as the Riesz representer or clever covariates, depending on the literature, and plays an important role in the construction of efficient ATE estimators. In this study, we propose estimating $h_0$ by directly minimizing the Bregman divergence between its model and $h_0$, which includes squared error and Kullback--Leibler divergence as special cases. Our proposed method is inspired by direct density ratio estimation methods and generalizes existing bias-correction term estimation methods, such as covariate balancing weights, Riesz regression, and nearest neighbor matching. Importantly, under specific choices of bias-correction term models and Bregman divergence, we can automatically ensure the covariate balancing property. Thus, our study provides a practical modeling and estimation approach through a generalization of existing methods.
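
The squared-error special case mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a linear model $h(d,x)=\phi(d,x)^\top\beta$ with a hypothetical basis $\phi(d,x)=[d\,\psi(x),\,(1-d)\,\psi(x)]$, $\psi(x)=(1,x)$, simulated data, and no cross-fitting. The names `psi`, `features`, and `fit_riesz_linear` are introduced here for illustration only. With this model and loss, the first-order condition of the empirical objective is itself a covariate balancing condition, which the code checks numerically.

```python
import numpy as np


def psi(x):
    """Covariate basis psi(x) = (1, x); a hypothetical choice for this sketch."""
    return np.column_stack([np.ones(len(x)), x])


def features(d, x):
    """Basis phi(d, x) = [d * psi(x), (1 - d) * psi(x)] for h(d, x) = phi(d, x) @ beta."""
    d = np.asarray(d, dtype=float)[:, None]
    return np.hstack([d * psi(x), (1.0 - d) * psi(x)])


def fit_riesz_linear(d, x, lam=0.0):
    """Minimize (1/n) sum_i [h(D_i,X_i)^2 - 2 (h(1,X_i) - h(0,X_i))] + lam * ||beta||^2."""
    n = len(d)
    phi = features(d, x)
    # Empirical mean of phi(1, X_i) - phi(0, X_i): the linear part of the objective.
    target = (features(np.ones(n), x) - features(np.zeros(n), x)).mean(axis=0)
    gram = phi.T @ phi / n + lam * np.eye(phi.shape[1])
    # The objective is quadratic in beta, so its minimizer solves a linear system.
    return np.linalg.solve(gram, target)


# Simulated data (assumption): logistic propensity score, linear outcome, true ATE = 1.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
e0 = 1.0 / (1.0 + np.exp(-0.5 * x[:, 0]))  # true propensity score, used only to draw D
d = rng.binomial(1, e0)
y = 1.0 * d + x @ np.array([1.0, -1.0]) + rng.normal(size=n)

beta = fit_riesz_linear(d, x)      # lam = 0: no regularization
h_hat = features(d, x) @ beta      # estimated bias-correction term at each (D_i, X_i)

# Covariate balance: the h-weighted treated mean of psi(X) equals the full-sample mean.
weighted_treated = (d[:, None] * h_hat[:, None] * psi(x)).mean(axis=0)
balance_gap = np.abs(weighted_treated - psi(x).mean(axis=0)).max()

# Simple weighting estimator of the ATE built from h_hat alone.
ate_hat = np.mean(h_hat * y)
print(balance_gap, ate_hat)
```

The propensity score is never estimated here; `e0` only generates the treatment. In an AIPW-type estimator, `h_hat` would multiply outcome-regression residuals instead of `y`, and setting `lam > 0` trades exact balance for stability.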

Paper Structure

This paper contains 54 sections, 18 theorems, 118 equations, 1 figure, 4 tables.

Key Result

Theorem 4.1

Suppose that $g$ is $\mu$-strongly convex and that there exists a constant $C > 0$ such that $|g''(t)| \le C$ for all $t \in \mathbb{R}$. Assume also that $\zeta^{-1}(0)$ is finite. Suppose that Assumptions asm:boundedness and asm:covering hold. Set the regularization parameter $\lambda = \lambda_n$ so

Figures (1)

  • Figure 1: Relationship among bias-correction term estimation via Bregman divergence minimization, density ratio estimation, and covariate balancing. This figure is based on the results in Kato2025directdebiased and Kato2025unifiedtheory.

Theorems & Definitions (24)

  • Theorem 4.1: $L_2$-norm estimation error bound
  • Definition 4.1: Feedforward neural networks (FNNs), from Zheng2022anerror
  • Theorem 4.2: Estimation error bound for neural networks
  • Theorem 5.1: Asymptotic normality
  • Proposition D.1: From Theorem 2.1 in Bartlett2005localrademacher
  • Definition D.1
  • Proposition D.2: Talagrand's Lemma
  • Lemma E.1: $L_2$ distance bound from Lemma 4 in Kato2021nonnegativebregman
  • Proposition E.2
  • Proposition E.3
  • ...and 14 more