A Unified Theory for Causal Inference: Direct Debiased Machine Learning via Bregman-Riesz Regression
Masahiro Kato
TL;DR
The paper presents a unified direct debiased machine learning (DDML) framework that integrates Riesz regression, covariate balancing, density-ratio estimation, TMLE, and matching for efficient ATE estimation. By centering on the Riesz representer $\alpha_0(D,X)$ and a targeted Neyman estimation objective, it shows how these disparate methods can be viewed as estimators of the same underlying nuisance parameter that differ in the loss they minimize. It develops a generalized Bregman-Riesz regression that encompasses squared-loss (Riesz regression/LSIF) and KL-divergence (entropy balancing) losses, with dualities that explain covariate balancing as the dual of regression-based approaches. The author proposes a practical implementation that combines entropy balancing for the Riesz representer with TMLE updates, giving a concrete workflow for robust ATE estimation. The framework clarifies how the methods relate to one another and offers guidelines for choosing among estimation strategies based on covariate balance, density-ratio estimation, and targeted bias correction.
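To make the KL-divergence side concrete, the following is a minimal sketch of entropy balancing solved through its dual, in the classical exponential-tilting form. It is an illustration under stated assumptions, not the paper's implementation: the function names (`entropy_balance`, `_weights`, `_dual`), the simulated data, the choice to balance first moments only, and the choice to target the full-sample covariate mean in each treatment arm are all hypothetical. Up to normalization, the signed weights play the role of an estimate of $\alpha_0(D,X)$.

```python
import numpy as np

def _weights(z, lam):
    """Normalized exponential-tilting weights, w_i proportional to exp(lam' z_i)."""
    s = z @ lam
    w = np.exp(s - s.max())  # subtract the max for numerical stability
    return w / w.sum()

def _dual(z, lam):
    """Dual objective: log-sum-exp of lam' z_i (convex in lam)."""
    s = z @ lam
    return s.max() + np.log(np.exp(s - s.max()).sum())

def entropy_balance(x_group, target, n_iter=100, tol=1e-10):
    """Weights on x_group whose weighted covariate mean matches `target`."""
    z = x_group - target                 # covariates centered at the target
    lam = np.zeros(z.shape[1])
    for _ in range(n_iter):
        w = _weights(z, lam)
        grad = z.T @ w                   # remaining imbalance = dual gradient
        if np.linalg.norm(grad) < tol:
            break
        hess = (z * w[:, None]).T @ z - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-10 * np.eye(len(lam)), grad)
        t, f0 = 1.0, _dual(z, lam)
        # Backtracking (Armijo) line search on the damped Newton step.
        while _dual(z, lam - t * step) > f0 - 1e-4 * t * (grad @ step) and t > 1e-8:
            t *= 0.5
        lam = lam - t * step
    return _weights(z, lam)

# Simulated data (assumption): true ATE is 2.0, treatment depends on X.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
p = 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.25 * x[:, 1])))
d = rng.binomial(1, p)
y = 2.0 * d + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

target = x.mean(axis=0)                  # balance both arms to the full sample
w1 = entropy_balance(x[d == 1], target)
w0 = entropy_balance(x[d == 0], target)
ate_hat = w1 @ y[d == 1] - w0 @ y[d == 0]  # plain weighting estimator
print("weighted treated mean of X:", w1 @ x[d == 1])
print("full-sample mean of X:     ", target)
print("ATE estimate:", ate_hat)
```

The primal problem picks the weights closest to uniform in KL divergence subject to the balance constraints; the code instead minimizes the unconstrained dual, whose gradient is exactly the remaining covariate imbalance. The final estimator here is a simple weighted difference in means and omits the TMLE update mentioned above.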
Abstract
This note introduces a unified theory for causal inference that integrates Riesz regression, covariate balancing, density-ratio estimation (DRE), targeted maximum likelihood estimation (TMLE), and the matching estimator for average treatment effect (ATE) estimation. In ATE estimation, the balancing weights and the outcome regression functions play important roles; depending on the context, the balancing weights are also called the Riesz representer, the bias-correction term, or the clever covariates. Riesz regression, covariate balancing, DRE, and the matching estimator are all methods for estimating the balancing weights: Riesz regression is essentially equivalent to DRE in the ATE context, the matching estimator is a special case of DRE, and DRE is in a dual relationship with covariate balancing. TMLE is a method for constructing regression function estimators such that the leading bias term becomes zero. Nearest neighbor matching is equivalent to least-squares density-ratio estimation and Riesz regression.
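As an illustration of how the balancing weights and the outcome regression fit together, here is a minimal sketch of squared-loss Riesz regression with a linear basis, followed by a bias-corrected ATE estimate. It assumes the standard squared-loss objective $\mathbb{E}[\alpha(D,X)^2 - 2(\alpha(1,X) - \alpha(0,X))]$ with $\alpha(d,x) = \phi(d,x)^\top \beta$; the basis `basis`, the function names `fit_riesz` and `debiased_ate`, and the simulated data are hypothetical choices for this example, not taken from the note.

```python
import numpy as np

def basis(d, x):
    """Hypothetical basis phi(d, x): arm-specific intercepts and slopes."""
    d = np.asarray(d, dtype=float).reshape(-1, 1)
    return np.hstack([d, d * x, 1.0 - d, (1.0 - d) * x])

def fit_riesz(d, x):
    """Closed-form minimizer of mean(alpha^2 - 2*(alpha(1,X) - alpha(0,X)))."""
    n = len(d)
    phi = basis(d, x)
    m_bar = (basis(np.ones(n), x) - basis(np.zeros(n), x)).mean(axis=0)
    # First-order condition: (phi' phi / n) beta = mean(phi(1,X) - phi(0,X)).
    return np.linalg.solve(phi.T @ phi / n, m_bar)

def debiased_ate(y, d, x, beta):
    """Plug-in ATE plus the weighted-residual bias correction."""
    n = len(d)
    phi = basis(d, x)
    coef, *_ = np.linalg.lstsq(phi, y, rcond=None)  # outcome regression g(d, x)
    g1 = basis(np.ones(n), x) @ coef
    g0 = basis(np.zeros(n), x) @ coef
    alpha = phi @ beta                               # estimated balancing weights
    return np.mean(g1 - g0 + alpha * (y - phi @ coef))

# Simulated data (assumption): true ATE is 2.0, treatment depends on X.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=(n, 2))
p = 1.0 / (1.0 + np.exp(-(0.5 * x[:, 0] - 0.25 * x[:, 1])))
d = rng.binomial(1, p)
y = 2.0 * d + x[:, 0] + 0.5 * x[:, 1] + rng.normal(size=n)

beta = fit_riesz(d, x)
alpha = basis(d, x) @ beta
ate_hat = debiased_ate(y, d, x, beta)
print("ATE estimate:", ate_hat)
```

The first-order condition of the squared loss is itself a set of balance equations: the sample average of $\hat\alpha(D,X)\,\phi(D,X)$ equals the sample average of $\phi(1,X) - \phi(0,X)$, which is one way to see the link between regression-based estimation of the weights and covariate balancing. In this particular sketch the outcome model is fit by least squares on the same basis, so the in-sample correction term is zero by the normal equations; with a different outcome model it would generally be nonzero.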
