Riesz Regression As Direct Density Ratio Estimation
Masahiro Kato
TL;DR
This paper establishes that Riesz regression, used in debiased machine learning for causal parameter estimation (notably the average treatment effect $\tau^{\text{ATE}}_0$), is equivalent in key settings to Least-Squares Importance Fitting (LSIF), a direct density-ratio estimation method. By expressing the Riesz representer as a density ratio and matching the Riesz regression loss to the LSIF objective, the work makes it possible to import results from the density-ratio literature, including generalization of the loss via Bregman-divergence minimization, nonparametric convergence rates, and regularization strategies for flexible models such as neural networks and RKHS-based estimators. It also clarifies that the Riesz representer can be written as $\alpha^{\text{ATE}}_0(D,X)=D r_0(1,X)-(1-D) r_0(0,X)$, where $r_0(d,x)$ is a density ratio linking the conditional and marginal densities of the covariates, which connects causal-inference tools such as Neyman orthogonality to density-ratio methods such as KLIEP and PU learning. The contributions consolidate prior results into a unified theory bridging debiased machine learning and direct density-ratio estimation, enabling sharper theoretical guarantees and broader methodological cross-pollination, including ties to covariate balancing and nearest-neighbor matching.
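The representer formula above can be sanity-checked by Monte Carlo. The sketch below assumes the normalization $r_0(d,x) = 1/P(D=d \mid X=x)$, which by Bayes' rule equals $p(x)/(P(D=d)\,p(x \mid d))$, so that $\alpha^{\text{ATE}}_0(D,X)$ reduces to the familiar inverse-propensity form $D/e(X) - (1-D)/(1-e(X))$. The data-generating process and the test function `g` are hypothetical choices for illustration. The check targets the defining Riesz property $E[\alpha_0(W) g(W)] = E[g(1,X) - g(0,X)]$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process: X ~ N(0, 1), propensity bounded in [0.25, 0.75].
x = rng.normal(size=n)
e = 0.25 + 0.5 / (1.0 + np.exp(-x))            # e(x) = P(D = 1 | X = x)
d = (rng.uniform(size=n) < e).astype(float)

# Assumed normalization: r_0(d, x) = 1 / P(D = d | X = x)
#                                  = p(x) / (P(D = d) p(x | d)).
r_treated = 1.0 / e                             # r_0(1, X)
r_control = 1.0 / (1.0 - e)                     # r_0(0, X)
alpha = d * r_treated - (1.0 - d) * r_control   # alpha_0(D, X) = D r_0(1, X) - (1 - D) r_0(0, X)


def g(dd, xx):
    # Arbitrary (hypothetical) square-integrable test function of (D, X).
    return 2.0 * dd + dd * xx ** 2 + np.sin(xx)


# Riesz representation: E[alpha_0(W) g(W)] should match E[g(1, X) - g(0, X)].
lhs = np.mean(alpha * g(d, x))
rhs = np.mean(g(1.0, x) - g(0.0, x))            # population value: 2 + E[X^2] = 3
```

Both `lhs` and `rhs` should land near 3; the propensity is kept away from 0 and 1 so the inverse weights stay bounded and the Monte Carlo error stays small.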
Abstract
Riesz regression has garnered attention as a tool in debiased machine learning for causal and structural parameter estimation (Chernozhukov et al., 2021). This study shows that Riesz regression is closely related to direct density-ratio estimation (DRE) in important cases, including average treatment effect (ATE) estimation. Specifically, the idea and objective of Riesz regression coincide with those of least-squares importance fitting (LSIF; Kanamori et al., 2009), a direct density-ratio estimation method. While Riesz regression is general in the sense that it can be applied to Riesz representer estimation in a wide class of problems, the equivalence with DRE allows us to directly import existing results in specific cases, including convergence-rate analyses, the selection of loss functions via Bregman-divergence minimization, and regularization techniques for flexible models such as neural networks. Conversely, insights about the Riesz representer in debiased machine learning broaden the applications of direct density-ratio estimation methods. This paper consolidates our prior results in Kato (2025a) and Kato (2025b).
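The claim that the Riesz regression objective coincides with LSIF for the ATE can be illustrated with a small linear-in-parameters sketch. Riesz regression (Chernozhukov et al., 2021) minimizes the empirical loss $E_n[\alpha(W)^2 - 2 m(W;\alpha)]$, where for the ATE $m(W;\alpha) = \alpha(1,X) - \alpha(0,X)$; ridge-regularized LSIF minimizes $\tfrac12 \theta^\top H \theta - h^\top \theta + \tfrac{\lambda}{2}\|\theta\|^2$. The feature map `basis`, the Riesz basis $\phi(d,x) = [d\,\psi(x), (1-d)\,\psi(x)]$, the ridge level, and the helpers `riesz_regression_ate` and `lsif` are all hypothetical choices for this sketch, not the paper's implementation. Under this basis the Gram matrix is block diagonal, so the Riesz problem splits into two LSIF problems whose numerator sample is all of $X$ and whose denominator sample is the $D=d$ subgroup scaled by its share $n_d/n$.

```python
import numpy as np


def basis(x):
    # Hypothetical feature map psi(x) = [1, x, x^2]; any finite basis would do.
    return np.column_stack([np.ones_like(x), x, x ** 2])


def riesz_regression_ate(d, x, lam=1e-3):
    """Linear Riesz regression for the ATE functional m(W; a) = a(1, X) - a(0, X).

    Minimizes E_n[a(W)^2 - 2 m(W; a)] + lam * ||beta||^2 over a(w) = beta' phi(w),
    with the assumed basis phi(d, x) = [d * psi(x), (1 - d) * psi(x)].
    """
    psi = basis(x)
    k = psi.shape[1]
    zeros = np.zeros_like(psi)
    phi = np.hstack([d[:, None] * psi, (1.0 - d)[:, None] * psi])  # phi(D, X)
    phi_1 = np.hstack([psi, zeros])                                 # phi(1, X)
    phi_0 = np.hstack([zeros, psi])                                 # phi(0, X)
    gram = phi.T @ phi / len(x)
    b = (phi_1 - phi_0).mean(axis=0)
    # First-order condition: (gram + lam I) beta = b.
    beta = np.linalg.solve(gram + lam * np.eye(2 * k), b)
    return beta[:k], beta[k:]  # coefficients on D * psi and on (1 - D) * psi


def lsif(psi_num, psi_den, den_scale=1.0, lam=1e-3):
    """Closed-form ridge LSIF: minimize 0.5 t'Ht - h't + 0.5 lam ||t||^2.

    H = den_scale * mean(psi_den psi_den'), h = mean(psi_num); the fitted
    r(x) = t' psi(x) approximates p_num(x) / (den_scale * p_den(x)).
    """
    H = den_scale * (psi_den.T @ psi_den) / len(psi_den)
    h = psi_num.mean(axis=0)
    return np.linalg.solve(H + lam * np.eye(H.shape[0]), h)


# Hypothetical simulated data with a bounded propensity score.
rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
e = 0.25 + 0.5 / (1.0 + np.exp(-x))
d = (rng.uniform(size=n) < e).astype(float)

beta_1, beta_0 = riesz_regression_ate(d, x)

# LSIF for r(d, x) = p(x) / (P(D = d) p(x | d)): numerator sample = all X,
# denominator sample = units with D = d, scaled by the share n_d / n.
psi = basis(x)
treated = d == 1.0
theta_1 = lsif(psi, psi[treated], den_scale=treated.mean())
theta_0 = lsif(psi, psi[~treated], den_scale=(~treated).mean())
```

The two fits agree up to floating-point error because they solve the same normal equations; the sign flip on the control coefficients (`beta_0` equals `-theta_0`) mirrors the minus sign in $\alpha^{\text{ATE}}_0(D,X) = D r_0(1,X) - (1-D) r_0(0,X)$.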
