A Density Ratio Super Learner
Wencheng Wu, David Benkeser
TL;DR
The paper addresses the estimation of density ratios, a quantity central to covariate shift and to certain causal-inference estimands. It introduces a density ratio super learner that combines kernel-based and classification-based learners within a cross-validated risk framework. The ensemble is guided by a novel qualified loss, defined as $L(O,\psi)=-\mathbb{I}(\lambda=1)\log\psi(x_1,x_2)+\mathbb{I}(\lambda=0)\log\psi(x_1,x_2)$, which ensures that $E_0L(O,\psi)$ is minimized at the true ratio $\psi_0$. The method is evaluated in two Monte Carlo simulations, one on mediation analysis and one on longitudinal modified treatment policies (LMTP). These show that the density ratio super learner can asymptotically approach oracle performance and behaves robustly in finite samples, particularly when sample sizes are small. Beyond causal inference, the approach offers a practical tool for covariate shift and other density ratio estimation problems, combining ensemble learning with a principled loss framework.
Abstract
Estimating the ratio of two probability density functions is of great interest in many areas of statistics, including causal inference. In this study, we use super learning to develop an ensemble estimator of density ratios built on a novel loss function. We show that this loss function is qualified for building super learners. We conduct two simulations, corresponding to mediation analysis and longitudinal modified treatment policies in causal inference, where density ratios are nuisance parameters, to evaluate the empirical performance of our density ratio super learner.
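To make the idea of a classification-based density ratio learner concrete, the following is a minimal sketch of one such candidate learner, not the authors' implementation and not the super learner itself. It assumes the standard Bayes-rule identity $p_1(x)/p_0(x) = \frac{P(\lambda=1\mid x)}{P(\lambda=0\mid x)}\cdot\frac{P(\lambda=0)}{P(\lambda=1)}$, where $\lambda$ labels which sample an observation came from. The toy data (two unit-variance Gaussians), the helper names `fit_logistic` and `density_ratio`, and the sample sizes are illustrative assumptions; for this toy setup the true ratio is $\exp(x - 1/2)$, so a logistic model is correctly specified.

```python
import numpy as np

def fit_logistic(x, y, n_iter=50):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) with Newton's method.

    Hypothetical helper for illustration; any probabilistic classifier
    could play this role as a candidate learner.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                 # score of the log-likelihood
        hess = X.T @ (X * w[:, None])        # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def density_ratio(x_new, beta, n1, n0):
    """Classifier odds times n0/n1 estimates p1(x) / p0(x)."""
    logit = beta[0] + beta[1] * x_new
    return np.exp(logit) * (n0 / n1)

# Toy data (assumed for illustration): numerator N(1, 1), denominator N(0, 1).
rng = np.random.default_rng(0)
n1, n0 = 2000, 2000
x1 = rng.normal(1.0, 1.0, n1)    # sample with lambda = 1
x0 = rng.normal(0.0, 1.0, n0)    # sample with lambda = 0
x = np.concatenate([x1, x0])
lam = np.concatenate([np.ones(n1), np.zeros(n0)])

beta = fit_logistic(x, lam)
grid = np.linspace(-1.0, 2.0, 7)
est = density_ratio(grid, beta, n1, n0)
truth = np.exp(grid - 0.5)       # closed-form ratio for this toy setup

for g, e, t in zip(grid, est, truth):
    print(f"x={g:5.2f}  estimated={e:6.3f}  true={t:6.3f}")
```

In a super learner, several such candidates (for example, classifiers of different flexibility alongside kernel-based estimators) would each be fit on training folds, and their cross-validated risk under a qualified loss would determine how they are weighted in the ensemble.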
