
A Density Ratio Super Learner

Wencheng Wu, David Benkeser

TL;DR

The paper tackles the challenge of estimating density ratios, a quantity central to covariate shift and to certain causal-inference estimands. It introduces a density ratio super learner that combines kernel-based and classification-based learners within a cross-validated risk framework. The learner is guided by a novel qualified loss, $L(O,\psi)=-\mathbb{I}(\lambda=1)\log\psi(x_1,x_2)+\mathbb{I}(\lambda=0)\log\psi(x_1,x_2)$, whose expectation $E_0L(O,\psi)$ is minimized at the true ratio $\psi_0$. The method is evaluated in two Monte Carlo simulations, one for mediation analysis and one for a longitudinal modified treatment policy (LMTP). These show that the density ratio super learner can asymptotically approach oracle performance and remains robust in small samples. Beyond causal inference, the approach offers a practical tool for covariate shift and other density ratio estimation problems, pairing ensemble learning with a principled loss.
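
Classification-based learners typically rest on a standard Bayes'-rule identity: if a classifier estimates $P(\lambda=1\mid x)$ from the pooled samples, then $p_1(x)/p_0(x) = \frac{P(\lambda=1\mid x)}{P(\lambda=0\mid x)}\cdot\frac{P(\lambda=0)}{P(\lambda=1)}$. The sketch below illustrates only that identity and is not the authors' implementation. It assumes an unconditional, one-dimensional ratio and a logistic-regression classifier fitted by Newton-Raphson in NumPy; the function names `fit_logistic` and `classifier_density_ratio` are hypothetical.

```python
import numpy as np


def fit_logistic(x, y, n_iter=25):
    """Logistic regression of y on [1, x] by Newton-Raphson (IRLS).

    Hypothetical helper; assumes 1-D x and labels y in {0, 1}.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted P(lambda = 1 | x)
        w = p * (1.0 - p)                         # IRLS weights
        hessian = X.T @ (X * w[:, None])
        gradient = X.T @ (y - p)
        beta += np.linalg.solve(hessian, gradient)
    return beta


def classifier_density_ratio(x1, x0):
    """Estimate psi(x) = p1(x) / p0(x) by classifying lambda=1 (x1) vs lambda=0 (x0).

    Bayes' rule: p1/p0 = [P(lam=1|x) / P(lam=0|x)] * [P(lam=0) / P(lam=1)],
    so the classifier's log-odds are shifted by the sample-size (prior) log-odds.
    """
    x = np.concatenate([x1, x0])
    lam = np.concatenate([np.ones(len(x1)), np.zeros(len(x0))])
    beta = fit_logistic(x, lam)
    prior_log_odds = np.log(len(x1) / len(x0))

    def psi(x_new):
        log_odds = beta[0] + beta[1] * np.asarray(x_new, dtype=float)
        return np.exp(log_odds - prior_log_odds)

    return psi, beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x1 = rng.normal(1.0, 1.0, 20000)   # draws from p1 = N(1, 1)
    x0 = rng.normal(0.0, 1.0, 20000)   # draws from p0 = N(0, 1)
    psi, beta = classifier_density_ratio(x1, x0)
    print(beta, psi(np.array([0.0, 0.5, 1.0])))
```

With $p_1=N(1,1)$ and $p_0=N(0,1)$ the true log-ratio is $x-0.5$, so with equal sample sizes the fitted coefficients should land near $(-0.5, 1)$. For a conditional ratio $\psi(x_1,x_2)$, the classifier's odds given $(x_1,x_2)$ would instead be divided by the odds of $\lambda$ given $x_2$ alone; the prior correction in the sketch is the special case with no $x_2$.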

Abstract

The estimation of the ratio of two probability density functions is of great interest in many fields of statistics, including causal inference. In this study, we develop an ensemble estimator of density ratios, based on super learning, that uses a novel loss function. We show that this loss function is qualified for building super learners. We conduct two simulations, corresponding to mediation analysis and longitudinal modified treatment policies in causal inference, where density ratios are nuisance parameters, to demonstrate the empirical performance of our density ratio super learner.
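
Super learning chooses among candidate learners by their cross-validated risk. The following minimal sketch implements only the simplest, "discrete" version of that idea with a hypothetical `cv_select` function: each candidate is fitted on V-1 folds, its loss is averaged on the held-out fold, and the candidate with the lowest average risk wins. The loss is left as a parameter. In the density ratio setting, `y` would hold the group indicator $\lambda$, each candidate would return an estimate of $\psi$, and `loss` would evaluate $L(O,\psi)$ per observation; that mapping is an assumption about how the pieces fit, not the authors' code.

```python
import numpy as np


def cv_select(x, y, learners, loss, n_folds=5, seed=0):
    """Discrete super learner (cross-validation selector), a hypothetical sketch.

    learners: dict name -> fit(x_train, y_train), which returns predict(x_new).
    loss:     function (y_obs, predictions) -> array of per-observation losses.
    Returns the name of the candidate with the smallest V-fold CV risk,
    together with the CV risk of every candidate.
    """
    rng = np.random.default_rng(seed)
    # Random, (nearly) equal-sized fold labels 0..n_folds-1.
    folds = rng.permutation(len(y)) % n_folds
    risks = {}
    for name, fit in learners.items():
        total = 0.0
        for v in range(n_folds):
            train, valid = folds != v, folds == v
            predict = fit(x[train], y[train])          # fit on training folds
            total += loss(y[valid], predict(x[valid])).sum()  # hold-out loss
        risks[name] = total / len(y)                   # average over all observations
    best = min(risks, key=risks.get)
    return best, risks
```

Picking a single winner is the discrete super learner; the full super learner usually goes one step further and searches over weighted (convex) combinations of the candidates' cross-validated predictions.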

Paper Structure

This paper contains 13 sections, 1 theorem, 21 equations, 3 figures, and 2 tables.

Key Result

Theorem 2.1

Suppose the distributions of $X_1$ given $X_2$ and $\lambda$ have the same support for different values of $\lambda$, and that $p_0(\lambda=1)>0$ and $p_0(\lambda=0)>0$. Let $L(O,\psi)=-\mathbb{I}(\lambda=1)\log\psi(x_1,x_2)+\mathbb{I}(\lambda=0)\log\psi(x_1,x_2)$. Then $E_0L(O,\psi)$ is minimized only when $\psi=\psi_0$, the true density ratio.

Figures (3)

  • Figure 1: Average Hold-Out Risks For Individual Learners and the Super Learner
  • Figure 2: True Ratio vs Super Learner Estimated Ratio at Different Values of $W\;(n=700)$
  • Figure 3: True Ratio vs Super Learner Estimated Ratio at Different Values of $W\;(n=2000)$

Theorems & Definitions (2)

  • Theorem 2.1
  • Proof