Prediction-Powered Inference with Inverse Probability Weighting

Jyotishka Datta, Nicholas G. Polson

Abstract

Prediction-powered inference (PPI) is a recent framework for valid statistical inference with partially labeled data, combining model-based predictions on a large unlabeled set with bias correction from a smaller labeled subset. Building on existing PPI results under covariate shift, we show that PPI rectification admits a direct design-based interpretation, and that informative labeling can be handled naturally by Horvitz--Thompson and Hájek-style corrections. This connection unites design-based survey sampling ideas with modern prediction-assisted inference, yielding estimators that remain valid when labeling probabilities vary across units. We consider the common setting where the inclusion probabilities are not known but estimated from a correctly specified model. In simulations, the performance of IPW-adjusted PPI with estimated propensities closely matches the known-probability case, retaining both nominal coverage and the variance-reduction benefits of PPI.
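To make the weighting idea in the abstract concrete, here is a minimal sketch of a PPI-style mean estimate whose rectifier is inverse-probability weighted. It is an illustration under stated assumptions, not the authors' implementation. The function name `ipw_ppi_mean`, its arguments, and the toy data-generating process are hypothetical. The sketch assumes predictions are available for every unit, labels are observed only where `labeled` is true, and `incl_prob` holds each unit's labeling probability (known, or estimated beforehand).

```python
import numpy as np


def ipw_ppi_mean(preds, labels, labeled, incl_prob, hajek=False):
    """Mean of predictions plus an inverse-probability-weighted rectifier.

    Hypothetical helper for illustration only.
    preds     : predictions for all N units
    labels    : outcomes for all N units (values at unlabeled units are ignored)
    labeled   : boolean mask, True where the outcome was observed
    incl_prob : labeling (inclusion) probability of each unit
    hajek     : normalize by the sum of weights instead of N
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    labeled = np.asarray(labeled, dtype=bool)
    incl_prob = np.asarray(incl_prob, dtype=float)

    resid = labels[labeled] - preds[labeled]   # prediction errors on labeled units
    weights = 1.0 / incl_prob[labeled]         # inverse-probability weights

    if hajek:
        rectifier = np.sum(weights * resid) / np.sum(weights)  # Hajek-style ratio
    else:
        rectifier = np.sum(weights * resid) / preds.size       # Horvitz-Thompson-style
    return preds.mean() + rectifier


if __name__ == "__main__":
    # Toy simulation (assumed setup): labeling probability increases with x.
    rng = np.random.default_rng(0)
    n = 5000
    x = rng.normal(size=n)
    y = 2.0 + x + rng.normal(scale=0.5, size=n)
    preds = 2.0 + 0.8 * x                      # deliberately imperfect predictor
    incl_prob = 1.0 / (1.0 + np.exp(-(x - 2.0)))
    labeled = rng.random(n) < incl_prob
    labels = np.where(labeled, y, np.nan)

    print("population mean:", y.mean())
    print("HT-weighted    :", ipw_ppi_mean(preds, labels, labeled, incl_prob))
    print("Hajek-weighted :", ipw_ppi_mean(preds, labels, labeled, incl_prob, hajek=True))
```

The two variants differ only in the normalizer of the rectifier: the Horvitz-Thompson form divides by the population size, while the Hájek form divides by the sum of the weights, which is typically more stable when a few inclusion probabilities are small.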

Paper Structure

This paper contains 12 sections, 1 theorem, 25 equations, 3 figures, 2 tables.

Key Result

Lemma 2

Let the evidence be written as . Suppose $\Lambda(s)$ has a continuous first derivative and a bounded second derivative $\Lambda''(s)$ on the unit interval. Define $U_{(0)} \equiv 0$, $U_{(n+1)} \equiv 1$, and let $\{U_{(i)}\}_{i=1}^n$ denote the order statistics from $n$ independent $\mathcal{U}(0,1)$ draws (so $U_{(i)} \ge U_{(i Then, for some constant $M > 0$, ${\mathbb E}\left[(\theta - \hat{\

Figures (3)

  • Figure 1: NHANES example: point estimates and 95% confidence intervals for the Classic mean, Horvitz--Thompson (HT), Hájek, and PPI rectifiers with and without weighting, under informative labeling depending on age. The dashed line marks the population mean BMI for the NHANES dataset after omitting the missing values across columns.
  • Figure 2: Horizontal 95% confidence intervals for the first 10 replicates in the simulation with informative labeling. The dashed line indicates the true population mean.
  • Figure 3: Horizontal 95% confidence intervals for the first 10 replicates in the simulation with informative labeling, shown separately for labeled proportions $p_{\text{lab}}=0.01$, $0.02$, and $0.05$. The dashed line indicates the true population mean.

Theorems & Definitions (2)

  • Remark 1
  • Lemma 2