
Prediction-Powered Conditional Inference

Yang Sui, Jin Zhou, Hua Zhou, Xiaowu Dai

TL;DR

This work introduces a reproducing kernel-based localization method that learns a data-adaptive weight function from covariates and reformulates the target conditional moment at the test point as a weighted unconditional moment, yielding a prediction-powered estimator and confidence interval that reduce variance when the predictor is informative while preserving validity regardless of predictor accuracy.

Abstract

We study prediction-powered conditional inference in the setting where labeled data are scarce, unlabeled covariates are abundant, and a black-box machine-learning predictor is available. The goal is to perform statistical inference on conditional functionals evaluated at a fixed test point, such as conditional means, without imposing a parametric model for the conditional relationship. Our approach combines localization with prediction-based variance reduction. First, we introduce a reproducing kernel-based localization method that learns a data-adaptive weight function from covariates and reformulates the target conditional moment at the test point as a weighted unconditional moment. Second, we incorporate machine-learning predictions through a correction-based decomposition of this localized moment, yielding a prediction-powered estimator and confidence interval that reduce variance when the predictor is informative while preserving validity regardless of predictor accuracy. We establish nonasymptotic error bounds and minimax-optimal convergence rates for the resulting estimator, prove pointwise asymptotic normality with consistent variance estimation, and provide an explicit variance decomposition that characterizes how machine-learning predictions and unlabeled covariates improve statistical efficiency. Numerical experiments on simulated and real datasets demonstrate valid conditional coverage and substantially sharper confidence intervals than alternative methods.
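The two steps described above can be illustrated with a minimal sketch for the conditional mean. It is not the paper's procedure: a fixed Gaussian kernel stands in for the learned reproducing-kernel weight function, and the standard error is a naive plug-in that ignores weight normalization and smoothing bias. The names `gaussian_weights`, `localized_pp_mean`, and `bandwidth` are hypothetical. The sketch shows only the decomposition of the localized moment into a plug-in term computed from predictions on unlabeled covariates and a correction term computed from labeled residuals.

```python
import numpy as np

def gaussian_weights(X, x0, bandwidth):
    """Gaussian kernel weights centered at x0, normalized to average 1.

    Assumption: an illustrative stand-in for a learned localization weight.
    X has shape (n, d); x0 has shape (d,).
    """
    sq_dist = np.sum((X - x0) ** 2, axis=1)
    w = np.exp(-0.5 * sq_dist / bandwidth ** 2)
    return w / w.mean()

def localized_pp_mean(X_lab, y_lab, X_unlab, f, x0, bandwidth=0.3, z=1.96):
    """Localized prediction-powered estimate of E[Y | X = x0] (sketch).

    f is a black-box predictor mapping an (n, d) array to n predictions.
    Returns the point estimate and a naive normal-approximation interval.
    """
    w_lab = gaussian_weights(X_lab, x0, bandwidth)
    w_unl = gaussian_weights(X_unlab, x0, bandwidth)

    # Plug-in term: weighted predictions on the abundant unlabeled covariates.
    plug_in = w_unl * f(X_unlab)
    # Correction term: weighted residuals on the scarce labeled data.
    correction = w_lab * (y_lab - f(X_lab))

    estimate = plug_in.mean() + correction.mean()
    # Naive variance: the two samples are independent, so variances add.
    variance = plug_in.var(ddof=1) / len(plug_in) \
        + correction.var(ddof=1) / len(correction)
    half_width = z * np.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)

# Usage on synthetic data (all values below are made up for illustration).
rng = np.random.default_rng(0)
X_lab = rng.uniform(0.0, 1.0, size=(50, 1))
X_unlab = rng.uniform(0.0, 1.0, size=(2000, 1))
y_lab = 2.0 * X_lab[:, 0] + rng.normal(0.0, 0.1, size=50)
predictor = lambda X: 2.0 * X[:, 0]  # an informative predictor
x0 = np.array([0.5])
estimate, interval = localized_pp_mean(X_lab, y_lab, X_unlab, predictor, x0)
```

When the predictor is accurate, the labeled residuals are small, so the correction term contributes little variance and the interval is driven mostly by the large unlabeled sample. When the predictor is poor, the correction term removes the prediction bias at the cost of a wider interval.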


Paper Structure

This paper contains 44 sections, 23 theorems, 476 equations, 10 figures, 1 algorithm.

Key Result

proposition 1

Let $x_0$ be an interior point of $\mathcal{X}$. As $\lambda\to0$, $D(x_0;\lambda)\asymp D(\lambda)\asymp \lambda^{-d/(2m)}$.

Figures (10)

  • Figure 1: Protocol for prediction-powered conditional inference. The procedure takes labeled data, unlabeled covariates, an ML predictor, and a test point $x_0$ as inputs. A localization step uses the covariate distribution to learn weights that capture the local structure around $x_0$. The upper block estimates a bias correction from labeled data, while the lower block computes a plug-in term using predictions from the unlabeled data. These components are combined to produce a valid confidence interval $\mathcal{C}(x_0)$ for the conditional target $\theta_0(x_0)$.
  • Figure 2: Empirical $\sigma^2_{Y-f}$, $\sigma^2_Y$, and $\sigma^2_{f}$ for the conditional mean at different (age, sex) test points in the census income data. Results are based on $1000$ replications.
  • Figure 3: Empirical coverage rate and average width of nominal $95\%$ confidence intervals for the conditional mean at the test point ($\text{age}=70, \text{sex}=1$) in the census income data. The red horizontal line indicates the nominal $95\%$ coverage level. Results are based on $1000$ replications.
  • Figure 4: Diabetes progression data illustration. Left: split conformal predictive intervals for the outcome $Y$ at test covariates $X$. Right: a bootstrap uncertainty band for the conditional mean $\theta_0(x)=\mathbb{E}[Y \mid X=x]$.
  • Figure 5: Empirical RMSE, coverage rate, and average width of nominal 95% confidence intervals for the conditional mean evaluated at different test points in the simulated data. Results are based on 1000 replications per test point.
  • ...and 5 more figures

Theorems & Definitions (44)

  • proposition 1
  • theorem 1
  • theorem 2
  • theorem 3
  • corollary 1
  • proposition 2
  • theorem 4
  • theorem 5
  • corollary 2
  • lemma 1: Theorem 3.8 of adams2003sobolev
  • ...and 34 more