Prediction-Powered Inference

Anastasios N. Angelopoulos; Stephen Bates; Clara Fannjiang; Michael I. Jordan; Tijana Zrnic

TL;DR

Prediction-powered inference provides statistically valid confidence intervals by combining abundant ML predictions with scarce gold-standard data, without assumptions about the prediction model. The authors develop a convex-estimation framework using a rectifier to correct prediction bias and construct prediction-powered confidence sets with guaranteed coverage, along with practical algorithms for mean, quantile, and regression tasks. They also extend the approach to nonconvex losses and distribution shifts, and demonstrate substantial gains in data efficiency, with tight intervals, across proteomics, astronomy, genomics, remote sensing, census analysis, and ecology. Overall, the framework enables principled, data-efficient inference that leverages modern predictive tools while preserving rigorous statistical validity across diverse domains.
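To make the rectifier concrete, here is a minimal Python sketch of the mean case, assuming the standard CLT-based interval. The estimate is the average prediction on the N unlabeled points minus the average prediction error f(X) − Y on the n labeled points, and the two independent samples contribute separate variance terms. The function name `ppi_mean_ci` and its arguments are illustrative assumptions, not the authors' released API.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y, yhat, yhat_unlabeled, alpha=0.1):
    """Prediction-powered confidence interval for E[Y] (illustrative CLT sketch).

    y              -- gold-standard labels on the n labeled points
    yhat           -- model predictions f(X) on the same n labeled points
    yhat_unlabeled -- model predictions f(X~) on the N unlabeled points
    """
    n, N = len(y), len(yhat_unlabeled)
    # Rectifier samples: prediction error f(X) - Y on the labeled data.
    rect = [p - t for p, t in zip(yhat, y)]
    # Point estimate: mean prediction on unlabeled data, minus the estimated bias.
    theta = fmean(yhat_unlabeled) - fmean(rect)
    # Independent samples, so their variance contributions add.
    se = sqrt(variance(yhat_unlabeled) / N + variance(rect) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta, (theta - z * se, theta + z * se)


# Predictions that are uniformly 0.5 too high: the rectifier removes the bias.
theta, (lo, hi) = ppi_mean_ci([1.0, 2.0, 3.0, 4.0],
                              [1.5, 2.5, 3.5, 4.5],
                              [2.0, 3.0, 4.0, 5.0])
```

In this toy call the rectifier has zero variance, so the interval width comes only from the unlabeled predictions; more accurate predictions shrink the rectifier's variance term and hence the interval.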

Abstract

Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients, without making any assumptions on the machine-learning algorithm that supplies the predictions. Furthermore, more accurate predictions translate to smaller confidence intervals. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning. The benefits of prediction-powered inference are demonstrated with datasets from proteomics, astronomy, genomics, remote sensing, census analysis, and ecology.
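The same recipe extends beyond means. The following is a hedged Python sketch for a quantile, assuming the pinball-loss gradient $1\{y \le \theta\} - q$, a CLT interval, and a finite grid of candidate values (the grid choice and the name `ppi_quantile_ci` are assumptions of this sketch). For each candidate θ it forms a rectified estimate of P(Y ≤ θ) and keeps θ when that estimate is statistically compatible with q.

```python
from math import sqrt
from statistics import NormalDist, fmean, pvariance


def ppi_quantile_ci(y, yhat, yhat_unlabeled, q=0.5, alpha=0.1, grid=None):
    """Prediction-powered interval for the q-quantile of Y (illustrative sketch)."""
    n, N = len(y), len(yhat_unlabeled)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if grid is None:
        # Hypothetical default grid: every observed label or prediction.
        grid = sorted(set(y) | set(yhat) | set(yhat_unlabeled))
    kept = []
    for theta in grid:
        # Indicator terms from the pinball-loss gradient evaluated on predictions.
        ind_unl = [float(p <= theta) for p in yhat_unlabeled]
        # Rectifier samples: 1{Y <= theta} - 1{f(X) <= theta} on labeled data.
        rect = [float(t <= theta) - float(p <= theta) for t, p in zip(y, yhat)]
        cdf_pp = fmean(ind_unl) + fmean(rect)  # rectified estimate of P(Y <= theta)
        se = sqrt(pvariance(ind_unl) / N + pvariance(rect) / n)
        if abs(cdf_pp - q) <= z * se:
            kept.append(theta)
    # Report the hull of the kept candidates (conservative if the set has gaps).
    return (min(kept), max(kept)) if kept else None


lo, hi = ppi_quantile_ci(
    y=[float(i) for i in range(1, 21)],
    yhat=[float(i) for i in range(1, 21)],
    yhat_unlabeled=[float(i) for i in range(1, 101)],
)
```

A grid search is used here because the confidence set is defined pointwise in θ; a finer grid trades computation for resolution.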

Paper Structure (71 sections, 26 theorems, 124 equations, 8 figures, 2 tables, 13 algorithms)

Introduction
General principle
Further preliminaries
Warmup: Mean estimation
Related work
Main theory: Convex estimation
Defining the rectifier.
Rectifier confidence set.
Prediction-powered confidence set.
Algorithms
Mean estimation.
Quantile estimation.
Logistic regression.
Linear regression.
Applications
...and 56 more sections

Key Result

Theorem 2.1

Suppose that the convex estimation problem is nondegenerate as in eq:gradient-zero. Fix $\alpha\in(0,1)$ and $\delta\in(0,\alpha)$. Suppose that, for any $\theta\in\mathbb{R}^p$, we can construct $\mathcal{R}_\delta(\theta)$ and $\mathcal{T}_{\alpha-\delta}(\theta)$ satisfying Let $\mathcal{C}^{\mathrm{PP}}_\alpha = \left\{\theta : 0 \in \mathcal{R}_\delta(\theta) + \mathcal{T}_{\alpha-\delta}(\t
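The statement above is cut off in the source, so the following is only a sketch of why a set of this form covers $\theta^*$. It assumes definitions not recoverable from the truncated text: $\nabla L_\theta$ and $\nabla L^f_\theta$ are the expected loss gradients under the true labels and under the predictions, the rectifier is $\Delta_\theta = \nabla L_\theta - \nabla L^f_\theta$, nondegeneracy means $\nabla L_{\theta^*} = 0$, and the constructed sets satisfy $\mathbb{P}(\Delta_\theta \in \mathcal{R}_\delta(\theta)) \ge 1-\delta$ and $\mathbb{P}(\nabla L^f_\theta \in \mathcal{T}_{\alpha-\delta}(\theta)) \ge 1-(\alpha-\delta)$, with $\mathcal{C}^{\mathrm{PP}}_\alpha = \{\theta : 0 \in \mathcal{R}_\delta(\theta) + \mathcal{T}_{\alpha-\delta}(\theta)\}$.

```latex
\begin{aligned}
0 &= \nabla L_{\theta^*} = \nabla L^f_{\theta^*} + \Delta_{\theta^*}, \\
\mathbb{P}\big(\theta^* \notin \mathcal{C}^{\mathrm{PP}}_\alpha\big)
  &\le \mathbb{P}\big(\Delta_{\theta^*} \notin \mathcal{R}_\delta(\theta^*)\big)
     + \mathbb{P}\big(\nabla L^f_{\theta^*} \notin \mathcal{T}_{\alpha-\delta}(\theta^*)\big) \\
  &\le \delta + (\alpha - \delta) = \alpha .
\end{aligned}
```

If both events hold at $\theta^*$, then $0 = \nabla L^f_{\theta^*} + \Delta_{\theta^*} \in \mathcal{R}_\delta(\theta^*) + \mathcal{T}_{\alpha-\delta}(\theta^*)$, so $\theta^* \in \mathcal{C}^{\mathrm{PP}}_\alpha$; a union bound over the two failure events gives coverage of at least $1-\alpha$.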

Figures (8)

Figure 1: Comparison of prediction-powered, classical, and imputation approaches. Each row (A-G) is a different application. Panel (1) plots five randomly chosen intervals for the prediction-powered and classical approaches, and the imputed interval (which is deterministic). Panel (2) plots the average interval width and the width in the five randomly chosen trials, for varying $n$.
Figure 2: Comparison to the post-prediction inference procedure. On the left are five independent random draws of intervals with $n=1000$. On the right is a line plot of interval width as a function of $n$, averaged over $100$ independent trials. Five draws of interval widths are shown as a scatter plot at their respective $n$. The post-prediction inference approach is shown in red, the classical approach is in gray, and the prediction-powered approach is in green. The post-prediction inference approach has diminishing coverage in the experiment.
Figure 3: Comparison to the semi-supervised mean estimation procedure. The plot is the same as in Figure 2, but with semi-supervised inference shown in red. The semi-supervised intervals have a similar width to the classical ones in this experiment, while the prediction-powered intervals dominate.
Figure 4: Deforestation analysis with a linear model. This is the same figure as Figure 1D, with the same color coding: the prediction-powered approach is green, the classical approach is gray, and the imputation approach is gold. However, the gradient-boosted tree is replaced with an ordinary linear regression. The resulting drop in predictive accuracy causes the classical intervals to outperform the prediction-powered intervals in terms of power.
Figure 5: AlphaFold analysis with a small unlabeled dataset. This is the same figure as Figure 1A, with the same color coding: the prediction-powered approach is green, the classical approach is gray, and the imputation approach is gold. However, here $N$ is taken to be $1000$. When $n > N$, the classical baseline outperforms the prediction-powered one.
...and 3 more figures
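Figures 4 and 5 both turn on when the prediction-powered interval becomes wider than the classical one. As a rough heuristic that holds for the mean only (an assumption; the experiments in these figures need not estimate a mean), compare the CLT half-widths, writing $\sigma_Y^2 = \mathrm{Var}(Y)$, $\sigma_f^2 = \mathrm{Var}(f(X))$, and $\sigma_\Delta^2 = \mathrm{Var}(f(X) - Y)$:

```latex
\underbrace{z_{1-\alpha/2}\sqrt{\frac{\sigma_f^2}{N} + \frac{\sigma_\Delta^2}{n}}}_{\text{prediction-powered}}
\;<\;
\underbrace{z_{1-\alpha/2}\,\frac{\sigma_Y}{\sqrt{n}}}_{\text{classical}}
\quad\Longleftrightarrow\quad
\sigma_\Delta^2 + \frac{n}{N}\,\sigma_f^2 < \sigma_Y^2 .
```

A weaker model inflates $\sigma_\Delta^2$, and a small unlabeled set with $n > N$ inflates $(n/N)\,\sigma_f^2$. Even perfect predictions, where $\sigma_\Delta^2 = 0$ and $\sigma_f^2 = \sigma_Y^2$, give a narrower interval under this heuristic only when $n < N$.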

Theorems & Definitions (26)

Theorem 2.1: Convex estimation
Proposition 2.1: Mean estimation
Proposition 2.2: Quantile estimation
Proposition 2.3: Logistic regression
Proposition 2.4: Linear regression
Theorem 4.1: General risk minimization
Corollary 4.1: Covariate shift
Theorem 4.2: Label shift
Corollary A.1: Mean p-value
Corollary A.2: Quantile p-value
...and 16 more