Prediction-Powered Inference
Anastasios N. Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I. Jordan, Tijana Zrnic
TL;DR
Prediction-powered inference produces statistically valid confidence intervals by combining abundant machine-learning predictions with scarce gold-standard data, without any assumptions about the prediction model. The authors develop a framework for convex estimation problems in which a rectifier corrects the bias of the predictions, yielding prediction-powered confidence sets with guaranteed coverage, and they give practical algorithms for means, quantiles, and regression coefficients. They also extend the approach to nonconvex losses and distribution shift, and demonstrate improved data efficiency and tight confidence intervals on datasets from proteomics, astronomy, genomics, remote sensing, census analysis, and ecology.
Abstract
Prediction-powered inference is a framework for performing valid statistical inference when an experimental dataset is supplemented with predictions from a machine-learning system. The framework yields simple algorithms for computing provably valid confidence intervals for quantities such as means, quantiles, and linear and logistic regression coefficients, without making any assumptions on the machine-learning algorithm that supplies the predictions. Furthermore, more accurate predictions translate to smaller confidence intervals. Prediction-powered inference could enable researchers to draw valid and more data-efficient conclusions using machine learning. The benefits of prediction-powered inference are demonstrated with datasets from proteomics, astronomy, genomics, remote sensing, census analysis, and ecology.
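To make the idea concrete for the simplest target, a mean, the following is a minimal sketch and not the authors' reference implementation. It assumes a small labeled sample with both gold-standard outcomes and predictions, a large unlabeled sample with predictions only, and a normal-approximation (asymptotic) interval. The function name `ppi_mean_ci` and its parameter names are hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean, variance


def ppi_mean_ci(y_labeled, preds_labeled, preds_unlabeled, alpha=0.1):
    """Sketch of a prediction-powered confidence interval for a mean.

    Assumption: a normal approximation is used for the interval, so the
    coverage guarantee of this sketch is asymptotic.
    """
    n = len(y_labeled)
    N = len(preds_unlabeled)

    # Rectifier: average prediction error measured on the labeled data.
    errors = [f - y for f, y in zip(preds_labeled, y_labeled)]
    rectifier = fmean(errors)

    # Point estimate: mean prediction on unlabeled data, corrected for bias.
    estimate = fmean(preds_unlabeled) - rectifier

    # Half-width combines the uncertainty of both sample averages.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(variance(errors) / n + variance(preds_unlabeled) / N)

    return estimate - half_width, estimate + half_width


# Toy data (made up): predictions overshoot the truth by exactly 0.5.
lower, upper = ppi_mean_ci(
    y_labeled=[1.0, 2.0, 3.0, 4.0],
    preds_labeled=[1.5, 2.5, 3.5, 4.5],
    preds_unlabeled=[2.0, 3.0, 4.0],
)
```

In this sketch, accurate predictions make the error variance small, so the interval width is driven mainly by the large unlabeled sample. That is one way to see why better predictions lead to smaller intervals.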
