PPI++: Efficient Prediction-Powered Inference
Anastasios N. Angelopoulos, John C. Duchi, Tijana Zrnic
TL;DR
PPI++ addresses valid, powerful statistical inference when labeled data are scarce but a large pool of machine-learning predictions on unlabeled inputs is available. It replaces the grid-based confidence set construction of the original PPI, which becomes computationally prohibitive as the parameter dimension grows, with convex optimization of a prediction-adjusted loss, and it introduces a data-driven weighting parameter $\lambda$ that adapts to the predictive model’s quality. Theoretical guarantees establish asymptotic normality of the prediction-powered estimator, equivalence to testing-based confidence sets, and improved efficiency through power tuning, including a plug-in estimator $\hat{\lambda}$ and a one-step variant. Empirically, PPI++ matches or outperforms both classical inference and the original PPI across simulations and real-data tasks: confidence sets are tighter when predictions are informative and revert to the classical ones when predictions are weak. The result is a practical, scalable framework for using black-box predictors to strengthen scientific inference without sacrificing validity.
Abstract
We present PPI++: a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets -- for parameters of any dimensionality -- that always improve on classical intervals using only the labeled data. PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency. Real and synthetic experiments demonstrate the benefits of the proposed adaptations.
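To make the power-tuning idea concrete, here is a minimal sketch for the simplest case, estimating a mean. It assumes the estimator takes the form $\hat{\theta}_\lambda = \bar{Y} + \lambda(\bar{f}_{\text{unlabeled}} - \bar{f}_{\text{labeled}})$ and that the plug-in weight is $\hat{\lambda} = \widehat{\mathrm{Cov}}(Y, f) / ((1 + n/N)\,\widehat{\mathrm{Var}}(f))$, with $n$ labeled and $N$ unlabeled points. The function name `ppi_pp_mean_ci`, the synthetic data, and the zero-variance guard are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np
from statistics import NormalDist


def ppi_pp_mean_ci(y, f_lab, f_unlab, alpha=0.1):
    """Power-tuned prediction-powered estimate of E[Y] with a normal-approximation CI.

    y       : labels on the n labeled points
    f_lab   : model predictions on the same n labeled points
    f_unlab : model predictions on the N unlabeled points
    (Hypothetical helper for illustration only.)
    """
    y = np.asarray(y, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y), len(f_unlab)

    # Plug-in weight: covariance from labeled data, prediction variance from all data.
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    if var_f == 0.0:
        lam = 0.0  # assumption: constant predictions carry no information
    else:
        cov_yf = np.cov(y, f_lab, ddof=1)[0, 1]
        lam = float(cov_yf / ((1.0 + n / N) * var_f))

    # Point estimate: labeled mean plus a lambda-weighted prediction correction.
    theta = y.mean() + lam * (f_unlab.mean() - f_lab.mean())

    # Estimated variance: labeled term plus unlabeled term (samples are independent).
    var = np.var(y - lam * f_lab, ddof=1) / n + lam**2 * np.var(f_unlab, ddof=1) / N

    half = NormalDist().inv_cdf(1.0 - alpha / 2.0) * np.sqrt(var)
    return theta, (theta - half, theta + half), lam


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, N = 200, 20_000
    y = rng.normal(size=n)
    f_lab = y + 0.3 * rng.normal(size=n)                     # informative predictions
    f_unlab = rng.normal(size=N) + 0.3 * rng.normal(size=N)  # same distribution, no labels
    theta, (lo, hi), lam = ppi_pp_mean_ci(y, f_lab, f_unlab)
    print(f"lambda_hat={lam:.3f}  theta_hat={theta:.3f}  CI=({lo:.3f}, {hi:.3f})")
```

Under these assumptions, $\hat{\lambda}$ near 0 recovers the classical sample-mean interval, while $\hat{\lambda}$ near 1 leans heavily on the predictions; the weight is chosen from the data rather than fixed in advance.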
