Anytime-valid, Bayes-assisted, Prediction-Powered Inference
Valentin Kilian, Stefano Cortinovis, François Caron
TL;DR
This work extends Prediction-Powered Inference (PPI) to the sequential setting, where a small labelled dataset and a large unlabelled pool with machine learning predictions both grow over time, and inference must remain valid at every time step. It constructs asymptotic confidence sequences (AsympCS) using Ville's inequality and the method of mixtures, and enhances them with Bayes-assisted priors on the rectifier $\Delta_\theta$, yielding time-uniform confidence intervals that shrink when the predictions are accurate. The paper provides asymptotic results for both the standard PPI and PPI++ estimators, proves asymptotic Type-I error control, and shows how control variates and strong coupling underpin the theory. Experiments on synthetic and real datasets show consistent efficiency gains over classical inference; the Bayes-assisted variant offers additional gains when the predictions are good and remains robust when the prior is misspecified.
Abstract
Given a large pool of unlabelled data and a smaller amount of labels, prediction-powered inference (PPI) leverages machine learning predictions to increase the statistical efficiency of confidence interval procedures based solely on labelled data, while preserving fixed-time validity. In this paper, we extend the PPI framework to the sequential setting, where labelled and unlabelled datasets grow over time. Exploiting Ville's inequality and the method of mixtures, we propose prediction-powered confidence sequence procedures that are asymptotically valid uniformly over time and naturally accommodate prior knowledge on the quality of the predictions to further boost efficiency. We carefully illustrate the design choices behind our method and demonstrate its effectiveness in real and synthetic examples.
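To make the idea concrete, here is a minimal Python sketch for the simplest case, estimating a mean $\theta^* = \mathbb{E}[Y]$ with a predictor $f$. It is an illustration under stated assumptions, not the procedure proposed in the paper. It assumes that one labelled pair $(X_t, Y_t)$ and a fixed number `k` of unlabelled points $\tilde X_{t,1}, \dots, \tilde X_{t,k}$ arrive at each step, so that the per-step PPI increments $Z_t = \frac{1}{k}\sum_{j} f(\tilde X_{t,j}) - (f(X_t) - Y_t)$ are i.i.d. with mean $\theta^*$; the subtracted term is the sample version of the rectifier, which for the mean reduces to $\mathbb{E}[f(X) - Y]$. The running mean of $Z_t$ is then wrapped in a Gaussian-mixture boundary of the kind used in the asymptotic confidence sequence literature. The function names, the tuning parameter `rho`, and the simulated data are all hypothetical, and no prior on the rectifier is used.

```python
import numpy as np


def ppi_increments(y, f_lab, f_unlab):
    """Per-step PPI increments Z_t = mean_j f(X~_{t,j}) - (f(X_t) - Y_t).

    y, f_lab: shape (T,), labels and predictions on the labelled points.
    f_unlab:  shape (T, k), predictions on the k unlabelled points of each step.
    """
    y, f_lab, f_unlab = (np.asarray(a, dtype=float) for a in (y, f_lab, f_unlab))
    return f_unlab.mean(axis=1) - (f_lab - y)


def gaussian_mixture_asympcs(z, alpha=0.1, rho=1.0):
    """Asymptotic confidence sequence for the mean of i.i.d. z_1, z_2, ...

    Uses the Gaussian-mixture boundary
        mean_t +/- sd_t * sqrt(2 (t rho^2 + 1) / (t^2 rho^2)
                               * log(sqrt(t rho^2 + 1) / alpha)),
    where rho > 0 is a user-chosen tuning parameter (an assumption here).
    """
    z = np.asarray(z, dtype=float)
    t = np.arange(1, z.size + 1)
    mean = np.cumsum(z) / t
    # Running (population-form) variance; it is zero at t = 1, so the first
    # interval is degenerate. The guarantee is asymptotic only.
    var = np.maximum(np.cumsum(z**2) / t - mean**2, 0.0)
    scale = 2.0 * (t * rho**2 + 1.0) / (t**2 * rho**2)
    radius = np.sqrt(var * scale * np.log(np.sqrt(t * rho**2 + 1.0) / alpha))
    return mean - radius, mean + radius


# Simulated data (hypothetical): Y = X + noise, predictor f(x) = x, true mean 0.
rng = np.random.default_rng(0)
T, k = 2000, 20
x = rng.normal(size=T)
y = x + 0.3 * rng.normal(size=T)
x_unlab = rng.normal(size=(T, k))

z = ppi_increments(y, f_lab=x, f_unlab=x_unlab)
lo, hi = gaussian_mixture_asympcs(z)      # prediction-powered sequence
lo_c, hi_c = gaussian_mixture_asympcs(y)  # labelled data only

print("final PPI width:      ", hi[-1] - lo[-1])
print("final classical width:", hi_c[-1] - lo_c[-1])
```

In this toy setup the predictor explains most of the variance of $Y$, so the increments $Z_t$ have much smaller variance than $Y_t$ and the prediction-powered sequence is narrower at every sufficiently large $t$. With a poor predictor the ordering can reverse, which is the situation that power tuning (as in PPI++) and priors on the rectifier are meant to address.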
