A Note on the Prediction-Powered Bootstrap
Tijana Zrnic
TL;DR
This note introduces PPBoot, a bootstrap-based approach to prediction-powered inference that applies to arbitrary estimation problems and relies on a single bootstrap to form confidence intervals. For each bootstrap replicate $b$, PPBoot resamples the labeled data $(X^*, Y^*)$ and the unlabeled data $\tilde{X}^*$, computes $\theta_b^* = \hat{\theta}(\tilde{X}^*, f(\tilde{X}^*)) + \hat{\theta}(X^*, Y^*) - \hat{\theta}(X^*, f(X^*))$, where $f$ is the predictive model, and forms a percentile interval from the replicates. PPBoot thereby delivers valid intervals without problem-specific asymptotic variance calculations, and can be asymptotically normal when $\hat{\theta}$ is. The note extends PPBoot with power tuning (via $\lambda$) and cross-fitting (Cross-PPBoot) to boost power and to cover settings where no pre-trained model is available. Empirical results on Galaxy Zoo 2, AlphaFold, gene expression, and Census tasks show that PPBoot often matches or beats PPI/PPI++ in interval width while maintaining proper coverage, and generally produces tighter intervals than classical CLT-based methods. Overall, PPBoot offers a simple, versatile framework for prediction-powered inference that broadens the range of problems amenable to robust, data-efficient uncertainty quantification.
Abstract
We introduce PPBoot: a bootstrap-based method for prediction-powered inference. PPBoot is applicable to arbitrary estimation problems and is very simple to implement, essentially only requiring one application of the bootstrap. Through a series of examples, we demonstrate that PPBoot often performs nearly identically to (and sometimes better than) the earlier PPI(++) method based on asymptotic normality, when the latter is applicable, without requiring any asymptotic characterizations. Given its versatility, PPBoot could simplify and expand the scope of application of prediction-powered inference to problems where central limit theorems are hard to prove.
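To make the "one application of the bootstrap" concrete, the following is a minimal sketch of the percentile construction from the TL;DR, not the author's reference implementation. The function name `ppboot_ci`, the `estimator(X, y)` calling convention, the synthetic data, and the defaults (`alpha`, `B`, `seed`) are all assumptions made for illustration. It also assumes that $f$ acts pointwise, so predictions can be computed once and then resampled.

```python
import numpy as np


def ppboot_ci(X, Y, X_unlabeled, f, estimator, alpha=0.1, B=1000, seed=0):
    """Percentile-bootstrap interval from prediction-powered replicates.

    Hypothetical helper: `estimator(X, y)` returns a scalar estimate and
    `f` maps an array of inputs to an array of predictions.
    """
    rng = np.random.default_rng(seed)
    n, N = len(X), len(X_unlabeled)
    # Assumption: f is pointwise, so f(X*) equals f(X) at the resampled indices.
    preds_labeled = f(X)
    preds_unlabeled = f(X_unlabeled)
    thetas = np.empty(B)
    for b in range(B):
        i = rng.integers(0, n, size=n)  # resample labeled data with replacement
        j = rng.integers(0, N, size=N)  # resample unlabeled data with replacement
        thetas[b] = (
            estimator(X_unlabeled[j], preds_unlabeled[j])  # theta_hat on predictions
            + estimator(X[i], Y[i])                        # theta_hat on true labels
            - estimator(X[i], preds_labeled[i])            # theta_hat on labeled predictions
        )
    lo, hi = np.quantile(thetas, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Synthetic usage example: estimate the mean outcome, whose true value is 2.0.
rng = np.random.default_rng(1)
X = rng.normal(size=200)                       # small labeled sample
Y = 2.0 + X + rng.normal(scale=0.5, size=200)
X_unlabeled = rng.normal(size=5000)            # large unlabeled sample


def f(x):
    # A deliberately biased predictor; the labeled-data terms correct the bias.
    return 2.3 + x


def mean_estimator(X, y):
    return float(np.mean(y))


lo, hi = ppboot_ci(X, Y, X_unlabeled, f, mean_estimator)
print(f"90% PPBoot interval for the mean: [{lo:.3f}, {hi:.3f}]")
```

Swapping `mean_estimator` for another scalar-valued estimator (a quantile or a regression coefficient, say) leaves the rest of the sketch unchanged, which is the sense in which the approach avoids problem-specific variance calculations. Power tuning via $\lambda$ and cross-fitting are not shown here.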
