PPI++: Efficient Prediction-Powered Inference

Anastasios N. Angelopoulos, John C. Duchi, Tijana Zrnic

TL;DR

PPI++ tackles the problem of deriving valid, powerful inference with scarce labeled data by leveraging a large pool of machine-learning predictions for unlabeled inputs. It replaces the intractable grid-based confidence construction of the original PPI with computationally efficient convex optimization over a prediction-adjusted loss, while introducing a data-driven weighting parameter $\lambda$ to adapt to the predictive model’s quality. Theoretical guarantees show asymptotic normality of the prediction-powered estimator, equivalence to testing-based confidence sets, and improved efficiency via power tuning, including a plug-in estimator $\hat{\lambda}$ and a one-step variant. Empirically, PPI++ consistently matches or outperforms classical inference and the original PPI across simulations and real-data tasks, often achieving tighter confidence sets when predictions are informative and gracefully degrading to classical results when predictions are weak. This yields a practical, scalable framework for using black-box predictors to enhance scientific inference without sacrificing validity.
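To make the role of the weighting parameter $\lambda$ concrete, here is a minimal Python sketch for the simplest case, estimating a mean. It is an illustration under stated assumptions, not the authors' implementation: the function name `ppi_pp_mean_ci` and its arguments are hypothetical, and the plug-in formula used for $\hat{\lambda}$ (sample covariance of labels and predictions divided by $(1 + n/N)$ times the prediction variance) is the commonly stated mean-estimation special case, which should be checked against the paper before use.

```python
import numpy as np
from statistics import NormalDist


def ppi_pp_mean_ci(y, f_lab, f_unlab, alpha=0.1, lam=None):
    """Sketch of a power-tuned, prediction-powered CI for a mean.

    y       : labels on the n labeled points
    f_lab   : model predictions on the same n labeled points
    f_unlab : model predictions on the N unlabeled points
    lam     : weight on the predictions; if None, a plug-in estimate is used
              (assumed formula for the mean-estimation case)
    """
    y = np.asarray(y, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y), len(f_unlab)

    if lam is None:
        # Plug-in weight: near 1 when predictions track the labels closely,
        # near 0 when they are uninformative.
        cov_yf = np.cov(y, f_lab, ddof=1)[0, 1]
        var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
        lam = cov_yf / ((1.0 + n / N) * var_f) if var_f > 0 else 0.0

    # Point estimate: labeled mean plus a weighted correction that uses
    # the unlabeled predictions.
    theta = y.mean() + lam * (f_unlab.mean() - f_lab.mean())

    # Variance estimate: one term from the labeled sample, one from the
    # unlabeled sample (the two samples are treated as independent).
    var = (np.var(y - lam * f_lab, ddof=1) / n
           + lam ** 2 * np.var(f_unlab, ddof=1) / N)

    # Normal-approximation interval at level 1 - alpha.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half = z * np.sqrt(var)
    return theta, (theta - half, theta + half), lam
```

Setting `lam=0.0` ignores the predictions and recovers the classical sample-mean interval, which is the "graceful degradation" behavior described above; leaving `lam=None` lets the data choose how much weight the predictions receive.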

Abstract

We present PPI++: a computationally lightweight methodology for estimation and inference based on a small labeled dataset and a typically much larger dataset of machine-learning predictions. The methods automatically adapt to the quality of available predictions, yielding easy-to-compute confidence sets -- for parameters of any dimensionality -- that always improve on classical intervals using only the labeled data. PPI++ builds on prediction-powered inference (PPI), which targets the same problem setting, improving its computational and statistical efficiency. Real and synthetic experiments demonstrate the benefits of the proposed adaptations.

Paper Structure

This paper contains 34 sections, 12 theorems, 79 equations, 13 figures, 2 algorithms.

Key Result

Corollary 3.1

Assume that $\hat{\lambda} = \lambda + o_P(1)$ for some limit $\lambda$, that $\frac{n}{N} \rightarrow r$, and that $H_{\theta^\star} := \mathbb{E}[\psi''(X^\top\theta^\star) X X^\top]$ is nonsingular. Define the covariance matrices Then for $\Sigma^{\lambda} := H_{\theta^\star}^{-1} \left(r V_{f,\theta^\star}^\lambda + V_{\Delta,\theta^\star}^\lambda \right) H_{\theta^\star}^{-1}$, we have the convergen

Figures (13)

  • Figure 1: Mean estimation simulation study. The left column shows coverage, the right column shows confidence interval width. The top, middle, and bottom rows correspond to noise levels $\sigma=0.1$, $\sigma=1$, and $\sigma=2$, respectively.
  • Figure 2: Linear regression simulation study. The left column shows coverage, the right column shows confidence interval width. The top, middle, and bottom rows correspond to noise levels $\sigma=0.1$, $\sigma=0.5$, and $\sigma=1$, respectively.
  • Figure 3: Logistic regression simulation study. The left column shows coverage, the right column shows confidence interval width. The top, middle, and bottom rows correspond to noise levels $\sigma=0.01$, $\sigma=0.1$, and $\sigma=0.2$, respectively.
  • Figure 4: Mean estimation on deforestation data. The left panel shows coverage and the right shows width.
  • Figure 5: Mean estimation on galaxy data. The left panel shows coverage and the right shows width.
  • ...and 8 more figures

Theorems & Definitions (15)

  • Corollary 3.1
  • Corollary 3.2
  • Theorem 4.1
  • Proposition 4.1
  • Corollary 4.1
  • Theorem 5.1
  • Proposition 6.1
  • Corollary 6.1
  • Corollary 6.2
  • Definition A.1: Smooth enough losses
  • ...and 5 more