Prediction-Powered Adaptive Shrinkage Estimation

Sida Li, Nikolaos Ignatiadis

TL;DR

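The paper introduces Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. The amount of shrinkage is tuned by minimizing an unbiased estimate of risk, and the authors prove that this tuning strategy is asymptotically optimal.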

Abstract

Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior work has demonstrated PPI's benefits for individual statistical problems, modern applications require answering numerous parallel statistical questions. We introduce Prediction-Powered Adaptive Shrinkage (PAS), a method that bridges PPI with empirical Bayes shrinkage to improve the estimation of multiple means. PAS debiases noisy ML predictions within each task and then borrows strength across tasks by using those same predictions as a reference point for shrinkage. The amount of shrinkage is determined by minimizing an unbiased estimate of risk, and we prove that this tuning strategy is asymptotically optimal. Experiments on both synthetic and real-world datasets show that PAS adapts to the reliability of the ML predictions and outperforms traditional and modern baselines in large-scale applications.
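The two stages described in the abstract (debias the predictions within a task, then shrink toward the prediction mean) can be sketched for a single task as follows. This is a minimal illustration, not the authors' implementation: the function names, the clipping of the tuning weight to [0, 1], the toy data, and the placeholder values of `var_hat` and `omega` are all assumptions. In PAS the shrinkage level is chosen by minimizing a risk estimate across tasks, which is not reproduced here.

```python
from statistics import fmean, variance


def sample_cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = fmean(a), fmean(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def tuned_lambda(y, f_labeled, f_unlabeled):
    """Variance-minimizing weight on the predictions, clipped to [0, 1] (assumed)."""
    n, N = len(y), len(f_unlabeled)
    var_f = variance(list(f_labeled) + list(f_unlabeled))  # pooled prediction variance
    if var_f == 0:
        return 0.0  # constant predictions carry no information
    lam = sample_cov(y, f_labeled) / ((1 + n / N) * var_f)
    return min(max(lam, 0.0), 1.0)


def ppi_mean(y, f_labeled, f_unlabeled, lam=1.0):
    """lam = 0 gives the classical mean; lam = 1 gives the standard PPI estimate."""
    return fmean(y) + lam * (fmean(f_unlabeled) - fmean(f_labeled))


def shrink(theta_hat, var_hat, pred_mean, omega):
    """Pull theta_hat toward pred_mean; omega >= 0 sets how much weight it keeps."""
    weight = omega / (omega + var_hat)  # weight kept on the debiased estimate
    return weight * theta_hat + (1 - weight) * pred_mean


# Toy data (hypothetical): predictions on labeled points are biased upward by 0.5.
y = [1.0, 2.0, 3.0, 4.0]
f_lab = [1.5, 2.5, 3.5, 4.5]
f_unlab = [2.0, 3.0, 4.0, 5.0, 3.0, 4.0]

lam = tuned_lambda(y, f_lab, f_unlab)
theta = ppi_mean(y, f_lab, f_unlab, lam)
# var_hat and omega are placeholder values chosen only for illustration.
print(shrink(theta, var_hat=0.2, pred_mean=fmean(f_unlab), omega=0.2))
```

Setting `omega` to zero returns the prediction mean itself, while a large `omega` leaves the debiased estimate almost unchanged, which is the sense in which the shrinkage can adapt to how reliable the predictions are.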

Paper Structure

This paper contains 78 sections, 16 theorems, 171 equations, 7 figures, 3 tables, and 4 algorithms.

Key Result

Theorem 4.1

Under the assumption labeled ass:compound_generic, $\mathrm{CURE}$ is an unbiased estimator of the compound risk defined in equation eq:compound-risk; that is, the expectation of $\mathrm{CURE}$ equals the compound risk for all $\omega \geq 0$ and all $\boldsymbol{\eta}$.
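For intuition about how a risk estimate of this kind can be unbiased, consider a single task under simplifying assumptions that are not taken from the paper. Suppose $\hat{\theta}$ is unbiased for $\theta$ with known variance $\sigma^2$, that $Z$ is a shrinkage target with known covariance $\gamma = \mathrm{Cov}[\hat{\theta}, Z]$, and that the shrunken estimator is $\hat{\theta}_w = w\,\hat{\theta} + (1 - w)\,Z$ for a fixed weight $w \in [0, 1]$. All of these symbols are illustrative; the theorem's exact statement and the definition of $\mathrm{CURE}$ are in the paper.

$$
\begin{aligned}
\hat{\theta}_w - \theta &= (\hat{\theta} - \theta) - (1 - w)(\hat{\theta} - Z), \\
\mathbb{E}\big[(\hat{\theta}_w - \theta)^2\big]
&= \sigma^2 - 2(1 - w)\,\mathbb{E}\big[(\hat{\theta} - \theta)(\hat{\theta} - Z)\big] + (1 - w)^2\,\mathbb{E}\big[(\hat{\theta} - Z)^2\big] \\
&= (2w - 1)\,\sigma^2 + 2(1 - w)\,\gamma + (1 - w)^2\,\mathbb{E}\big[(\hat{\theta} - Z)^2\big],
\end{aligned}
$$

where the last step uses $\mathbb{E}[(\hat{\theta} - \theta)\hat{\theta}] = \sigma^2$ and $\mathbb{E}[(\hat{\theta} - \theta) Z] = \gamma$. Replacing the unknown expectation $\mathbb{E}[(\hat{\theta} - Z)^2]$ with the observed $(\hat{\theta} - Z)^2$ therefore gives a quantity whose expectation equals the risk, without requiring knowledge of $\theta$. Averaging such terms over tasks is one way to arrive at an unbiased estimate of a compound risk.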

Figures (7)

  • Figure 1: We instantiate the model described in Example 2.3 with $m = 10$ problems, each with $n_j = 10$ labeled and $N_j = 20$ unlabeled data points (we use different colors for all 10 problems). (Top) Labeled data $(X_{ij}, Y_{ij})_{i=1}^{n_j}$ with the classical estimator $\bar{Y}_j$ shown for each problem. (Bottom) We apply a flawed predictor $f(x) = |x|$ to the unlabeled covariates and visualize $(X_{ij}, f(X_{ij}))_{i=1}^{N_j}$ as well as the prediction mean $\tilde{Z}_j^f$.
  • Figure 2: A flowchart illustration of the PAS method. See the PAS algorithm (label alg:pas) for a pseudo-code implementation.
  • Figure 3: The power-tuned and adaptive shrinkage parameters, $\lambda^*_j$ and $\hat{\omega}_j$, across $m = 200$ problems in Example 2.3. On the $x$-axis, we identify each problem by its $\eta_j$ so the trend is more visible.
  • Figure 4: The ratio between $|\mathrm{Cov}_{\eta_j}\!\left[X_{ij}, Y_{ij}\right]|$ and $\mathrm{Var}_{\eta_j}\!\left[Y_{ij}\right]$ as a function of $\eta_j$. The constants are set to $\psi = 0.1$ and $c = 0.05$.
  • Figure 5: Example of spiral & non-spiral galaxy images from Galaxy Zoo 2.
  • ...and 2 more figures

Theorems & Definitions (28)

  • Example 2.3: Synthetic model
  • Theorem 4.1
  • Proposition 5.1
  • Theorem 5.2
  • Proposition 5.3
  • Theorem 2.1
  • proof
  • Remark 2.2: Connection to SURE
  • Definition 3.1: Univariate Power Tuning (UniPT)
  • Proposition 3.2: Asymptotic Consistency of Clipped Global Tuning Parameter
  • ...and 18 more