Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI

Wenlong Ji; Lihua Lei; Tijana Zrnic

TL;DR

This work integrates surrogate outcomes with prediction-powered inference by treating AI predictions as low-cost surrogates for expensive outcomes. It introduces RePPI, a recalibrated imputed-loss approach that uses cross-fitting to learn an optimal imputation and achieves minimal asymptotic variance among PPI estimators when the recalibration is consistent. The method remains advantageous even when the recalibration is imperfect, and it is particularly effective under modality mismatch, distribution shift, and discrete predictions, as demonstrated by theoretical results and empirical studies. Collectively, RePPI delivers substantial gains in effective sample size and reliable uncertainty quantification, enabling more efficient inference in modern, AI-rich settings.

Abstract

We establish a formal connection between the decades-old surrogate outcome model in biostatistics and economics and the emerging field of prediction-powered inference (PPI). The connection treats predictions from pre-trained models, prevalent in the age of AI, as cost-effective surrogates for expensive outcomes. Building on the surrogate outcomes literature, we develop recalibrated prediction-powered inference, a more efficient approach to statistical inference than existing PPI proposals. Our method departs from the existing proposals by using flexible machine learning techniques to learn the optimal "imputed loss" through a step we call recalibration. Importantly, the method always improves upon the estimator that relies solely on the data with available true outcomes, even when the optimal imputed loss is estimated imperfectly, and it achieves the smallest asymptotic variance among PPI estimators if the estimate is consistent. Computationally, our optimization objective is convex whenever the loss function that defines the target parameter is convex. We further analyze the benefits of recalibration, both theoretically and numerically, in several common scenarios where machine learning predictions systematically deviate from the outcome of interest. We demonstrate significant gains in effective sample size over existing PPI proposals via three applications leveraging state-of-the-art machine learning/AI models.
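
The recalibration idea in the abstract can be illustrated for the simplest target, a population mean. The following is a minimal sketch under stated assumptions, not the paper's RePPI algorithm: the function name `recalibrated_ppi_mean`, the linear recalibration map, the fold-weighted averaging, and the plug-in standard error are all choices made here for illustration, and the simulated data are hypothetical. It fits the map from predictions to outcomes on labeled folds (cross-fitting), imputes on the unlabeled data, and corrects the imputation with held-out labeled residuals.

```python
import numpy as np

def recalibrated_ppi_mean(y_lab, f_lab, f_unlab, n_folds=2, seed=0):
    """Cross-fitted, recalibrated PPI-style estimate of E[Y] (illustrative sketch).

    y_lab:   true outcomes on the n labeled points
    f_lab:   model predictions on the labeled points
    f_unlab: model predictions on the N unlabeled points
    """
    rng = np.random.default_rng(seed)
    n, N = len(y_lab), len(f_unlab)
    folds = np.array_split(rng.permutation(n), n_folds)
    g_lab = np.empty(n)      # recalibrated predictions, held-out for each fold
    g_unlab = np.zeros(N)    # fold-weighted average of recalibrated predictions
    for idx in folds:
        train = np.setdiff1d(np.arange(n), idx)
        # Assumption: a linear recalibration y ~ a + b * f, fit by least squares.
        A = np.column_stack([np.ones(len(train)), f_lab[train]])
        coef, *_ = np.linalg.lstsq(A, y_lab[train], rcond=None)
        g_lab[idx] = coef[0] + coef[1] * f_lab[idx]
        g_unlab += (len(idx) / n) * (coef[0] + coef[1] * f_unlab)
    resid = y_lab - g_lab
    # Imputed mean on unlabeled data plus a bias correction from labeled residuals.
    theta = g_unlab.mean() + resid.mean()
    # Approximate plug-in standard error that treats the fitted map as fixed.
    se = np.sqrt(g_unlab.var(ddof=1) / N + resid.var(ddof=1) / n)
    return theta, se

# Hypothetical simulation: predictions are on the wrong scale (Y = 2 + 3 f + noise).
rng = np.random.default_rng(1)
n, N = 500, 20000
f_lab, f_unlab = rng.normal(size=n), rng.normal(size=N)
y_lab = 2.0 + 3.0 * f_lab + 0.5 * rng.normal(size=n)
theta, se = recalibrated_ppi_mean(y_lab, f_lab, f_unlab)
classical_se = y_lab.std(ddof=1) / np.sqrt(n)  # labeled-data-only baseline
print(theta, se, classical_se)
```

In this simulated setting the raw predictions are badly miscalibrated, yet after recalibration they still carry most of the information about the outcome, which is why the sketch's standard error should fall well below the labeled-only baseline.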

Paper Structure (28 sections, 8 theorems, 87 equations, 6 figures, 3 tables, 1 algorithm)

Introduction
Predictions as Surrogates in the Age of AI
Recalibrated PPI: A Lesson from Surrogate Outcomes
A Review of the Surrogate Outcome Model and PPI
Surrogate Outcome Model
Connection to Prediction-Powered Inference
Our Method: Recalibrated PPI
Optimal Imputed Loss
Recalibrated PPI: An Efficient Implementation
Why is Recalibration Important?
Modality Mismatch
Distribution Shift
Discrete Predictions
Experimental Results
US Census Data
...and 13 more sections

Key Result

Theorem 1

Let the target $\theta^\star$ defined in (eq:thetastar) be unique. Assume that $n/N \rightarrow r$ and that the objective function (eqn: PPI) is convex. Let $H_{\theta^\star} = \mathbb{E}[\nabla^2\ell_{\theta^\star}(X, Y)]$. Under regularity conditions (Assumption asm: regularity in Appendix sec: proof REPPI), [...] Furthermore, if $g_\theta$ satisfies (eq:optimal_gtheta) at $\theta^\star$, i.e., [...] then $\Sigma_g$ [...]

Figures (6)

Figure 1: Average length of confidence intervals (left) and coverage (right) in modality mismatch simulation. The horizontal axis represents the variance ratio $\rho = \sigma_X^2 / \sigma_W^2$.
Figure 2: Average length of confidence intervals (left) and coverage (right) in distribution shift simulation. The horizontal axis represents the bias $\|\theta - \tilde{\theta}\|$.
Figure 3: Average length of confidence intervals (left) and coverage (right) in simulation with discrete predictions. The horizontal axis represents $\mu_3$.
Figure 4: $90\%$ confidence intervals in 5 trials (left), average length of confidence intervals (middle), and coverage (right) on US census data. The horizontal axis represents the ratio of labeled data, $\frac{n}{N+n}$.
Figure 5: $90\%$ confidence intervals in 5 trials (left), average length of confidence intervals (middle), and coverage (right) on the politeness data. The horizontal axis represents the ratio of labeled data, $\frac{n}{N+n}$.
...and 1 more figure

Theorems & Definitions (18)

Remark 1
Theorem 1
Example 1: Generalized linear models
Example 2: Quantile regression
Theorem 2
Proposition 1
Proposition 2
Proposition 3
Proof of Theorem (thm: efficient PPI)
Theorem 3
...and 8 more
