Do More Predictions Improve Statistical Inference? Filtered Prediction-Powered Inference

Shirong Xu, Will Wei Sun

TL;DR

The paper tackles how to leverage predictions for statistical inference when prediction quality is heterogeneous. It introduces Filtered Prediction-Powered Inference (FPPI), which identifies a data-adaptive informative region $\mathcal{S}_0=\big\{x:(m(x)-\theta^\star)f(x)>0\big\}$ to restrict prediction-powered corrections, guided by a margin condition for fast region recovery. The authors establish unbiasedness and derive variance reductions for mean estimation and GLMs, proving that FPPI can achieve strictly better AMSE than PPI++ by optimally choosing the region and tuning parameter; they additionally prove asymptotic normality in both discrete and continuous covariate regimes. Empirical validation through simulations and a real LLM-evaluation application demonstrates that FPPI reduces reliance on expensive labels and yields accurate inference even under heterogeneous prediction quality, highlighting practical gains in semi-supervised inference and model evaluation tasks.

Abstract

Recent advances in artificial intelligence have enabled the generation of large-scale, low-cost predictions with increasingly high fidelity. As a result, the primary challenge in statistical inference has shifted from data scarcity to data reliability. Prediction-powered inference methods seek to exploit such predictions to improve efficiency when labeled data are limited. However, existing approaches implicitly adopt a use-all philosophy, under which incorporating more predictions is presumed to improve inference. When prediction quality is heterogeneous, this assumption can fail, and indiscriminate use of unlabeled data may dilute informative signals and degrade inferential accuracy. In this paper, we propose Filtered Prediction-Powered Inference (FPPI), a framework that selectively incorporates predictions by identifying a data-adaptive filtered region in which predictions are informative for inference. We show that this region can be consistently estimated under a margin condition, achieving fast rates of convergence. By restricting the prediction-powered correction to the estimated filtered region, FPPI adaptively mitigates the impact of biased or noisy predictions. We establish that FPPI attains strictly improved asymptotic efficiency compared with existing prediction-powered inference methods. Numerical studies and a real-data application to large language model evaluation demonstrate that FPPI substantially reduces reliance on expensive labels by selectively leveraging reliable predictions, yielding accurate inference even in the presence of heterogeneous prediction quality.
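
To make the idea of restricting the prediction-powered correction to a region concrete, here is a minimal Python sketch for mean estimation. It is an illustration under stated assumptions, not the authors' implementation: the function name `filtered_ppi_mean`, the estimator form (labeled mean plus a correction scaled by a tuning parameter `lam`, applied only where a region indicator is true), and the hand-picked region `x > 0` in the demo are all hypothetical. In particular, the paper estimates the region from data, which this sketch does not attempt.

```python
import numpy as np


def filtered_ppi_mean(y_lab, f_lab, f_unlab, in_region_lab, in_region_unlab, lam=1.0):
    """Hypothetical filtered PPI-style estimate of E[Y].

    Assumed form: mean(Y) + lam * (mean of f*1{S} on unlabeled data
                                   - mean of f*1{S} on labeled data).
    Predictions outside the region S contribute nothing to the correction.
    """
    y_lab = np.asarray(y_lab, dtype=float)
    # Zero out predictions that fall outside the region.
    g_lab = np.asarray(f_lab, dtype=float) * np.asarray(in_region_lab, dtype=bool)
    g_unlab = np.asarray(f_unlab, dtype=float) * np.asarray(in_region_unlab, dtype=bool)
    return y_lab.mean() + lam * (g_unlab.mean() - g_lab.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, N = 200, 20_000  # labeled and unlabeled sample sizes (illustrative)

    x_lab = rng.normal(size=n)
    x_unlab = rng.normal(size=N)
    y_lab = x_lab + 0.1 * rng.normal(size=n)  # true mean of Y is 0

    def predict(x):
        # Hypothetical predictor: accurate for x > 0, pure noise otherwise.
        return np.where(x > 0, x, rng.normal(size=x.shape))

    f_lab, f_unlab = predict(x_lab), predict(x_unlab)

    classical = y_lab.mean()
    use_all = filtered_ppi_mean(
        y_lab, f_lab, f_unlab, np.ones(n, bool), np.ones(N, bool)
    )
    filtered = filtered_ppi_mean(y_lab, f_lab, f_unlab, x_lab > 0, x_unlab > 0)

    print("classical mean:", classical)
    print("use-all correction:", use_all)
    print("filtered correction:", filtered)
```

With `lam=0` the function reduces to the classical labeled mean, and with an all-true region it reduces to a use-all correction of the PPI++ type. For any fixed region and `lam`, the two correction terms have the same expectation when labeled and unlabeled covariates share a distribution, so the correction does not shift the mean of the estimate.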

Paper Structure

This paper contains 26 sections, 383 equations, 9 figures, 1 table, 3 algorithms.

Figures (9)

  • Figure 1: The left panel highlights regions in Example \ref{Exam:PPIF_new} where $Y$ and $f(X)$ are positively or negatively correlated. The middle panel depicts the filtered regions where $Y$ and $f(X)$ exhibit highly positive correlation. The right panel shows the estimation error of $\widehat{\theta}(\lambda,1)$ in Example \ref{Exam:FPPI} across varying values of $\lambda$.
  • Figure 2: The general procedure of the proposed FPPI framework.
  • Figure 3: Two cases in Theorem \ref{Thm:GLM_Estimate_S0} (linear regression example). Left: the data lie close to the boundaries $\{\bm{X}:m(\bm{X}) = \bm{X}^\top \bm{\theta}^\star\}$ and $\{\bm{X}:f(\bm{X}) = \bm{X}^\top \bm{\theta}^\star\}$. Right: the data are well separated from both boundaries, making the recovery of $\mathcal{S}_0$ easier.
  • Figure 4: Scenario I: Monte Carlo simulation results showing the estimation errors (or variances) of the classical mean, PPI++, and FPPI estimators for different prediction functions and unlabeled sample sizes $N$.
  • Figure 5: Scenario II: Monte Carlo simulation results illustrating the estimation errors of the classical OLS, PPI++, and FPPI estimators on a logarithmic scale, under different prediction functions and unlabeled sample sizes $N$.
  • ...and 4 more figures