Prediction-Powered E-Values

Daniel Csillag, Claudio José Struchiner, Guilherme Tegoni Goedert

TL;DR

The paper develops prediction-powered inference by embedding predictive imputations into e-values, preserving anytime-validity and post-hoc guarantees while expanding inference beyond Z-estimation. By debiasing imputed terms with Bernoulli sampling and a calibrated epsilon, the framework yields prediction-powered e-values that apply to hypothesis testing, confidence sequences, and complex tasks like change-point detection and causal discovery with costly data. The approach demonstrates significant data-efficiency gains across four real-world-inspired case studies, producing tighter valid intervals and earlier detections than conventional baselines. Practically, this modular method can be integrated into existing algorithms to exploit cheap data streams while maintaining rigorous statistical guarantees.

Abstract

Quality statistical inference requires a sufficient amount of data, which can be missing or hard to obtain. To this end, prediction-powered inference has risen as a promising methodology, but existing approaches are largely limited to Z-estimation problems such as inference of means and quantiles. In this paper, we apply ideas of prediction-powered inference to e-values. By doing so, we inherit all the usual benefits of e-values -- such as anytime-validity, post-hoc validity and versatile sequential inference -- as well as greatly expand the set of inferences achievable in a prediction-powered manner. In particular, we show that every inference procedure that can be framed in terms of e-values has a prediction-powered counterpart, given by our method. We showcase the effectiveness of our framework across a wide range of inference tasks, from simple hypothesis testing and confidence intervals to more involved procedures for change-point detection and causal discovery, which were out of reach of previous techniques. Our approach is modular and easily integrable into existing algorithms, making it a compelling choice for practical applications.
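
The TL;DR mentions debiasing imputed terms with Bernoulli sampling. One way to picture that idea is the standard inverse-probability-weighted correction sketched below, which is an assumption of this example and not a formula stated in this summary. The costly label is queried with probability `pi`, and the correction is scaled by `1/pi` so that the term's average over the sampling coin equals the e-value computed on the true label. The function names, the parameter `pi`, and the numbers are hypothetical, and the calibrated epsilon from the paper is not reproduced here.

```python
def debiased_term(e_imputed, e_true, xi, pi):
    """IPW-corrected term: imputed e-value plus a rescaled correction.

    xi is 1 if the costly label was queried (probability pi), else 0.
    When xi == 0 the true-label e-value is never used, so e_true may be None.
    NOTE: illustrative assumption, not the paper's exact construction.
    """
    if not 0.0 < pi <= 1.0:
        raise ValueError("pi must lie in (0, 1]")
    if xi == 0:
        return e_imputed
    return e_imputed + (e_true - e_imputed) / pi


def mean_over_sampling(e_imputed, e_true, pi):
    """Exact average of debiased_term over xi ~ Bernoulli(pi)."""
    queried = debiased_term(e_imputed, e_true, 1, pi)
    skipped = debiased_term(e_imputed, None, 0, pi)
    return pi * queried + (1.0 - pi) * skipped


# Hypothetical numbers: the imputed e-value overshoots the true one.
print(mean_over_sampling(e_imputed=1.8, e_true=0.6, pi=0.25))
print(debiased_term(e_imputed=1.8, e_true=0.6, xi=1, pi=0.25))
```

Averaging over the coin recovers the true-label e-value whatever the imputation quality, which is the sense in which the term is debiased. In this toy case the queried term is negative, so this correction alone is not a complete construction of a valid e-value.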

Paper Structure

This paper contains 20 sections, 17 theorems, 54 equations, 4 figures, 1 algorithm.

Key Result

Theorem 2.1

$E^\mathrm{ppi}_n$ is a valid e-value for the null $H_0$. Additionally:

Figures (4)

  • Figure 1: Prediction-powered confidence sequences. The plot shows the p-landscape (i.e., parameter on the x-axis, reciprocal of the e-value on the y-axis) for the confidence sequence generated by our method (green), along with those obtained using only labelled samples (purple) and using an imputation approach. The 95% confidence interval for each p-landscape (i.e., the region where the p-landscape is above 0.05) is shaded. Our method provides the tightest valid intervals -- using only the labelled samples or vanilla PPI yields weaker inferences, and using imputation fails to cover the true mean.
  • Figure 2: Prediction-powered anytime-valid hypothesis testing. The plot shows the e-values over time for testing two null hypotheses -- one on the bottom, which should be rejected, and one on top, which should not be rejected. For rejection ($E\geq20$, i.e., a significance level of $\alpha = 0.05$ or 95% confidence, marked by the dashed lines), our prediction-powered e-values provide the strongest valid signal; the imputation approach is invalid here, since it rejects before the null is actually violated. For non-rejection ($E<20$), all the methods appear valid, but ours still attains the highest e-value.
  • Figure 3: Prediction-powered change-point detection via e-values. The plot shows the exponential moving average of a time series (in blue), with the few collected labels denoted by the scattered Xs. Our prediction-powered methods detect the change-point accurately, while the base method that only considers the labelled data points does not detect any change-point.
  • Figure 4: Prediction-powered causal discovery with e-values. We compare our prediction-powered causal discovery method with one that uses only labelled data. The lighter nodes correspond to the costly variables, while the darker nodes correspond to cheaper, readily available ones. The base method does not detect any edges in the causal graph (undetected edges are dashed), while ours detects as many edges as the 'best possible' method, which uses all the data points regardless of data acquisition costs.
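
The captions of Figures 1 and 2 rely on two standard conversions: an e-value becomes a p-value by taking its reciprocal (capped at 1), and the null is rejected at level $\alpha$ once the e-value reaches $1/\alpha$, which gives the threshold of 20 for $\alpha = 0.05$. A minimal sketch of both, together with reading a confidence set off a p-landscape, follows. The helper names and the toy grid of e-values are hypothetical and not taken from the paper.

```python
def p_from_e(e_value):
    """Reciprocal of an e-value, capped at 1 (a valid p-value by Markov's inequality)."""
    if e_value < 0:
        raise ValueError("e-values are nonnegative")
    return 1.0 if e_value <= 1.0 else 1.0 / e_value


def rejects(e_value, alpha=0.05):
    """Reject the null at level alpha when the e-value is at least 1/alpha."""
    return e_value * alpha >= 1.0


def confidence_set(e_by_param, alpha=0.05):
    """Keep the parameter values whose p-landscape stays above alpha."""
    return [theta for theta, e in e_by_param.items() if p_from_e(e) > alpha]


# Toy e-values on a parameter grid (illustrative numbers only).
toy = {0.0: 150.0, 0.1: 32.0, 0.2: 4.0, 0.3: 0.7, 0.4: 9.0, 0.5: 60.0}
print(confidence_set(toy))           # [0.2, 0.3, 0.4]
print(rejects(20.0), rejects(19.9))  # True False
```

A parameter value is excluded from the confidence set exactly when the e-value for that null reaches the rejection threshold, so the shaded region in a p-landscape and the dashed rejection line express the same rule.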

Theorems & Definitions (28)

  • Theorem 2.1
  • Theorem 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Proposition 2.6
  • Theorem 1.1: restatement of the main-text theorem labelled thm:hypothesis-testing-valid
  • proof
  • Lemma 1.2
  • proof
  • Lemma 1.3
  • ...and 18 more