The MAPS Algorithm: Fast model-agnostic and distribution-free prediction intervals for supervised learning

Daniel Salnikov, Dan Leonte, Kevin Michalewicz

TL;DR

We introduce the lifted predictive model (LPM), a new conditional representation, and propose MAPS (Model-Agnostic Prediction Sets), an algorithm that produces distribution-free conditional prediction intervals and adapts to any trained predictive model.

Abstract

A fundamental problem in modern supervised learning is computing reliable conditional prediction intervals in high-dimensional settings: existing methods often rely on restrictive modelling assumptions, do not scale as predictor dimension increases, or only guarantee marginal (population-level) rather than conditional (individual-level) coverage. We introduce the $\textit{lifted predictive model}$ (LPM), a new conditional representation, and propose the MAPS (Model-Agnostic Prediction Sets) algorithm that produces distribution-free conditional prediction intervals and adapts to any trained predictive model. Our procedure is bootstrap-based, scales to high-dimensional inputs and accounts for heteroscedastic errors. We establish the theoretical properties of the LPM, connect prediction accuracy to interval length, and provide sufficient conditions for asymptotic conditional coverage. We evaluate the finite-sample performance of MAPS in a simulation study, and apply our method to simulation-based inference and image classification. In the former, MAPS provides the first approach for debiasing neural Bayes estimators and constructing valid confidence intervals for model parameters given the estimators, at any desired level. In the latter, it provides the first approach that accounts for uncertainty in model calibration and label prediction.
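The distinction between marginal and conditional coverage drawn in the abstract can be seen in a small simulation. The sketch below is not the MAPS algorithm; it is a generic illustration under assumed settings (a zero regression function, noise standard deviation `0.2 + 1.8 * x`, and a constant-width interval taken from the empirical quantile of the absolute residuals). The function name and all parameter values are hypothetical.

```python
import numpy as np


def marginal_vs_conditional_coverage(n=200_000, alpha=0.05, seed=0):
    """Coverage of a constant-width interval under heteroscedastic noise.

    Illustrative assumption: Y | X = x ~ N(0, (0.2 + 1.8 x)^2), X ~ U(0, 1).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=n)
    sigma = 0.2 + 1.8 * x              # noise level grows with x
    y = rng.normal(0.0, sigma)         # true regression function is 0

    # One half-width for every x: the (1 - alpha) quantile of |residual|.
    half_width = np.quantile(np.abs(y), 1.0 - alpha)
    covered = np.abs(y) <= half_width

    marginal = covered.mean()              # averaged over all x
    low_noise = covered[x < 0.2].mean()    # conditional on small x
    high_noise = covered[x > 0.8].mean()   # conditional on large x
    return marginal, low_noise, high_noise


marginal, low_noise, high_noise = marginal_vs_conditional_coverage()
print(f"marginal={marginal:.3f} low-noise={low_noise:.3f} high-noise={high_noise:.3f}")
```

In this setup the interval attains roughly the nominal level on average, yet it over-covers where the noise is small and under-covers where the noise is large, which is the gap that conditional (individual-level) guarantees are meant to close.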

Paper Structure

This paper contains 33 sections, 13 theorems, 199 equations, 6 figures, 3 tables, 1 algorithm.

Key Result

Proposition 1

The ideal prediction interval $C_\mathrm{ideal} (\boldsymbol{x}_o)$ solves the optimisation problem for all $\alpha \in (0, 1)$, where $\boldsymbol{x}_o \in \mathcal{X}$, $\lambda_{\mathrm{Leb}}$ is the Lebesgue measure and $C ( \boldsymbol{x}_o) \subset \mathbb{R}$ is a prediction interval. Further, if $p_{\epsilon_o | \boldsymbol{x}_o}$ is unimodal, then the solution to \ref{['eq: optmial pred interv']} ...
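The displayed optimisation problem is not reproduced in this summary. A common way to formalise a shortest interval with conditional coverage, written with the symbols named in the proposition, is sketched below. This is an assumption about the form of the problem, not the paper's exact statement; in particular the response symbol $Y_o$ and the inequality form of the constraint are assumed.

```latex
C_{\mathrm{ideal}}(\boldsymbol{x}_o)
  \in \operatorname*{arg\,min}_{C(\boldsymbol{x}_o) \subset \mathbb{R}}
      \lambda_{\mathrm{Leb}}\{ C(\boldsymbol{x}_o) \}
  \quad \text{subject to} \quad
  \mathbf{Pr}\{ Y_o \in C(\boldsymbol{x}_o) \mid \boldsymbol{x}_o \} \ge 1 - \alpha .
```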

Figures (6)

  • Figure 1: Comparison of conditional and marginal prediction intervals for a linear regression derived from $\mathbb{E} \, [Y \, | \, X ]$, where $(X, \, Y) \sim \mathcal{N}_2 ( \boldsymbol{0}, \, \mathbf{\Sigma})$. Conditional prediction intervals are defined by the theoretical $95\%$-quantiles of $Y_o \mid X_o = x_o$. The marginal region is the set of points within an ellipse that contains $95\%$ of the probability mass of the bivariate normal distribution, i.e., $\mathcal{E}_{0.95} = \left\{ (x,y) \in \mathbb{R}^2 : (x,y)^\top \mathbf{\Sigma}^{-1} (x,y) \le \chi^2_{2,0.95} \right\},$ where $\chi^2_{2,0.95}$ is the $95\%$-quantile of the chi-squared distribution with 2 degrees of freedom.
  • Figure 2: Estimated conditional coverage of $\widehat{C}_{\mathrm{maps}} (\hat{y}_k)$ given by \ref{['eq: cond coverage estimator']}, where $\hat{f} (\boldsymbol{x}_o) = \hat{y}_k$ and $\hat{y}_k \in \texttt{linspace}(-2.5, \, 7.5, \, 0.5)$, $k = 1, \dots, 21$, and $\hat{f} \in \{ \hat{f}_\textrm{SVM}, \, \hat{f}_\textrm{GAM}, \, \hat{f}_\textrm{RF} \}$. The data are generated by \ref{['eq: add sim']} with $10{,}000$ out-of-sample test observations, and $1 - \alpha = 95\%$.
  • Figure 3: Estimated prediction intervals $\widehat{C}_{\mathrm{maps}} (\hat{y}_k)$ for $\hat{f} (\boldsymbol{x}_o) = \hat{y}_k \in \texttt{linspace}(-2.5, \, 7.5, \, 0.5)$, $k = 1, \dots, 21$, and $\hat{f} \in \{ \hat{f}_\textrm{SVM}, \, \hat{f}_\textrm{GAM}, \, \hat{f}_\textrm{RF} \}$. The data are generated by \ref{['eq: add sim']} with $10{,}000$ out-of-sample test observations, $1 - \alpha = 95\%$, and $n_{\mathrm{train}}, \, n_{\mathrm{cal}} \in \{1000, \, 2500, \, 5000\}$.
  • Figure 4: Results for the application in SBI for parameter inference of spatial data under an NMVMN model [sainsbury2025neural] using a neural point estimator $\hat{\boldsymbol{\theta}}_j$, where $j = 1, \dots, 5$. Fig. \ref{['figure:inf']}: Histograms of $\hat{\boldsymbol{\theta}}_j$, where the red lines highlight the $0.5\%$ and $99.5\%$ empirical quantiles. Fig. \ref{['fig:spline_minus_theta_hat']}: LPM bias adjustments $\hat{\psi}(\hat{\theta}_j) - \hat{\theta}_j$. Fig. \ref{['fig:pred_intervals_sbi']}: $95\%$ and $99\%$ prediction intervals for $\theta_j \mid \hat{\theta}_j$. Fig. \ref{['fig:cond_coverage-sbi']}: Empirical coverage of the intervals displayed in Fig. \ref{['fig:pred_intervals_sbi']}. Fig. \ref{['fig:spline_minus_theta_hat']}--\ref{['fig:cond_coverage-sbi']} are for $\hat{\boldsymbol{\theta}}_j$ in the ranges highlighted by Fig. \ref{['figure:inf']}. See Section \ref{['subsec:maps_for_sbi']} for details.
  • Figure 5: From left to right, prediction intervals for $\mathbf{Pr}(\text{dog})$ computed with a ConvNeXt classifier [convnext]. Far left, the interval collapses at $\mathbf{Pr}(\text{dog}) = 1$ for the clear dog image. At the centre, the most ambiguous case and the longest interval. Far right, the interval collapses at $\mathbf{Pr}(\text{dog}) = 0$ for the clear non-dog image. For each example, we report the lifted prediction probability $\varphi( \hat{y}_{\psi})$ (blue dot), the ConvNeXt prediction probability $\varphi( \hat{y})$ (orange cross), and the lower $\varphi(\hat{c}_1)$ and upper $\varphi(\hat{c}_2)$ interval endpoints.
  • ...and 1 more figure
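The contrast described in the Figure 1 caption can be checked numerically. The sketch below assumes a unit-variance covariance matrix with correlation `rho` (the caption does not specify $\mathbf{\Sigma}$), so that $Y \mid X = x \sim \mathcal{N}(\rho x, \, 1 - \rho^2)$. It computes the conditional $95\%$ interval, the vertical slice of the ellipse $\mathcal{E}_{0.95}$ at a given $x$, and the conditional coverage that slice achieves. All function names are hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5% quantile of the standard normal


def chi2_quantile_2df(p):
    # Closed form for 2 degrees of freedom: the CDF is 1 - exp(-q / 2).
    return -2.0 * math.log(1.0 - p)


def conditional_interval(x, rho):
    # Central 95% interval of Y | X = x ~ N(rho * x, 1 - rho^2).
    half = Z_975 * math.sqrt(1.0 - rho ** 2)
    return rho * x - half, rho * x + half


def ellipse_slice(x, rho, level=0.95):
    # Set of y with (x, y) inside the ellipse; None if the slice is empty.
    q = chi2_quantile_2df(level)
    if x * x > q:
        return None
    half = math.sqrt((1.0 - rho ** 2) * (q - x * x))
    return rho * x - half, rho * x + half


def slice_conditional_coverage(x, level=0.95):
    # Pr(Y in slice | X = x) = Pr(|Z| <= sqrt(q - x^2)); it does not depend on rho.
    q = chi2_quantile_2df(level)
    if x * x > q:
        return 0.0
    return math.erf(math.sqrt(q - x * x) / math.sqrt(2.0))


for x in (0.0, 2.0, 2.5):
    print(x, conditional_interval(x, 0.5), ellipse_slice(x, 0.5),
          round(slice_conditional_coverage(x), 4))
```

Under these assumptions the ellipse slice over-covers near $x = 0$, under-covers as $|x|$ grows, and is empty once $x^2$ exceeds $\chi^2_{2,0.95}$, whereas the conditional interval has the same length at every $x$.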

Theorems & Definitions (22)

  • Proposition 1
  • Lemma 1
  • Definition 1
  • Example 1: Ridge regression
  • Example 2: Smooth functions and MLPs
  • Definition 2: conformal_nonparametric_local
  • Definition 3
  • Lemma 2
  • Theorem 1
  • Corollary 1.1
  • ...and 12 more