Probabilistic Conformal Prediction with Approximate Conditional Validity

Vincent Plassier, Alexander Fishkov, Mohsen Guizani, Maxim Panov, Eric Moulines

TL;DR

This work develops a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution, and derives non-asymptotic bounds that depend on the total variation distance of the conditional distribution and its estimate.

Abstract

We develop a new method for generating prediction sets that combines the flexibility of conformal methods with an estimate of the conditional distribution $P_{Y \mid X}$. Existing methods, such as conformalized quantile regression and probabilistic conformal prediction, usually provide only a marginal coverage guarantee. In contrast, our approach extends these frameworks to achieve approximately conditional coverage, which is crucial for many practical applications. Our prediction sets adapt to the behavior of the predictive distribution, making them effective even under high heteroscedasticity. While exact conditional guarantees are infeasible without assumptions on the underlying data distribution, we derive non-asymptotic bounds that depend on the total variation distance of the conditional distribution and its estimate. Using extensive simulations, we show that our method consistently outperforms existing approaches in terms of conditional coverage, leading to more reliable statistical inference in a variety of applications.
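
For reference, the two notions of coverage contrasted in the abstract, and the total variation distance that the bounds depend on, can be written in standard notation as follows. The notation is an assumption made here for illustration and may differ from the paper's own definitions; $\mathcal{C}_{\alpha}$ denotes a prediction set at miscoverage level $\alpha$, and $P$, $Q$ are two probability measures on the same space.

$$
\text{marginal:}\quad \mathbb{P}\left(Y_{n+1} \in \mathcal{C}_{\alpha}(X_{n+1})\right) \ge 1 - \alpha,
\qquad
\text{conditional:}\quad \mathbb{P}\left(Y_{n+1} \in \mathcal{C}_{\alpha}(X_{n+1}) \mid X_{n+1} = x\right) \ge 1 - \alpha \ \text{ for almost every } x,
$$

$$
d_{\mathrm{TV}}(P, Q) = \sup_{A} \left| P(A) - Q(A) \right|.
$$

The marginal statement averages over $X_{n+1}$, so a set can satisfy it while under-covering in some regions of the input space and over-covering in others.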

Paper Structure

This paper contains 27 sections, 14 theorems, 111 equations, 9 figures, 4 tables, 2 algorithms.

Key Result

Theorem 3.1

Assume ass:confReg-ass:tau. Then, for any $\alpha\in(0,1)$, it holds that $1 - \alpha \le \mathbb{P}\left(Y_{n+1} \in \mathcal{C}_{\alpha}(X_{n+1})\right)$. Moreover, if the conformity scores $\{f_{\bar{\tau}_{k}}^{-1}(\bar{\lambda}_{k})\}_{k=1}^{n+1}$ are almost surely distinct, then it also holds that ...
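
The lower bound $1 - \alpha \le \mathbb{P}(Y_{n+1} \in \mathcal{C}_{\alpha}(X_{n+1}))$ is a marginal guarantee of the kind that standard split conformal prediction also provides. The sketch below checks it empirically on toy data. It is not the paper's CP$^2$ method: the absolute-residual score, the oracle mean `np.sin`, the toy data model, and the function names `conformal_quantile` and `simulate_coverage` are all assumptions made for illustration.

```python
import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((n+1)(1-alpha))-th smallest calibration score.

    Returns +inf when the calibration set is too small for this alpha.
    """
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(scores)[k - 1]


def simulate_coverage(n_cal=200, n_test=2000, alpha=0.1, n_trials=200, seed=0):
    """Average test coverage of split conformal sets over repeated draws."""
    rng = np.random.default_rng(seed)
    covered = []
    for _ in range(n_trials):
        # Toy heteroscedastic data (assumption): noise scale grows with |x|.
        x = rng.uniform(-2.0, 2.0, size=n_cal + n_test)
        y = np.sin(x) + (0.1 + np.abs(x)) * rng.standard_normal(x.size)
        # Conformity score: absolute residual around an oracle mean (assumption).
        scores = np.abs(y - np.sin(x))
        q = conformal_quantile(scores[:n_cal], alpha)
        # A test point is covered when its score does not exceed the threshold.
        covered.append(np.mean(scores[n_cal:] <= q))
    return float(np.mean(covered))


coverage = simulate_coverage()
print(f"empirical marginal coverage: {coverage:.3f}")
```

With these settings the printed value should sit close to the nominal level $1-\alpha = 0.9$. Because the threshold `q` is a single constant, the interval width does not change with `x`, which is exactly the situation where marginal coverage can hide poor conditional coverage under heteroscedastic noise.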

Figures (9)

  • Figure 1: Prediction sets obtained via the standard CP and CP$^2$ methods.
  • Figure 2: Mixture Density Network: the multimodal case.
  • Figure 3: Worst-slab coverage on real data. Results are averaged over 50 random splits of each dataset. Calibration and test set sizes are set to 2000, with 50 conditional samples for PCP, CP$^2$ and $\Pi_{Y\mid X}$. The worst-slab coverage parameter is $(1-\delta)=0.1$. The nominal coverage level is $(1-\alpha)=0.9$ and is shown in dashed black. Methods with conditional coverage below $0.75$ are shown as cross-hatched on the horizontal axis.
  • Figure 4: Sizes of the prediction sets on real data. We divide the size of the set by the standard deviation of response to present the results on the same scale.
  • Figure 5: Conditional coverage for different clusters, fb1 dataset. We used the HDBSCAN algorithm with a minimum cluster size of 100, a min_samples hyper-parameter of 20, and the $l_2$ metric. Cluster label -1 corresponds to the outliers. The sample size for sampling-based methods was set to 50. The nominal coverage level is $(1-\alpha)=0.9$ and is shown in dashed black.
  • ...and 4 more figures

Theorems & Definitions (25)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Lemma A.1
  • proof
  • Theorem A.2
  • proof
  • Lemma A.3
  • proof
  • Theorem A.4
  • ...and 15 more