Conformal Prediction with Learned Features

Shayan Kiyani, George Pappas, Hamed Hassani

TL;DR

This work addresses the challenge of achieving conditional coverage in conformal prediction by learning uncertainty-driven partitions from calibration data (PLCP). The method optimizes a joint objective over region-wise quantiles and a partitioning function, allowing alternating gradient-descent training on standard models, and introduces MSCE as a measure of conditional miscoverage. Theoretical results provide finite- and infinite-data guarantees that quantify the tradeoffs between the number of regions, sample size, and model class complexity, and translate these into practical fallback coverage guarantees. Empirically, PLCP improves conditional coverage and maintains shorter intervals across diverse in-distribution (ID) and out-of-distribution (OOD) datasets, outperforming marginal methods and matching or surpassing baselines that rely on predefined structures. The approach offers a scalable, data-driven path to more trustworthy uncertainty quantification with potential impact in critical applications such as healthcare.

Abstract

In this paper, we focus on the problem of conformal prediction with conditional guarantees. Prior work has shown that it is impossible to construct nontrivial prediction sets with full conditional coverage guarantees. A wealth of research has considered relaxations of full conditional guarantees, relying on some predefined uncertainty structures. Departing from this line of thinking, we propose Partition Learning Conformal Prediction (PLCP), a framework to improve conditional validity of prediction sets through learning uncertainty-guided features from the calibration data. We implement PLCP efficiently with alternating gradient descent, utilizing off-the-shelf machine learning models. We further analyze PLCP theoretically and provide conditional guarantees for infinite and finite sample sizes. Finally, our experimental results over four real-world and synthetic datasets show the superior performance of PLCP compared to state-of-the-art methods in terms of coverage and length in both classification and regression scenarios.
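
The alternating scheme described in the abstract can be illustrated with a minimal sketch. It assumes the joint objective is a partition-weighted pinball loss over conformity scores, and it models the partition with a linear softmax map. Both choices, along with every name below (`fit_partition_sketch`, `region_thresholds`, the synthetic data), are illustrative assumptions and not the authors' implementation.

```python
import numpy as np

def pinball(q, s, alpha):
    """Pinball loss; its minimizer over q is a (1 - alpha)-quantile of s."""
    d = s - q
    return np.where(d >= 0, (1 - alpha) * d, -alpha * d)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit_partition_sketch(X, s, m=2, alpha=0.1, steps=300, lr=0.1, seed=0):
    """Alternate gradient steps on region quantiles q and partition weights W.

    Assumed objective: mean_j sum_i h_i(x_j) * pinball(q_i, s_j),
    with h(x) = softmax(W^T [x, 1]) as a hypothetical soft partition.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])            # append a bias column
    W = 0.01 * rng.standard_normal((d + 1, m))
    q = np.quantile(s, np.linspace(0.5, 0.95, m))   # distinct starting quantiles
    for _ in range(steps):
        H = softmax(Xb @ W)                         # (n, m) soft memberships
        # q-step: subgradient of the pinball loss with respect to each q_i
        G = np.where(s[:, None] > q[None, :], -(1 - alpha), alpha)
        q = q - lr * (H * G).mean(axis=0)
        # h-step: gradient of the objective through the softmax logits
        L = pinball(q[None, :], s[:, None], alpha)  # (n, m) per-region losses
        dZ = H * (L - (H * L).sum(axis=1, keepdims=True))
        W = W - lr * (Xb.T @ dZ) / n
    return W, q

def region_thresholds(X, W, q):
    """Assign each point to its most likely region and return that region's q."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return q[np.argmax(Xb @ W, axis=1)]

# Synthetic scores whose spread depends on the covariate (hypothetical data).
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 1))
s = np.abs(rng.standard_normal(500)) * np.where(X[:, 0] > 0, 3.0, 1.0)
W, q = fit_partition_sketch(X, s)
t = region_thresholds(X, W, q)
```

In this sketch, a prediction set for a test point would keep every candidate label whose score is at most the threshold returned by `region_thresholds` for that point.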

Paper Structure

This paper contains 17 sections, 11 theorems, 74 equations, 5 figures, 2 tables, and 1 algorithm.

Key Result

Proposition 3.5

Under assumption ass:reg, for every function $g(x) : \mathcal{X} \rightarrow \mathbb{R}$, we have

Figures (5)

  • Figure 1: (a) Distribution of the labels conditioned on the covariate ($x$). (b) Conditional and marginal distributions of the score. (c) Coverage of the prediction sets of Split Conformal conditioned on $x$. (d) Samples of the form $(x_i,s_i)$. From these samples, we aim to learn the partition/feature $h$ of the covariate space shown by the dashed red line.
  • Figure 2: Left-hand-side plots show coverage and right-hand-side plots show mean prediction set size. Row 1: US Census Data; Row 2: MNIST with Gaussian Blur.
  • Figure 3: Left-hand-side plots show coverage and right-hand-side plots show mean prediction set size. Row 1: Synthetic Regression Task; Row 2: RxRx1 WILDS Dataset.
  • Figure 4: Left-hand-side plots show coverage and right-hand-side plots show mean prediction set size. Row 1: US Census Data; Row 2: MNIST with Gaussian Blur.
  • Figure 5: Sample images from 5 groups with increasing levels of Gaussian blur applied from top to bottom.
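
The gap between marginal and conditional coverage that Figure 1(c) refers to can be reproduced in a few lines. The sketch below is not taken from the paper: the data-generating process, the constant predictor, and all variable names are assumptions chosen only to make the effect visible. It calibrates a single split conformal threshold and then measures coverage separately on a low-noise and a high-noise half of the covariate space.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    """Hypothetical data: label noise is three times larger when x > 0."""
    x = rng.uniform(-1, 1, n)
    sigma = np.where(x > 0, 3.0, 1.0)
    y = sigma * rng.standard_normal(n)
    return x, y

alpha = 0.1
x_cal, y_cal = sample(2000)
x_te, y_te = sample(20000)

# Conformity score |y - f(x)| with the trivial predictor f(x) = 0.
s_cal = np.abs(y_cal)
n = len(s_cal)

# Standard split conformal threshold: the ceil((n + 1)(1 - alpha)) / n quantile.
level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
qhat = np.quantile(s_cal, level, method="higher")

covered = np.abs(y_te) <= qhat
cov_marginal = covered.mean()           # close to 1 - alpha by construction
cov_low = covered[x_te <= 0].mean()     # low-noise half: over-covered
cov_high = covered[x_te > 0].mean()     # high-noise half: under-covered

print(f"marginal={cov_marginal:.3f} low-noise={cov_low:.3f} high-noise={cov_high:.3f}")
```

A single threshold is too wide for the low-noise half and too narrow for the high-noise half, which is the kind of structure a learned partition of the covariate space is meant to capture.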

Theorems & Definitions (26)

  • Remark 2.1
  • Definition 3.3
  • Proposition 3.5
  • Theorem 3.6
  • Corollary 3.7
  • Definition 3.8
  • Definition 3.9
  • Proposition 3.10
  • Theorem 3.11
  • Corollary 3.12
  • ...and 16 more