Non-exchangeable Conformal Prediction with Optimal Transport: Tackling Distribution Shifts with Unlabeled Data

Alvaro H. C. Correia, Christos Louizos

TL;DR

This work tackles the breakdown of conformal prediction guarantees under distribution shift by introducing the total coverage gap as a unifying metric and deriving optimal transport (OT) bounds that relate miscoverage to differences between score distributions. It develops two main bounds, a density-CDF bound and a 1-Wasserstein bound, plus a novel unlabeled-data bound that relies on auxiliary score distributions derived from unlabeled test data, which enables label-free calibration. The authors then learn calibration weights by optimizing these bounds, using labeled calibration data and unlabeled test data, and extend the framework to regression. Empirically, the method yields substantial reductions in coverage gap on synthetic regression and on real-world datasets such as ImageNet-C and iWildCam, often outperforming likelihood-ratio, entropy-scaled, and conditional-conformal baselines while preserving the interpretation of prediction sets. Overall, the approach provides a general, practical framework for robust uncertainty quantification under shift that requires only unlabeled data from the test distribution.
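To make the quantity behind the 1-Wasserstein bound concrete, the following is a minimal sketch of how the 1-Wasserstein distance between two empirical score distributions can be computed in one dimension. It illustrates the distance only, not the paper's bound or its weight-learning procedure. The function name `wasserstein_1` and the synthetic Beta-distributed scores are assumptions made for this example.

```python
import numpy as np

def wasserstein_1(a, b):
    """1-Wasserstein distance between two equal-size 1D empirical samples.

    In one dimension the optimal coupling matches points in sorted order,
    so the distance is the mean absolute difference of the order statistics.
    """
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    if a.shape != b.shape:
        raise ValueError("this sketch assumes samples of equal size")
    return float(np.mean(np.abs(a - b)))

# Hypothetical nonconformity scores: calibration vs. shifted test data.
rng = np.random.default_rng(0)
cal_scores = rng.beta(2.0, 5.0, size=2000)
test_scores = rng.beta(3.0, 4.0, size=2000)  # assumed shift toward larger scores

w1_same = wasserstein_1(cal_scores, cal_scores)
w1_shift = wasserstein_1(cal_scores, test_scores)
w1_translate = wasserstein_1(cal_scores, cal_scores + 0.5)
print(f"W1(cal, cal)       = {w1_same:.3f}")
print(f"W1(cal, test)      = {w1_shift:.3f}")
print(f"W1(cal, cal + 0.5) = {w1_translate:.3f}")
```

A pure translation of the scores by a constant gives a distance equal to that constant, which makes the sorted-matching formula easy to sanity-check.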

Abstract

Conformal prediction is a distribution-free uncertainty quantification method that has gained popularity in the machine learning community due to its finite-sample guarantees and ease of use. Its most common variant, dubbed split conformal prediction, is also computationally efficient as it boils down to collecting statistics of the model predictions on some calibration data not yet seen by the model. Nonetheless, these guarantees only hold if the calibration and test data are exchangeable, a condition that is difficult to verify and often violated in practice due to so-called distribution shifts. The literature is rife with methods to mitigate the loss in coverage in this non-exchangeable setting, but these methods require some prior information on the type of distribution shift to be expected at test time. In this work, we study this problem via a new perspective, through the lens of optimal transport, and show that it is possible to estimate the loss in coverage and mitigate arbitrary distribution shifts, offering a principled and broadly applicable solution.
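The split conformal procedure described in the abstract can be sketched as follows, under the exchangeable setting where its guarantee holds. This is a generic illustration rather than the authors' code: the score $1 - \hat{p}(y \mid x)$, the synthetic classifier, and the helper names `conformal_threshold` and `prediction_sets` are all assumptions made for this example.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points: every label must be included
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, threshold):
    """Boolean mask of shape (n, n_classes); True where a label is in the set."""
    return (1.0 - probs) <= threshold

def simulate(rng, n, n_classes):
    """Synthetic model outputs with labels drawn from the predicted probabilities."""
    logits = rng.normal(size=(n, n_classes))
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    # Gumbel-max trick: argmax(logits + Gumbel noise) samples from softmax(logits).
    labels = np.argmax(logits + rng.gumbel(size=logits.shape), axis=1)
    return probs, labels

rng = np.random.default_rng(0)
alpha = 0.1
cal_probs, cal_labels = simulate(rng, 1000, 5)
test_probs, test_labels = simulate(rng, 5000, 5)

# Nonconformity score: one minus the probability assigned to the true label.
cal_scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
threshold = conformal_threshold(cal_scores, alpha)

sets = prediction_sets(test_probs, threshold)
coverage = sets[np.arange(len(test_labels)), test_labels].mean()
print(f"threshold = {threshold:.3f}, empirical coverage = {coverage:.3f}")
```

Because calibration and test data here come from the same distribution, the empirical coverage should land close to the 90% target. Drawing the test data from a different distribution is what breaks this guarantee and motivates the paper.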

Paper Structure

This paper contains 51 sections, 7 theorems, 61 equations, 4 figures, 6 tables, 2 algorithms.

Key Result

Theorem 3.2

Let $P$ and $Q$ be probability measures on ${\mathcal{X}} \times {\mathcal{Y}}$ with ${s}_\sharp{P}$ and ${s}_\sharp{Q}$ their respective pushforward measures by a score function $s: {\mathcal{X}} \times {\mathcal{Y}} \rightarrow \mathbb{R}$. Assume ${s}_\sharp{P}$ is absolutely continuous with resp

Figures (4)

  • Figure 1: Empirical CDFs of nonconformity scores in ImageNet-C Gaussian noise under the calibration ${s}_\sharp{\hat{P}}$, test ${s}_\sharp{\hat{Q}}$, and auxiliary distributions. We can visually verify ${s}_\sharp{\hat{Q}^{\max}} \succcurlyeq {s}_\sharp{\hat{Q}} \succcurlyeq {s}_\sharp{\hat{Q}^{\min}}$ and ${s}_\sharp{\hat{Q}^{U}} \succcurlyeq {s}_\sharp{\hat{Q}} \succcurlyeq {s}_\sharp{\hat{Q}^{f}}$.
  • Figure 2: Total coverage gap in ImageNet-C Fog with weights learned via likelihood ratio estimation (orange), optimal transport with $({s}_\sharp{\hat{Q}^{\min}}, {s}_\sharp{\hat{Q}^{\max}})$ (green), and optimal transport with $({s}_\sharp{\hat{Q}^{f}}, {s}_\sharp{\hat{Q}^{U}})$ (gray).
  • Figure 3: Distribution of coverage and prediction set sizes for the synthetic regression task across 500 simulations and target coverage rate of $90\%$ (blue vertical line). For ease of visualization, we plot the density estimated with a KDE fit to the 500 observations.
  • Figure 4: Distribution of coverage for the synthetic regression task across 500 simulations and target coverage rate of $90\%$ (blue vertical line). Results with the 1-Wasserstein distance formulation on the left and with the weighted CDF formulation on the right. The baselines remain the same in both plots. For ease of visualization, we plot the density estimated with a KDE fit to the 500 observations.

Theorems & Definitions (21)

  • Definition 3.1: Total coverage gap
  • Theorem 3.2
  • Theorem 3.3
  • proof
  • Remark A.1: On the expectation under a weighted measure
  • Remark A.2: On practical weighting
  • Proposition A.3: Empirical weighted bound
  • proof
  • Lemma A.4: de20211
  • proof: Proof of (\ref{eq:unlabeled-density-weighted})
  • ...and 11 more