Multivariate Conformal Prediction using Optimal Transport

Michal Klein, Louis Bethune, Eugene Ndiaye, Marco Cuturi

TL;DR

This work extends conformal prediction to multivariate outputs by leveraging optimal transport to define Kantorovich ranks and center-outward quantiles. It introduces OT-CP, which maps vector-valued conformity scores through an optimal transport map to a univariate score, enabling standard, distribution-free CP with finite-sample guarantees. The paper develops two practical implementations: OT-based merging using the entropic map for tractable transport estimation and a coverage-preserving scheme under approximations, with formal calibration results. Empirical evaluation on a multivariate regression benchmark shows that OT-CP often yields smaller predictive regions than baseline multivariate CP methods, albeit with higher computational cost and sensitivity to hyperparameters like the entropic regularization and sphere discretization. Overall, OT-CP provides a principled, distribution-free framework for uncertainty quantification in high-dimensional prediction tasks, expanding the applicability of conformal methods to multivariate settings.

Abstract

Conformal prediction (CP) quantifies the uncertainty of machine learning models by constructing sets of plausible outputs. These sets are constructed by leveraging a so-called conformity score, a quantity computed using the input point of interest, a prediction model, and past observations. CP sets are then obtained by evaluating the conformity score of all possible outputs, and selecting them according to the rank of their scores. Due to this ranking step, most CP approaches rely on score functions that are univariate. The challenge in extending these scores to multivariate spaces lies in the fact that no canonical order for vectors exists. To address this, we leverage a natural extension of multivariate score ranking based on optimal transport (OT). Our method, OTCP, offers a principled framework for constructing conformal prediction sets in multidimensional settings, preserving distribution-free coverage guarantees with finite data samples. We demonstrate tangible gains in a benchmark dataset of multivariate regression problems and address computational and statistical trade-offs that arise when estimating conformity scores through OT maps.
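The ranking step described in the abstract can be illustrated with a minimal split-conformal sketch for a univariate score. This is a generic textbook construction, not code from the paper: the synthetic scores, the helper name `conformal_quantile`, and the sample sizes are all assumptions made for illustration.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    If that rank exceeds n, the threshold is +inf (the set is the whole space).
    """
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return np.sort(scores)[k - 1]

rng = np.random.default_rng(0)
n_cal, n_test, alpha = 500, 2000, 0.1

# Hypothetical univariate scores, e.g. |y - f(x)| for some fitted model f.
cal_scores = np.abs(rng.normal(size=n_cal))
test_scores = np.abs(rng.normal(size=n_test))

# A candidate output is kept in the prediction set when its score is <= q.
q = conformal_quantile(cal_scores, alpha)
coverage = np.mean(test_scores <= q)
print(f"threshold={q:.3f}, empirical coverage={coverage:.3f}")
```

With a score of the form |y - f(x)|, the resulting set is the interval [f(x) - q, f(x) + q]. The sort in `conformal_quantile` is exactly the step that has no canonical analogue for vector-valued scores.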

Paper Structure

This paper contains 24 sections, 7 theorems, 52 equations, 10 figures, 1 table.

Key Result

Lemma 2.1

Let $Z_1, \dots, Z_n, Z$ be a sequence of real-valued exchangeable random variables. Then it holds

Figures (10)

  • Figure 1: We report the mean and standard error of the region size across 10 different seeds. For M-CP, we use $300$ samples to compute the conditional mean, and for OT-CP, we use $\varepsilon = 0.1$ and $2^{15}=32768$ points in the uniform target measure. Overall, OT-CP displays smaller region size than other baselines (13 out of 17 datasets). The output dimension $d$ of each dataset is provided next to its name.
  • Figure 2: This plot details the impact of the two important hyperparameters one needs to set in OT-CP: the number of target points $m$ sampled from the uniform ball and the regularization level $\varepsilon$. As can be seen, a larger sample size $m$ improves region size (smaller is better) for roughly all datasets and regularization strengths. On the other hand, one must tune $\varepsilon$ to operate in a suitable regime: not too low, which results in the well-documented poor statistical performance of unregularized / linear program OT, nor too high, which would lead to a collapse of the entropic map to the sphere. Using OTT-JAX and its automatic normalizations, we see that $\varepsilon=0.1$ works best overall.
  • Figure 3: Computational time on small dimensional datasets. OT-CP incurs more compute time due to the OT map estimation. See Fig. \ref{fig:big-time} for a similar picture for higher dimensional datasets.
  • Figure 4: As in Fig. \ref{fig:small-region}, we report mean and standard errors for region size (log scale) across 10 different seeds for larger datasets. We keep the same parameters, in particular $\varepsilon = 0.1$ and $2^{15}=32768$ points in the uniform target measure. We expect the performance of OT-CP to decrease with dimensionality, but it does provide a convincing alternative to the other approaches.
  • Figure 5: Conformal sets recovered by mapping back the reduced sphere on the Manhattan map, in agreement with Equation \ref{eq:transport_oracle_coverage}, on a prediction for the taxi dataset. We use the inverse entropic map mentioned in Section \ref{subsec:entropic}, mapping back the gridded sphere of size $m=2^{15}$ for each level, and plotting its outer contour.
  • ...and 5 more figures
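The captions above refer to two hyperparameters, the number of target points $m$ and the regularization $\varepsilon$. The sketch below shows one way they could enter an entropic-map estimate, and how the norm of the mapped residual then serves as a univariate score. It is a NumPy illustration under stated assumptions, not the authors' OTT-JAX implementation. The assumptions are: uniform sampling in the unit ball, scaling $\varepsilon$ by the mean cost, fitting the map and calibrating the threshold on separate splits, synthetic Gaussian residuals, and every function name.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True)), axis=axis)

def sq_cost(x, y):
    """Pairwise cost 0.5 * ||x - y||^2 between rows of x and rows of y."""
    return 0.5 * ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)

def sample_ball(m, d, rng):
    """Draw m points uniformly from the unit ball in R^d (assumed target)."""
    g = rng.normal(size=(m, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.uniform(size=(m, 1)) ** (1.0 / d)
    return radii * directions

def fit_entropic_map(x, y, eps_rel=0.1, n_iter=200):
    """Run log-domain Sinkhorn with uniform weights; return the entropic map."""
    C = sq_cost(x, y)
    eps = eps_rel * C.mean()  # assumption: epsilon is relative to the mean cost
    n, m = C.shape
    f, g = np.zeros(n), np.zeros(m)
    for _ in range(n_iter):
        f = -eps * (logsumexp((g[None, :] - C) / eps, axis=1) - np.log(m))
        g = -eps * (logsumexp((f[:, None] - C) / eps, axis=0) - np.log(n))

    def entropic_map(z):
        # Softmax-weighted average of target points (barycentric projection).
        logits = (g[None, :] - sq_cost(z, y)) / eps
        w = np.exp(logits - logsumexp(logits, axis=1)[:, None])
        return w @ y

    return entropic_map

rng = np.random.default_rng(0)
d, alpha = 2, 0.1
A = np.array([[1.0, 0.8], [0.0, 0.6]])  # hypothetical residual correlation

def residuals(k):
    """Synthetic stand-in for vector-valued residuals y - f(x)."""
    return rng.normal(size=(k, d)) @ A.T

fit_res, cal_res, test_res = residuals(200), residuals(300), residuals(2000)
T = fit_entropic_map(fit_res, sample_ball(256, d, rng))

# Univariate score: norm of the mapped residual, then the usual CP quantile.
cal_scores = np.linalg.norm(T(cal_res), axis=1)
test_scores = np.linalg.norm(T(test_res), axis=1)
k = int(np.ceil((len(cal_scores) + 1) * (1 - alpha)))
threshold = np.sort(cal_scores)[k - 1]
coverage = np.mean(test_scores <= threshold)
print(f"threshold={threshold:.3f}, empirical coverage={coverage:.3f}")
```

Because the map is fitted on a split that is disjoint from the calibration residuals, the scores of calibration and test points remain exchangeable, so the coverage argument does not depend on how accurately the map is estimated. The choice of `eps_rel` and of the number of target points should then mainly affect the shape and size of the region rather than its validity.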

Theorems & Definitions (13)

  • Lemma 2.1
  • Proposition 2.2: Conformal Prediction Coverage
  • Definition 2.3
  • Definition 3.1
  • Proposition 3.2
  • Remark 3.3: Computational Issues
  • Remark 3.4
  • Lemma 3.5: Coverage of Empirical Quantile Region
  • Proposition 3.6
  • Proposition 2.1
  • ...and 3 more