Multivariate Conformal Prediction via Conformalized Gaussian Scoring
Sacha Braun, Eugène Berta, Michael I. Jordan, Francis Bach
TL;DR
This work tackles the challenge of achieving conditional coverage in multivariate conformal prediction by introducing Gaussian-conformal prediction, which models ${\mathbb P}_{Y|X}$ as a feature-dependent Gaussian ${\mathcal N}(f(X),\Sigma(X))$. It derives a closed-form nonconformity score based on the Mahalanobis distance, which yields ellipsoidal conformal sets that adapt to local uncertainty and heteroskedasticity, and it extends naturally to missing outputs, partially revealed outputs, and transformations of the output space. The method can be applied post hoc on top of an existing predictor by learning a conditional covariance, provides finite-sample coverage guarantees (marginal, with extensions to the transformed and partially observed cases), and shows improved empirical conditional coverage on synthetic and real multivariate datasets. Because the score is available in closed form, the resulting conformal sets are data-dependent and cheap to compute, which makes the framework suitable for complex, high-dimensional prediction tasks with incomplete or evolving output information.
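The basic recipe can be sketched in a few lines of NumPy. This is a minimal split-conformal sketch under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for a fitted mean $f$ and a learned covariance $\Sigma$ (here it is simply the true data-generating model), and the score is the squared Mahalanobis distance $(y - f(x))^\top \Sigma(x)^{-1} (y - f(x))$, whose conformal sets coincide with those of the unsquared distance because squaring is monotone on nonnegative values. The threshold is the usual $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score.

```python
import math
import numpy as np

def mahalanobis_scores(Y, means, covs):
    """Squared Mahalanobis distance of each row Y[i] under N(means[i], covs[i])."""
    residuals = Y - means                                          # shape (n, k)
    solved = np.linalg.solve(covs, residuals[..., None])[..., 0]   # Sigma_i^{-1} r_i
    return np.einsum("ij,ij->i", residuals, solved)

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n+1)(1-alpha))-th smallest calibration score."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf  # too few calibration points: the set is the whole output space
    return float(np.sort(cal_scores)[rank - 1])

def in_ellipsoid(y, mean, cov, threshold):
    """Membership test for {y : (y - mean)^T cov^{-1} (y - mean) <= threshold}."""
    r = y - mean
    return float(r @ np.linalg.solve(cov, r)) <= threshold

# Hypothetical stand-in for a fitted predictor f(x) and a learned covariance Sigma(x);
# here it is simply the true heteroskedastic model that generates the toy data.
BASE_COV = np.array([[1.0, 0.5], [0.5, 1.0]])

def toy_model(x):
    means = np.stack([x, -x], axis=1)
    covs = (0.2 + x[:, None, None] ** 2) * BASE_COV   # noise grows with |x|
    return means, covs

rng = np.random.default_rng(0)

def sample(n):
    x = rng.uniform(-2.0, 2.0, size=n)
    means, covs = toy_model(x)
    chol = np.linalg.cholesky(covs)                    # batched Cholesky factors
    y = means + np.einsum("nij,nj->ni", chol, rng.standard_normal((n, 2)))
    return x, y

alpha = 0.1
x_cal, y_cal = sample(2000)
tau = conformal_threshold(mahalanobis_scores(y_cal, *toy_model(x_cal)), alpha)

x_test, y_test = sample(5000)
coverage = float(np.mean(mahalanobis_scores(y_test, *toy_model(x_test)) <= tau))
```

Each prediction set $\{y : (y - f(x))^\top \Sigma(x)^{-1} (y - f(x)) \le \hat\tau\}$ is an ellipsoid whose shape and size follow $\Sigma(x)$, which is where the adaptivity to heteroskedasticity comes from. The marginal coverage guarantee relies only on exchangeability of calibration and test points, not on the Gaussian model being correct; in this toy run the empirical coverage should be close to the nominal 0.9.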
Abstract
While exact conditional coverage in conformal prediction is unattainable without strong, untestable regularity assumptions, the promise of conformal prediction hinges on finding approximations to conditional guarantees that are realizable in practice. A promising way to make conformal sets depend on the features, and in particular to capture heteroskedasticity, is to estimate the conditional density $\mathbb{P}_{Y|X}$ and conformalize its level sets. Previous work in this vein has focused on nonconformity scores based on the empirical cumulative distribution function (CDF). Such scores are, however, computationally costly, typically requiring expensive sampling methods. To avoid the need for sampling, we observe that the CDF-based score reduces to a Mahalanobis distance when the estimated conditional density is Gaussian, yielding a closed-form expression that can be directly conformalized. Moreover, a Gaussian-based score opens the door to several extensions of the basic conformal method; in particular, we show how to construct conformal sets with missing output values, refine conformal sets as partial information about $Y$ becomes available, and construct conformal sets on transformations of the output space. Finally, empirical results indicate that our approach produces conformal sets that more closely approximate conditional coverage in multivariate settings than alternative methods.
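The reduction from a CDF-based score to a Mahalanobis distance can be checked numerically. The sketch below assumes the CDF-based score of a candidate $y$ is the probability that a draw from the fitted $\mathcal{N}(f(x), \Sigma(x))$ has strictly higher density than $y$ (an HPD-style score; the exact score used in prior work may be defined differently). Because the Gaussian density decreases with the squared Mahalanobis distance $d^2$, and $d^2$ of a Gaussian draw in dimension $k$ follows a $\chi^2_k$ distribution, that probability equals $F_{\chi^2_k}(d^2(x, y))$. This is an increasing function of $d^2$, so it yields the same conformal sets as $d^2$ itself. The function names are illustrative, and the chi-square CDF helper covers only even $k$ to stay dependency-free (for odd $k$, `scipy.stats.chi2.cdf` provides the general case).

```python
import math
import numpy as np

def mahalanobis_sq(y, mean, cov):
    """Squared Mahalanobis distance (y - mean)^T cov^{-1} (y - mean)."""
    r = y - mean
    return float(r @ np.linalg.solve(cov, r))

def gaussian_logpdf(points, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of `points`."""
    r = np.atleast_2d(points) - mean
    d2 = np.einsum("ij,ij->i", r, np.linalg.solve(cov, r.T).T)
    _, logdet = np.linalg.slogdet(2 * np.pi * cov)
    return -0.5 * (d2 + logdet)

def chi2_cdf_even_dof(x, k):
    """Closed-form chi-square CDF, valid only for an even number k of degrees of freedom."""
    if k % 2 != 0:
        raise ValueError("this helper only handles even k")
    half, term, partial_sum = x / 2.0, 1.0, 0.0
    for j in range(k // 2):
        partial_sum += term          # adds (x/2)^j / j!
        term *= half / (j + 1)
    return 1.0 - math.exp(-half) * partial_sum

def cdf_score_closed_form(y, mean, cov):
    """Mass of outcomes denser than y, computed from the Mahalanobis distance alone."""
    return chi2_cdf_even_dof(mahalanobis_sq(y, mean, cov), len(mean))

def cdf_score_monte_carlo(y, mean, cov, n_samples=200_000, seed=0):
    """Same quantity estimated by sampling, as a generic CDF-based score would require."""
    rng = np.random.default_rng(seed)
    samples = rng.multivariate_normal(mean, cov, size=n_samples)
    return float(np.mean(gaussian_logpdf(samples, mean, cov) > gaussian_logpdf(y, mean, cov)[0]))

mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 0.6], [0.6, 1.0]])
y = np.array([2.0, -1.0])
closed = cdf_score_closed_form(y, mean, cov)
sampled = cdf_score_monte_carlo(y, mean, cov)
```

With these values $d^2 \approx 1.10$ and both scores come out near 0.42; the sampled estimate only approximates what the closed form gives exactly, and it needs 200,000 draws to do so.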
