Generative Conformal Prediction with Vectorized Non-Conformity Scores
Minxing Zheng, Shixiang Zhu
TL;DR
This work addresses the conservatism of conformal prediction in multi-dimensional settings by introducing Generative Conformal Prediction with Vectorized Non-Conformity Scores (GCP-VCR). It uses a generative model to sample multiple conditional predictions, forms a vector of non-conformity scores across ranked samples, and optimizes rank-specific quantiles to create density-adaptive uncertainty balls. The approach comes with theoretical validity guarantees and demonstrates superior efficiency over state-of-the-art baselines on synthetic, MNIST-like, and real datasets, especially for multimodal and complex distributions. GCP-VCR thus delivers more flexible, data-adaptive uncertainty sets while preserving guaranteed coverage, enabling more informative decision-making in high-stakes or complex predictive tasks.
Abstract
Conformal prediction (CP) provides model-agnostic uncertainty quantification with guaranteed coverage, but conventional methods often produce overly conservative uncertainty sets, especially in multi-dimensional settings. This limitation arises from simplistic non-conformity scores that rely solely on prediction error, failing to capture the prediction error distribution's complexity. To address this, we propose a generative conformal prediction framework with vectorized non-conformity scores, leveraging a generative model to sample multiple predictions from the fitted data distribution. By computing non-conformity scores across these samples and estimating empirical quantiles at different density levels, we construct adaptive uncertainty sets using density-ranked uncertainty balls. This approach enables more precise uncertainty allocation -- yielding larger prediction sets in high-confidence regions and smaller or excluded sets in low-confidence regions -- enhancing both flexibility and efficiency. We establish theoretical guarantees for statistical validity and demonstrate through extensive numerical experiments that our method outperforms state-of-the-art techniques on synthetic and real-world datasets.
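To make the construction concrete, the following is a minimal sketch of the pipeline described above. It is not the authors' implementation. The generator `sample_predictions`, the density proxy in `rank_by_density`, and all other names are hypothetical choices made for illustration. In particular, the sketch calibrates each rank separately at level 1 - alpha, which keeps the union of balls valid but likely makes it wider than necessary; the optimization of rank-specific quantiles that the paper proposes is not reproduced here.

```python
import numpy as np

def sample_predictions(x, k, rng):
    # Hypothetical stand-in for a fitted conditional generative model:
    # a bimodal 2-D output centred at +(x, x) or -(x, x).
    signs = rng.choice([-1.0, 1.0], size=(k, 1))
    return signs * x + 0.3 * rng.standard_normal((k, 2))

def rank_by_density(samples):
    # Density proxy (an assumption of this sketch): mean distance to the
    # other samples. Samples in denser regions are ranked first.
    pairwise = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=-1)
    return samples[np.argsort(pairwise.mean(axis=1))]

def score_vector(y, ranked_samples):
    # Vectorized non-conformity score: distance from y to each ranked sample.
    return np.linalg.norm(ranked_samples - y, axis=1)

def conformal_radii(cal_scores, alpha):
    # Rank-wise conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest
    # calibration score in each column (infinite if that index exceeds n).
    n, k = cal_scores.shape
    idx = int(np.ceil((n + 1) * (1 - alpha)))
    if idx > n:
        return np.full(k, np.inf)
    return np.sort(cal_scores, axis=0)[idx - 1]

def in_prediction_set(y, ranked_samples, radii):
    # The set is a union of balls: ball j is centred at the j-th ranked
    # sample and has radius radii[j].
    return bool(np.any(score_vector(y, ranked_samples) <= radii))

rng = np.random.default_rng(0)
K, n_cal, n_test, alpha = 20, 500, 500, 0.1

def draw_xy(n):
    # Toy data whose true conditional law matches the stand-in generator.
    xs = rng.uniform(1.0, 3.0, size=n)
    ys = np.vstack([sample_predictions(x, 1, rng)[0] for x in xs])
    return xs, ys

xs_cal, ys_cal = draw_xy(n_cal)
cal_scores = np.vstack([
    score_vector(y, rank_by_density(sample_predictions(x, K, rng)))
    for x, y in zip(xs_cal, ys_cal)
])
radii = conformal_radii(cal_scores, alpha)

xs_test, ys_test = draw_xy(n_test)
coverage = np.mean([
    in_prediction_set(y, rank_by_density(sample_predictions(x, K, rng)), radii)
    for x, y in zip(xs_test, ys_test)
])
print(f"empirical coverage: {coverage:.3f}")
```

Because the ball for any single rank already covers the response with probability at least 1 - alpha under exchangeability, the union over all ranks in this sketch should over-cover. Shrinking the radii jointly across ranks while retaining the coverage guarantee is the efficiency gain the abstract refers to.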
