Conformal Prediction: A Data Perspective

Xiaofan Zhou, Baiting Chen, Yu Gui, Lu Cheng

TL;DR

Conformal prediction provides distribution-free uncertainty quantification by producing calibrated prediction sets with finite-sample validity under exchangeability. This survey reframes CP from a data perspective, cataloging static and dynamic data modalities, and reviews foundational CP variants (full, split, weighted) plus conformal risk control, alongside broad evaluation metrics. It surveys CP applications across structured, unstructured, and spatio-temporal data, detailing data-specific nonconformity scores, efficiency considerations, and graph/text/image adaptations. The paper highlights open challenges in non-exchangeable settings, large-scale and streaming data, multi-modal integration, and responsible-AI implications, outlining future research directions for scalable, robust, and adaptable CP methods.

Abstract

Conformal prediction (CP), a distribution-free uncertainty quantification (UQ) framework, reliably provides valid predictive inference for black-box models. CP constructs prediction sets that contain the true output with a specified probability. However, the diverse modalities of modern data science, along with increasing data and model complexity, challenge traditional CP methods. These developments have spurred novel approaches to address evolving scenarios. This survey reviews the foundational concepts of CP and recent advancements from a data-centric perspective, including applications to structured, unstructured, and dynamic data. We also discuss the challenges and opportunities CP faces in large-scale data and models.

Paper Structure (50 sections, 41 equations, 5 figures, 2 tables)

This paper contains 50 sections, 41 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The proposed taxonomy for CP from a data perspective.
  • Figure 2: Example of Split CP for Text Infilling. In this task, the model predicts missing words in a sentence, and Split CP is used to create prediction sets that quantify the uncertainty of these predictions. The nonconformity score is $V(X_i,Y_i)=1-P(Y_i \mid X_i)$, where $P$ represents the predictive probability output by the trained base model.
  • Figure 3: Illustration of conformal prediction sets with bivariate response.
  • Figure 4: A taxonomy of CP methods for text data. Loss bound refers to a formal guarantee on the maximum expected loss of a prediction set generated by a conformal method.
  • Figure 5: Venn Diagram for CP Methods in Spatio-Temporal Data.
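
The Split CP procedure named in the Figure 2 caption can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it uses the caption's score $V(X_i,Y_i)=1-P(Y_i \mid X_i)$ together with the usual split-conformal threshold, the $\lceil (n+1)(1-\alpha) \rceil$-th smallest calibration score, which is assumed here because the caption does not state it. The function names `conformal_quantile` and `split_cp_sets` and the toy probabilities are hypothetical.

```python
import math


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Assumption: this is the standard split-conformal threshold; it is not
    spelled out in the figure caption.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: every label is kept.
        return float("inf")
    return sorted(scores)[rank - 1]


def split_cp_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
    """Build one prediction set (a list of label indices) per test point.

    cal_probs / test_probs: per-example lists of class probabilities from an
    already trained base model. cal_labels: true label index per calibration
    example.
    """
    # Nonconformity score from the caption: V(X_i, Y_i) = 1 - P(Y_i | X_i).
    scores = [1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels)]
    q_hat = conformal_quantile(scores, alpha)
    # Keep every candidate label whose score does not exceed the threshold.
    return [
        [label for label, p in enumerate(probs) if 1.0 - p <= q_hat]
        for probs in test_probs
    ]


if __name__ == "__main__":
    # Toy, made-up probabilities over three candidate words.
    cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
    cal_labels = [0, 1, 2, 0]
    test_probs = [[0.65, 0.25, 0.10], [0.45, 0.45, 0.10]]
    print(split_cp_sets(cal_probs, cal_labels, test_probs, alpha=0.25))
```

A confident test point yields a small set and an ambiguous one yields a larger set, which is how set size conveys uncertainty. With only four calibration points, a small `alpha` such as 0.1 pushes the rank past `n`, so the sketch falls back to returning every label.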