Conformal Structured Prediction

Botong Zhang, Shuo Li, Osbert Bastani

TL;DR

This work extends conformal prediction to structured output spaces by introducing a general framework that outputs structured prediction sets while preserving coverage guarantees. The method parameterizes a family of conformal predictors $h_{\tau}$ via a threshold on a label-scoring distribution and accommodates both marginal and PAC guarantees, using learn-then-test procedures. For DAG-structured label spaces, the approach optimizes the prediction set through an integer programming formulation, enabling compact representations such as coarse-label wrappers or interval unions. Empirically, the framework achieves the desired coverage across five diverse domains while producing significantly smaller, more interpretable prediction sets than baselines, highlighting its practical potential for reliable uncertainty quantification in complex prediction tasks.

Abstract

Conformal prediction has recently emerged as a promising strategy for quantifying the uncertainty of a predictive model; these algorithms modify the model to output sets of labels that are guaranteed to contain the true label with high probability. However, existing conformal prediction algorithms have largely targeted classification and regression settings, where the structure of the prediction set has a simple form as a level set of the scoring function. For complex structured outputs such as text generation, by contrast, these prediction sets might include a large number of labels and therefore be hard for users to interpret. In this paper, we propose a general framework for conformal prediction in the structured prediction setting that modifies existing conformal prediction algorithms to output structured prediction sets that implicitly represent sets of labels. In addition, we demonstrate how our approach can be applied in domains where the prediction sets can be represented as a set of nodes in a directed acyclic graph; for instance, for hierarchical labels such as image classification, a prediction set might be a small subset of coarse labels implicitly representing the prediction set of all their finer-grained descendants. We demonstrate how our algorithm can be used to construct prediction sets that satisfy a desired coverage guarantee in several domains.
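
To make the idea of a structured prediction set concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy label hierarchy (`CHILDREN`), per-leaf probabilities from some model, a mass threshold `tau`, and a cap `max_nodes` on how many nodes the set may contain; all of these names are hypothetical. It uses brute-force enumeration in place of the integer program mentioned in the summary, and it assumes the objective is to cover probability mass at least `tau` with as few fine-grained labels as possible.

```python
from itertools import combinations

# Hypothetical toy hierarchy: node -> list of children. Leaves have no entry.
CHILDREN = {
    "entity": ["gymnastic apparatus", "animal"],
    "gymnastic apparatus": ["balance beam", "parallel bars", "horizontal bar"],
    "animal": ["dog", "cat"],
}


def leaves_under(node):
    """Return the set of leaf labels that a node implicitly represents."""
    kids = CHILDREN.get(node, [])
    if not kids:
        return {node}
    out = set()
    for kid in kids:
        out |= leaves_under(kid)
    return out


def all_nodes():
    """Return every node of the hierarchy (internal nodes and leaves), sorted."""
    nodes = set(CHILDREN)
    for kids in CHILDREN.values():
        nodes.update(kids)
    return sorted(nodes)


def best_structured_set(leaf_probs, tau, max_nodes):
    """Brute-force search (a stand-in for an integer program).

    Among all sets of at most `max_nodes` nodes whose descendant leaves carry
    probability mass >= tau, pick one covering the fewest leaves, breaking
    ties in favor of fewer nodes. Returns (nodes, covered_leaves) or None.
    """
    best = None
    for k in range(1, max_nodes + 1):
        for cand in combinations(all_nodes(), k):
            covered = set().union(*(leaves_under(n) for n in cand))
            mass = sum(leaf_probs.get(leaf, 0.0) for leaf in covered)
            if mass >= tau:
                key = (len(covered), k)
                if best is None or key < best[0]:
                    best = (key, set(cand), covered)
    return None if best is None else (best[1], best[2])


# Hypothetical model output for one image.
leaf_probs = {
    "balance beam": 0.40,
    "parallel bars": 0.30,
    "horizontal bar": 0.25,
    "dog": 0.03,
    "cat": 0.02,
}
nodes, covered = best_structured_set(leaf_probs, tau=0.9, max_nodes=1)
print(nodes, covered)
```

With these made-up probabilities, a single coarse node suffices to reach the requested mass, which mirrors the "gymnastic apparatus" example in Figure 1. Enumeration is exponential in `max_nodes`, so it only suits very small hierarchies.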

Paper Structure

This paper contains 13 sections, 2 theorems, 20 equations, 29 figures, 2 tables.

Key Result

Theorem 3.1

The estimator $\hat{\tau}(Z,\epsilon)=\tau_{\hat{i}(Z,\phi_{\text{marginal}}^\epsilon)}$ satisfies

Figures (29)

  • Figure 1: (a) Structured prediction sets improve interpretability while maintaining the coverage guarantee. In this example, the standard conformal prediction set (top) is guaranteed to include the true label "balance beam" with high probability, but may be more difficult to interpret for someone without gymnastics knowledge. In contrast, the structured prediction set (bottom) can be more interpretable since it contains only a single coarse-grained label "gymnastic apparatus", while guaranteeing that the true label is a descendant of this label in the label hierarchy with high probability. The error level for both standard conformal prediction and conformal structured prediction is $0.05$ (i.e., the desired coverage level is $0.95$). See Section \ref{sec:experimental_results} and Appendix \ref{sec:additional_qualitative_examples} for more examples. (b) An overview of our framework. To estimate the conformal predictor parameter $\tau$, our algorithm uses a statistical test $\phi$ designed to establish either marginal or PAC coverage guarantees based on the given calibration set. It iterates until an invalid $\tau_i$ is identified, and returns the last valid threshold $\tau_{i-1}$. This computation assumes access to a subroutine that computes the optimal prediction set $\tilde{y}$. In general, any optimizer can be used in conjunction with our framework; for the case where the prediction sets are derived from a DAG structure (including hierarchical labels), we show how the optimization problem can be encoded as an integer program.
  • Figure 2: Prediction set coverage rates for the question answering task, for (a) marginal guarantee, (b) PAC guarantee with fixed $\delta$ and varying $m$, and (c) PAC guarantee with fixed $m$ and varying $\delta$.
  • Figure 3: Prediction set sizes for the question answering task, with the baseline represented by dashed lines, for (a) marginal guarantee, (b) PAC guarantee with fixed $\delta$ and varying $m$, and (c) PAC guarantee with fixed $m$ and varying $\delta$.
  • Figure 4: The two-shot prompt used for our question answering task; here, {question} and {context} are replaced with the corresponding data from each example in our dataset, and {year} is replaced with each year between 1970 and 2020.
  • Figure 5: Illustration of the DAG structure for our question answering task.
  • ...and 24 more figures
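
The threshold search described in Figure 1(b) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's procedure: the exact tests $\phi$ are not given in this excerpt, so the sketch substitutes a standard binomial tail test for a PAC-style guarantee with error level `eps` and confidence `delta`. It also assumes the candidate thresholds are ordered from most to least conservative and that the number of calibration examples miscovered at each threshold has already been computed. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


def pac_test(num_errors, n, eps, delta):
    """Assumed test: accept a threshold if seeing this few errors would be
    unlikely (probability <= delta) were the true error rate eps or worse."""
    return binom_cdf(num_errors, n, eps) <= delta


def select_threshold(taus, error_counts, n, eps, delta):
    """Walk the thresholds in order, stop at the first one that fails the
    test, and return the last one that passed (None if none passed).

    taus:         candidate thresholds, most conservative first.
    error_counts: error_counts[i] is the number of the n calibration examples
                  whose prediction set at taus[i] misses the true label.
    """
    chosen = None
    for tau, k in zip(taus, error_counts):
        if not pac_test(k, n, eps, delta):
            break
        chosen = tau
    return chosen


# Made-up calibration summary: 100 examples, four candidate thresholds.
taus = [1.0, 0.95, 0.9, 0.85]
error_counts = [0, 1, 3, 9]
tau_hat = select_threshold(taus, error_counts, n=100, eps=0.1, delta=0.05)
print(tau_hat)
```

Stopping at the first failure, rather than testing every candidate, is what allows the tests to be run in sequence without a multiple-comparison correction in fixed-sequence procedures of this kind.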

Theorems & Definitions (4)

  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof