Conformalized Credal Set Predictors

Alireza Javanmardi, David Stutz, Eyke Hüllermeier

TL;DR

This work introduces conformal credal set predictors, which combine credal-set representations of uncertainty with conformal prediction to obtain validity guarantees for classification when training data are labeled by probability distributions (first-order supervision). It presents two learning paradigms, a first-order probabilistic predictor and a second-order (Dirichlet) predictor, both calibrated via nonconformity scores to produce credal sets that contain the true distribution with high probability, even in the presence of label noise. The approach is evaluated on ChaosNLI, which provides multiple human annotations per example, and on synthetic data; the results show valid coverage across miscoverage levels and illustrate how the choice of nonconformity function influences efficiency. The framework extends learning under ambiguity with uncertainty-aware predictions and applies to tasks where multiple interpretations or annotations per instance are available.

Abstract

Credal sets are sets of probability distributions that are considered as candidates for an imprecisely known ground-truth distribution. In machine learning, they have recently attracted attention as an appealing formalism for uncertainty representation, in particular due to their ability to represent both the aleatoric and epistemic uncertainty in a prediction. However, the design of methods for learning credal set predictors remains a challenging problem. In this paper, we make use of conformal prediction for this purpose. More specifically, we propose a method for predicting credal sets in the classification task, given training data labeled by probability distributions. Since our method inherits the coverage guarantees of conformal prediction, our conformal credal sets are guaranteed to be valid with high probability (without any assumptions on model or distribution). We demonstrate the applicability of our method to natural language inference, a highly ambiguous natural language task where it is common to obtain multiple annotations per example.
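
To make the calibration step described above concrete, the following is a minimal sketch of split conformal calibration for credal sets, not the authors' implementation. It assumes total variation distance as the nonconformity score (one possible choice among several), and it uses hypothetical synthetic data in which the "model prediction" is simply a perturbed copy of the ground-truth distribution. All function and variable names (`tv_distance`, `conformal_quantile`, `in_credal_set`, `sample`) are illustrative.

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between distributions along the last axis."""
    return 0.5 * np.abs(p - q).sum(axis=-1)

def conformal_quantile(scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of the calibration scores."""
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the order statistic to use
    if k > n:
        return np.inf  # too few calibration points: the credal set is the whole simplex
    return np.sort(scores)[k - 1]

def in_credal_set(pred, candidate, q_hat):
    """A candidate distribution belongs to the credal set if its score is at most q_hat."""
    return tv_distance(pred, candidate) <= q_hat

rng = np.random.default_rng(0)
K, n_calib, n_test, alpha = 3, 500, 2000, 0.2

def sample(n):
    # Hypothetical data: ground-truth distributions on the simplex, and
    # "predictions" formed by mixing each one with independent Dirichlet noise.
    truth = rng.dirichlet(np.ones(K), size=n)
    noise = rng.dirichlet(np.ones(K), size=n)
    pred = 0.8 * truth + 0.2 * noise  # a convex combination is still a distribution
    return pred, truth

# Calibration: score each (prediction, ground-truth distribution) pair.
pred_calib, truth_calib = sample(n_calib)
q_hat = conformal_quantile(tv_distance(pred_calib, truth_calib), alpha)

# Test: check how often the true distribution falls inside the credal set.
pred_test, truth_test = sample(n_test)
coverage = in_credal_set(pred_test, truth_test, q_hat).mean()
print(f"threshold q_hat = {q_hat:.3f}, empirical coverage = {coverage:.3f}")
```

Because calibration and test pairs come from the same sampling process, the empirical coverage should land close to the nominal level $1-\alpha = 0.8$. Under this choice of score, the resulting credal set is a total-variation ball around the predicted distribution, intersected with the simplex; a different nonconformity function would give a differently shaped region.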

Paper Structure

This paper contains 16 sections, 2 theorems, 22 equations, 8 figures, 1 table, 1 algorithm.

Key Result

Theorem 4.1

Let $\mathcal{P}$ denote the joint probability distribution on $(X, \Lambda) \in \mathcal{X} \times \Delta^K$. If data points in $\mathcal{D}_\text{calib}$ and $(\boldsymbol{x}_\text{new},\boldsymbol{\lambda}^{\boldsymbol{x}_\text{new}})$ are drawn i.i.d. (exchangeably) from $\mathcal{P}$, then the

Figures (8)

  • Figure 1: For the three-class classification setting, the space of probability distributions can be illustrated by a two-dimensional simplex: each point in the simplex corresponds to a probability distribution so that credal sets can be depicted as regions. The left case corresponds to the special case of a singleton (credal) set, i.e., a precise probability distribution, signifying aleatoric but no epistemic uncertainty. The case in the middle represents partial knowledge with a certain degree of (epistemic) uncertainty about the true distribution, and the right one corresponds to the case of complete ignorance, where nothing is known about the distribution.
  • Figure 2: An illustration of our proposed conformalized credal sets on two instances from the ChaosNLI dataset (Nie et al., 2020). Green regions indicate credal sets, while the true and the predicted distributions are marked with orange squares and black circles, respectively.
  • Figure 3: Various credal sets obtained for three instances from the ChaosNLI dataset (Nie et al., 2020). The ground-truth distributions are denoted by orange squares. Black circles indicate model predictions in the cases that employ a first-order learner (first four columns). For the last column, which uses a second-order learner, the predicted second-order distributions are represented by contour plots. The miscoverage rate is $\alpha=0.2$, and the efficiency of each credal set is written below it.
  • Figure 4: Coverage and efficiency results of different nonconformity functions applied to the ChaosNLI dataset (Nie et al., 2020). The horizontal dashed lines indicate the nominal coverage levels.
  • Figure 5: Coverage and quantile results for synthetic data, where the ground-truth distributions are approximated by observing $m$ samples from them. The horizontal dashed line indicates the nominal coverage level $1-\alpha = 0.9$.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Theorem 4.1
  • Theorem 4.2
  • proof