
Quantification of Credal Uncertainty: A Distance-Based Approach

Xabier Gonzalez-Garcia, Siu Lun Chau, Julian Rodemann, Michele Caprio, Krikamol Muandet, Humberto Bustince, Sébastien Destercke, Eyke Hüllermeier, Yusuf Sale

Abstract

Credal sets, i.e., closed convex sets of probability measures, provide a natural framework to represent aleatoric and epistemic uncertainty in machine learning. Yet how to quantify these two types of uncertainty for a given credal set, particularly in multiclass classification, remains underexplored. In this paper, we propose a distance-based approach to quantify total, aleatoric, and epistemic uncertainty for credal sets. Concretely, we introduce a family of such measures within the framework of Integral Probability Metrics (IPMs). The resulting quantities admit clear semantic interpretations, satisfy natural theoretical desiderata, and remain computationally tractable for common choices of IPMs. We instantiate the framework with the total variation distance and obtain simple, efficient uncertainty measures for multiclass classification. In the binary case, this choice recovers established uncertainty measures, for which a principled multiclass generalization has so far been missing. Empirical results confirm practical usefulness, with favorable performance at low computational cost.
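As a concrete illustration of the total-variation instantiation described above, the sketch below evaluates the three quantities for a credal set represented by a finite collection of member distributions (e.g., an ensemble of predictive distributions). The function names are illustrative, not from the paper, and the paper defines the measures over the full convex credal set; here the extrema are simply taken over the finite representation. The sketch relies on the elementary identity $\mathrm{TV}(p, \delta_y) = 1 - p_y$, so the TV-distance of $p$ to its nearest simplex vertex (full certainty) is $1 - \max_y p_y$.

```python
import numpy as np

def tv(p, q):
    """Total variation distance between discrete distributions p and q."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def credal_uncertainties(members):
    """Total, aleatoric (set-valued), and epistemic uncertainty of a
    credal set represented by a finite array of member distributions,
    instantiated with the total variation distance.

    Uses TV(p, delta_y) = 1 - p[y]: the distance of p to full
    certainty (its nearest simplex vertex) is 1 - max(p).
    """
    Q = np.asarray(members, dtype=float)
    to_certainty = 1.0 - Q.max(axis=1)   # distance of each member to certainty
    tu = to_certainty.max()              # total: worst case over the set
    au = (to_certainty.min(), to_certainty.max())  # aleatoric: range over p in Q
    # epistemic: half the TV-diameter, i.e., largest pairwise distance
    eu = 0.5 * max(tv(p, q) for p in Q for q in Q)
    return tu, au, eu
```

For example, a set containing a Dirac-like prediction and the uniform distribution over three classes yields maximal disagreement between its members, hence nonzero epistemic uncertainty, while its aleatoric uncertainty is genuinely set-valued.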

Paper Structure

This paper contains 35 sections, 8 theorems, 53 equations, 4 figures, 3 tables, 9 algorithms.

Key Result

Proposition 4.2

Let $\mathcal{F}$ be a uniform class with respect to weak convergence such that $d_{\mathcal{F}}$ is a metric on $\Delta_{K-1}$. Then $\operatorname{TU}_\mathcal{F}(\mathcal{Q})$ and $\operatorname{EU}_\mathcal{F}(\mathcal{Q})$ satisfy A1--A4 and A7; $\underline{\operatorname{AU}}_{\mathcal{F}}(\mathcal{Q})$ …

Figures (4)

  • Figure 1: Geometric illustration of the proposed distance-based framework on the simplex $\Delta_{K-1}$ ($K=3$). Dirac measures $\{\delta_1,\delta_2,\delta_3\}$ represent full certainty. (a) Total uncertainty: distance of $\mathcal{Q}$ to full certainty, measured as the worst-case distance to the nearest vertex $\delta_y$. (b) Aleatoric uncertainty: the distance of a precise predictive distribution $p$ from full certainty, measured by its proximity to the nearest vertex. For a credal prediction $\mathcal{Q}$, aleatoric uncertainty is set-valued, giving a range over $p\in\mathcal{Q}$. (c) Epistemic uncertainty: the imprecision of the credal set $\mathcal{Q}$, quantified as half its maximal diameter, i.e., the largest distance between any two distributions in $\mathcal{Q}$. Distances ($\longleftrightarrow$) are defined via IPMs over $\mathcal{F}$.
  • Figure 2: Accuracy–rejection curves on cifar-10 (left) and cifar-100 (right) for credal set predictors: (a) Total uncertainty with $\langle \operatorname{TU}_{\mathcal{F}_{\mathrm{TV}}}\rangle$ and $\langle S^*\rangle$; (b) Aleatoric uncertainty with $\langle[\underline{\operatorname{AU}}_{\mathcal{F}_{\mathrm{TV}}},\, \overline{\operatorname{AU}}_{\mathcal{F}_{\mathrm{TV}}}]\rangle$, $\langle S_*\rangle$, and $\langle S^* - GH\rangle$; (c) Epistemic uncertainty with $\langle \operatorname{EU}_{\mathcal{F}_{\mathrm{TV}}}\rangle$, $\langle S^* - S_*\rangle$, and $\langle GH\rangle$. At rejection rate $r$, test points are sorted by uncertainty and the top $r\%$ most-uncertain are discarded; accuracy is computed on the remainder. Solid lines show the mean over seeds; shaded bands denote $\pm 1$ s.d. The AUC of each curve ($\uparrow$ higher is better) is printed next to the corresponding label. Our framework is consistently competitive across both datasets.
  • Figure 3: Selective prediction for credal sets constructed via relative likelihood, on fashion-mnist and svhn. Left panels: Accuracy--rejection (AR) curves for (a) total uncertainty, (b) aleatoric uncertainty, and (c) epistemic uncertainty. Area Under the Curve (AUC, $\uparrow$ higher is better) is reported in each legend. Right panels: Cumulative distribution of uncertainty scores across all test instances. The vertical dashed line marks the median uncertainty value.
  • Figure 4: Selective prediction for credal sets constructed via relative likelihood, including $h_{0.0}$, on fashion-mnist and svhn. Left panels: Accuracy--rejection (AR) curves for (a) total uncertainty, (b) aleatoric uncertainty, and (c) epistemic uncertainty. Area Under the Curve (AUC, $\uparrow$ higher is better) is reported in each legend. Right panels: Cumulative distribution of uncertainty scores across all test instances. The vertical dashed line marks the median uncertainty value. The inclusion of $h_{0.0}$---a deterministic model assigning probability 1 to a fixed class---causes pathological behavior.
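The selective-prediction protocol described in the captions above (sort by uncertainty, discard the most-uncertain fraction, measure accuracy on the rest) can be reproduced with a few lines of NumPy; the function name and argument conventions below are our own, not from the paper, and rejection rates are given as fractions rather than percentages.

```python
import numpy as np

def accuracy_rejection_curve(uncertainty, correct, rejection_rates):
    """Accuracy-rejection curve: at each rejection rate r, discard the
    fraction r of test points with the highest uncertainty and report
    accuracy on the points that remain.

    uncertainty: per-instance uncertainty scores
    correct: per-instance 0/1 correctness indicators
    rejection_rates: fractions in [0, 1)
    """
    order = np.argsort(uncertainty)                     # most certain first
    correct_sorted = np.asarray(correct, dtype=float)[order]
    n = len(correct_sorted)
    accs = []
    for r in rejection_rates:
        keep = max(1, int(round(n * (1 - r))))          # always keep >= 1 point
        accs.append(correct_sorted[:keep].mean())
    return np.array(accs)
```

The AUC reported in the figure legends is then the area under this curve, e.g., via the trapezoidal rule over the rejection rates; a curve that rises as more uncertain points are rejected indicates that the uncertainty measure is informative about prediction errors.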

Theorems & Definitions (18)

  • Remark 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Proposition 4.4
  • Proposition 4.5
  • Proposition 4.6
  • Corollary 4.7
  • Proposition 4.8
  • Remark 4.9
  • Proposition 4.10
  • ...and 8 more