Efficient Credal Prediction through Decalibration

Paul Hofman; Timo Löhr; Maximilian Muschalik; Yusuf Sale; Eyke Hüllermeier

TL;DR

This work proposes an efficient method for credal prediction that is grounded in the notion of relative likelihood and inspired by techniques for the calibration of probabilistic classifiers, and demonstrates credal prediction on models such as TabPFN and CLIP -- architectures for which the construction of credal sets was previously infeasible.

Abstract

A reliable representation of uncertainty is essential for the application of modern machine learning methods in safety-critical settings. In this regard, the use of credal sets (i.e., convex sets of probability distributions) has recently been proposed as a suitable approach to representing epistemic uncertainty. However, as with other approaches to epistemic uncertainty, training credal predictors is computationally complex and usually involves (re-)training an ensemble of models. The resulting computational complexity prevents their adoption for complex models such as foundation models and multi-modal systems. To address this problem, we propose an efficient method for credal prediction that is grounded in the notion of relative likelihood and inspired by techniques for the calibration of probabilistic classifiers. For each class label, our method predicts a range of plausible probabilities in the form of an interval. To produce the lower and upper bounds of these intervals, we propose a technique that we refer to as decalibration. Extensive experiments show that our method yields credal sets with strong performance across diverse tasks, including coverage-efficiency evaluation, out-of-distribution detection, and in-context learning. Notably, we demonstrate credal prediction on models such as TabPFN and CLIP -- architectures for which the construction of credal sets was previously infeasible.
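The interval construction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that decalibration means adding a scalar offset to the logit of one class, that the unshifted classifier serves as the maximum-likelihood reference, and that an offset counts as plausible when its relative likelihood on some labelled data is at least `alpha`. The function names, the offset grid, and the toy data are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(logit_rows, labels, k=None, shift=0.0):
    """Log-likelihood of the labels after adding `shift` to the class-k logit."""
    ll = 0.0
    for logits, y in zip(logit_rows, labels):
        z = list(logits)
        if k is not None:
            z[k] += shift
        ll += math.log(softmax(z)[y])
    return ll


def plausible_interval(logit_rows, labels, x_logits, k, alpha, grid):
    """Range of class-k probabilities at a query point over all offsets in
    `grid` whose relative likelihood (vs. the unshifted model) is >= alpha.
    Assumption: `grid` contains 0.0, so the result is never empty."""
    base = log_likelihood(logit_rows, labels)
    probs = []
    for c in grid:
        rel = math.exp(log_likelihood(logit_rows, labels, k, c) - base)
        if rel >= alpha:
            z = list(x_logits)
            z[k] += c
            probs.append(softmax(z)[k])
    return min(probs), max(probs)


# Toy data (hypothetical): logits of a 3-class model on four labelled points.
rows = [[2.0, 0.5, -1.0], [0.2, 1.5, 0.1], [-0.5, 0.3, 1.8], [1.0, 0.9, -0.2]]
labels = [0, 1, 2, 0]
grid = [i / 10 for i in range(-30, 31)]  # offsets in [-3, 3], includes 0.0
query = [0.4, 0.1, -0.3]

lo, hi = plausible_interval(rows, labels, query, k=0, alpha=0.5, grid=grid)
print(f"class 0 interval at alpha=0.5: [{lo:.3f}, {hi:.3f}]")
```

Repeating the call for every class gives one interval per label. Lowering `alpha` admits more offsets and can only widen each interval.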

Paper Structure (58 sections, 3 theorems, 23 equations, 15 figures, 8 tables)

Introduction
Credal Prediction based on Plausible Intervals
Efficient Credal Prediction through Decalibration
Empirical Results
Coverage versus Efficiency
Out-of-Distribution Detection
In-Context Learning with TabPFN
Zero-Shot Classification with CLIP-Based Models
Related Work
Discussion
Reproducibility Statement.
Proofs
Implementation Details
Guide on Interpreting Credal Spider Plots
Experimental Setup
...and 43 more sections

Key Result

Proposition 2.1

If $0<\alpha_2\le \alpha_1\le 1$, then $\mathcal{C}_{\alpha_1}\subseteq \mathcal{C}_{\alpha_2}$ and $\mathcal{Q}_{\bm{x},\alpha_1}\subseteq \mathcal{Q}_{\bm{x},\alpha_2}$. Thus, for all $k$, $[\underline p_k(\bm{x};\alpha_1),\overline p_k(\bm{x};\alpha_1)]\subseteq[\underline p_k(\bm{x};\alpha_2),\overline p_k(\bm{x};\alpha_2)]$. If a maximum-likelihood estimator $h^{\mathrm{ML}}\in\mathcal{H}$ exists, then $\mathcal{Q}_{\bm{x},1}=\{p_k(\bm{x}, h^{\mathrm{ML}})\}$ and $[\underline p_k(\bm{x};1),\overline p_k(\bm{x};1)]=\{p_k(\bm{x}, h^{\mathrm{ML}})\}$.
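The nestedness stated in the proposition can be checked numerically on a toy problem. The sketch below is illustrative only: it assumes a Bernoulli hypothesis space indexed by a success probability `theta` on a finite grid, with made-up counts of 7 successes in 10 trials. It builds the relative-likelihood set for several values of `alpha`, keeping every hypothesis whose likelihood ratio to the best hypothesis is at least `alpha`. All names are hypothetical.

```python
import math


def relative_likelihood_set(thetas, successes, trials, alpha):
    """Hypotheses whose likelihood is at least alpha times the maximum."""
    def loglik(t):
        return successes * math.log(t) + (trials - successes) * math.log(1 - t)

    best = max(loglik(t) for t in thetas)
    return [t for t in thetas if math.exp(loglik(t) - best) >= alpha]


# Toy hypothesis space: Bernoulli parameters 0.01, 0.02, ..., 0.99.
thetas = [i / 100 for i in range(1, 100)]

# Relative-likelihood sets for decreasing alpha (7 successes in 10 trials).
sets = {a: relative_likelihood_set(thetas, 7, 10, a) for a in (1.0, 0.5, 0.1)}

# In this toy model the predicted probability equals theta itself,
# so the plausible interval is just the range of each set.
intervals = {a: (min(s), max(s)) for a, s in sets.items()}

for a in (1.0, 0.5, 0.1):
    print(f"alpha={a}: interval={intervals[a]}, size={len(sets[a])}")
```

At `alpha = 1.0` only the maximizer on the grid survives, so the interval collapses to a single point. Smaller values of `alpha` give supersets, matching the inclusion order in the proposition.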

Figures (15)

Figure 1: Overview of Efficient Credal Prediction through Decalibration. Given a probabilistic classifier (maximum likelihood estimate), our method decalibrates the predicted distributions by their logits. The resulting credal set contains the ground-truth distribution, as visualized in the credal spider plot (see the appendix "Guide on Interpreting Credal Spider Plots" for an explanation). Note that we only show the decalibration of three classes for visualization purposes; in practice, all classes are decalibrated.
Figure 2: Coverage versus Efficiency. Comparison on CIFAR-10 and ChaosNLI. The plot highlights the Pareto trade-off: higher coverage often requires lower efficiency. EffCre consistently advances the Pareto front over baselines.
Figure 3: Out-of-Distribution Detection. Performance (AUROC, based on epistemic uncertainty) as a function of required number of models and training time (in hours).
Figure 4: EffCre used with TabPFN. Top: Coverage versus efficiency performance on all multi-class TabArena datasets. Bottom: Active In-Context Learning performance versus the random baseline.
Figure 4: Training and inference time in seconds for models trained on CIFAR-10. Mean with standard deviation over three runs. Computed based on ensembles with 10 members.
...and 10 more figures

Theorems & Definitions (6)

Proposition 2.1
Proposition 3.1
Corollary 3.1
Proof of Proposition 2.1
Proof of Proposition 3.1
Proof of Corollary 3.1
