Credal Wrapper of Model Averaging for Uncertainty Estimation in Classification

Kaizheng Wang; Fabio Cuzzolin; Keivan Shariatmadar; David Moens; Hans Hallez

TL;DR

The paper introduces a credal wrapper that improves uncertainty estimation in classification. It converts a finite set of individual predictive distributions, obtained from Bayesian neural networks (BNNs) or deep ensembles (DEs), into per-class probability intervals $[p_{L_k}, p_{U_k}]$, which together define a credal set. The intersection probability $p^*$, given by $p^*_k = p_{L_k} + \alpha(p_{U_k}-p_{L_k})$ with $\alpha = (1-\sum_k p_{L_k})/(\sum_k (p_{U_k}-p_{L_k}))$, maps the credal set back to a single prediction. The probability intervals are used to quantify epistemic uncertainty, and upper and lower uncertainty measures are computed with generalized entropy; a Probability Interval Approximation (PIA) reduces the computational cost for problems with many classes. Across multiple datasets and architectures, the credal wrapper improves uncertainty quantification and out-of-distribution (OOD) detection relative to standard BNN/DE baselines and evidential methods, and achieves a lower expected calibration error (ECE) on corrupted data. The approach is plug-and-play and targets settings where only a limited number of predictive distributions is available, at the cost of higher computational requirements.
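The intersection probability formula above can be illustrated with a minimal Python sketch. The function name `intersection_probability`, the example intervals, and the handling of the degenerate case where all intervals have zero width are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence


def intersection_probability(lower: Sequence[float], upper: Sequence[float]) -> List[float]:
    """Map per-class intervals [lower_k, upper_k] to a single distribution p*.

    Implements p*_k = lower_k + alpha * (upper_k - lower_k) with
    alpha = (1 - sum(lower)) / sum(upper - lower).
    """
    if len(lower) != len(upper):
        raise ValueError("lower and upper must have the same number of classes")
    widths = [u - l for l, u in zip(lower, upper)]
    if any(w < 0 for w in widths):
        raise ValueError("each upper bound must be >= its lower bound")
    total_width = sum(widths)
    if total_width <= 1e-12:
        # Assumption: if every interval has zero width, the bounds already
        # describe a single distribution, so return it unchanged.
        return list(lower)
    alpha = (1.0 - sum(lower)) / total_width
    return [l + alpha * w for l, w in zip(lower, widths)]


# Hypothetical three-class intervals, chosen only for illustration.
lower = [0.2, 0.1, 0.3]
upper = [0.5, 0.3, 0.6]
p_star = intersection_probability(lower, upper)
print(p_star)
```

Because the same fraction $\alpha$ of each interval width is added to every lower bound, the result sums to one whenever $\sum_k p_{L_k} \le 1 \le \sum_k p_{U_k}$, and each $p^*_k$ stays inside its interval.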

Abstract

This paper presents an innovative approach, called credal wrapper, to formulating a credal set representation of model averaging for Bayesian neural networks (BNNs) and deep ensembles (DEs), capable of improving uncertainty estimation in classification tasks. Given a finite collection of single predictive distributions derived from BNNs or DEs, the proposed credal wrapper approach extracts an upper and a lower probability bound per class, acknowledging the epistemic uncertainty due to the availability of a limited number of distributions. Such probability intervals over classes can be mapped onto a convex set of probabilities (a credal set) from which, in turn, a unique prediction can be obtained using a transformation called the intersection probability transformation. In this article, we conduct extensive experiments on several out-of-distribution (OOD) detection benchmarks, encompassing various dataset pairs (CIFAR10/100 vs SVHN/Tiny-ImageNet, CIFAR10 vs CIFAR10-C, CIFAR100 vs CIFAR100-C and ImageNet vs ImageNet-O) and using different network architectures (such as VGG16, ResNet-18/50, EfficientNet B2, and ViT Base). Compared to the BNN and DE baselines, the proposed credal wrapper method exhibits superior performance in uncertainty estimation and achieves a lower expected calibration error on corrupted data.
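The interval extraction step described above can be sketched as follows. This sketch assumes that the per-class bounds are the minimum and maximum probability assigned to each class across the available predictive distributions; the function name `extract_probability_intervals` and the example ensemble outputs are hypothetical.

```python
from typing import List, Sequence, Tuple


def extract_probability_intervals(
    samples: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return per-class (lower, upper) bounds over a set of predictive distributions.

    Assumption: lower_k is the minimum and upper_k the maximum probability
    that any single prediction assigns to class k.
    """
    if not samples:
        raise ValueError("at least one predictive distribution is required")
    num_classes = len(samples[0])
    if any(len(s) != num_classes for s in samples):
        raise ValueError("all distributions must cover the same classes")
    lower = [min(s[k] for s in samples) for k in range(num_classes)]
    upper = [max(s[k] for s in samples) for k in range(num_classes)]
    return lower, upper


# Hypothetical softmax outputs of three ensemble members for one input.
ensemble_predictions = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
lower, upper = extract_probability_intervals(ensemble_predictions)

# The intervals admit at least one valid distribution (a non-empty credal set)
# when the lower bounds sum to at most 1 and the upper bounds to at least 1.
credal_set_is_nonempty = sum(lower) <= 1.0 <= sum(upper)
print(lower, upper, credal_set_is_nonempty)
```

Under this min/max assumption, every original prediction satisfies all the interval constraints, so the resulting credal set always contains the individual predictions and their average.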

Paper Structure (19 sections, 20 equations, 17 figures, 19 tables, 1 algorithm)

Introduction
Uncertainty Estimation in Different Model Classes
Methodology
Experimental Validation
Conclusion and Future Work
Additional Experiments
Visualization of Credal Set Predictions and Intersection Probability
Ablation Study on Numbers of Predictive Samples in DEs
Ablation Study on Numbers of Predictive Samples in BNNs
Generalized Hartley Measure for EU Estimation of Credal Wrapper
Total Uncertainty Estimation Evaluation on OOD Detection Benchmarks
Ablation Study on Overconfidence Regime
Qualitative Evaluation of Uncertainty Estimation of Credal Wrapper
Evaluation of Intersection Probability on Corrupted Samples using NLL Metric
Additional Results on Computational Cost
...and 4 more sections

Figures (17)

Figure 1: Credal wrapper framework for a three-class (A, B, D) classification task. Given a set of individual probability distributions (denoted as single dots) in the simplex (triangle) of probability distributions of the classes, probability intervals (parallel lines) are derived by extracting the upper and lower probability bounds per class, using \ref{Eq: ExtractPI}. Such lower and upper probability intervals induce a credal set on {A, B, D} ($\mathbb{P}$, light blue convex hull in the triangle). A single intersection probability (the red dot) is computed from the credal set using the transform in \ref{Eq: ExtractPI}. Uncertainty is estimated in the mathematical framework of credal sets in \ref{Eq: CreUncertainty}.
Figure 2: OOD detection on CIFAR10 vs CIFAR10-C, using EU as the metric, for the classical and credal wrapper versions of BNNs and DEs, and for EDD, against increasing corruption intensity, with VGG16 and ResNet-18 as backbones.
Figure 3: ECE values of BNNR, BNNF, and DE on CIFAR10-C against increased corruption intensity, using the averaged probability (Prob.) and our proposed intersection probability (Prob.). VGG16 and ResNet-18 are backbones. Results are from 15 runs.
Figure 4: OOD detection performance of the classical and credal wrapper versions of DEs on CIFAR10/100 vs CIFAR10-C/100-C, using EU as the metric, against increasing corruption intensity, with ResNet-50, EffB2, and ViT-B as backbones.
Figure 5: ECE values of DEs on CIFAR10-C and CIFAR100-C against increased corruption intensity, using the averaged probability (Prob.) and our proposed intersection probability (Prob.). ResNet-50, EffB2, and ViT-B are backbones. Results are from 15 runs.
...and 12 more figures
