Credal Ensemble Distillation for Uncertainty Quantification

Kaizheng Wang, Fabio Cuzzolin, David Moens, Hans Hallez

TL;DR

This work tackles the high computational cost of deep ensembles (DE) for predictive uncertainty by introducing credal ensemble distillation (CED), which compresses a DE of $M$ SNNs into a single model, CREDIT, that outputs class-wise probability intervals forming a credal set. CREDIT predicts an intersection probability $p_S^{*}$ along with interval lengths and a weight $\beta_S$, enabling both accurate class prediction and principled uncertainty quantification via upper and lower entropies over the credal set. Training uses a distillation loss that preserves the ensemble’s predictive performance while transferring the credal information, and uncertainty is quantified using generalized entropy measures. Empirically, CED achieves superior or competitive uncertainty estimation (especially for epistemic uncertainty) with much lower inference overhead than running the full DE, across multiple datasets and backbones. The approach offers a scalable, principled alternative for uncertainty quantification in neural classifiers, with practical impact on OOD detection and reliability.

Abstract

Deep ensembles (DE) have emerged as a powerful approach for quantifying predictive uncertainty and distinguishing its aleatoric and epistemic components, thereby enhancing model robustness and reliability. However, their high computational and memory costs during inference pose significant challenges for wide practical deployment. To overcome this issue, we propose credal ensemble distillation (CED), a novel framework that compresses a DE into a single model, CREDIT, for classification tasks. Instead of a single softmax probability distribution, CREDIT predicts class-wise probability intervals that define a credal set, a convex set of probability distributions, for uncertainty quantification. Empirical results on out-of-distribution detection benchmarks demonstrate that CED achieves superior or comparable uncertainty estimation compared to several existing baselines, while substantially reducing inference overhead compared to DE.
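To make the idea of class-wise probability intervals concrete, here is a minimal Python sketch of how an ensemble's predictions could be turned into intervals and a single intersection probability. It is an illustration under stated assumptions, not the authors' implementation: it assumes the lower and upper bounds are the per-class minimum and maximum over the ensemble members, and that the intersection probability takes the form $p^{*}_k = l_k + \beta\,(u_k - l_k)$ with $\beta = (1 - \sum_k l_k) / \sum_k (u_k - l_k)$. The function names `credal_wrapper` and `intersection_probability` are hypothetical.

```python
def credal_wrapper(ensemble_probs):
    """Per-class lower/upper bounds from M softmax vectors of length C.

    Assumption: bounds are the min and max over ensemble members.
    """
    num_classes = len(ensemble_probs[0])
    lower = [min(p[k] for p in ensemble_probs) for k in range(num_classes)]
    upper = [max(p[k] for p in ensemble_probs) for k in range(num_classes)]
    return lower, upper


def intersection_probability(lower, upper):
    """Single distribution inside the intervals, plus the weight beta.

    Assumption: p*_k = l_k + beta * (u_k - l_k), with beta chosen so
    that p* sums to one.
    """
    widths = [u - l for l, u in zip(lower, upper)]
    total_width = sum(widths)
    if total_width == 0.0:
        # All members agree exactly, so the lower bounds already sum to one.
        return list(lower), 0.0
    beta = (1.0 - sum(lower)) / total_width
    p_star = [l + beta * w for l, w in zip(lower, widths)]
    return p_star, beta


# Toy ensemble of M = 3 members on C = 3 classes (made-up numbers).
ensemble = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]
lower, upper = credal_wrapper(ensemble)
p_star, beta = intersection_probability(lower, upper)
predicted_class = max(range(len(p_star)), key=lambda k: p_star[k])
```

In this toy example the intervals are equally wide for all three classes, so the intersection probability sits at the same relative position $\beta$ inside each interval; wider intervals would indicate more disagreement among the ensemble members.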

Paper Structure

This paper contains 24 sections, 20 equations, 11 figures, 6 tables, 1 algorithm.

Figures (11)

  • Figure 1: CED framework for three-class classification ($C = 3$). Given an ensemble teacher composed of $M$ SNNs, the predicted probabilities can generate class-wise probability bounds via a credal wrapper (see Sec. \ref{subsec: CredalWrapper}). These intervals form a credal set for UQ, from which a unique intersection probability is extracted for class prediction. As described in Sec. \ref{subsec: CredalStudent}, the proposed credal student is designed to output a vector $\boldsymbol{v} := (\boldsymbol{p}_S^{*} \in \mathbb{R}^C, \Delta\boldsymbol{p}_S \in \mathbb{R}^C, \beta_S \in \mathbb{R})$, each component representing the intersection probability, the interval length vector, and the weight factor, respectively. The student is trained using a novel distillation loss introduced in Sec. \ref{subsec: DistillationStrategy}. At inference time, $\boldsymbol{p}_S^{*}$ is employed for class prediction, while $\boldsymbol{v}$ can recover a credal set $\mathbb{Q}_S$ for UQ.
  • Figure 2: OOD detection (CIFAR10 vs. CIFAR10-C) comparison over increasing corruption levels on various backbones.
  • Figure 3: Distributions of EU and TU estimates across models on the VGG16 (top) and ResNet50 (bottom) backbones. 15 runs.
  • Figure 4: OOD detection performance with increasing ensemble sizes of the DE teacher. Left: CIFAR10 vs. SVHN. Right: CIFAR10 vs. CIFAR10-C. Backbone: VGG16.
  • Figure 5: OOD detection performance over increasing temperature $T$ values. Backbone: VGG16.
  • ...and 6 more figures
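
The Figure 1 caption states that the student output $\boldsymbol{v} = (\boldsymbol{p}_S^{*}, \Delta\boldsymbol{p}_S, \beta_S)$ can recover a credal set $\mathbb{Q}_S$. The sketch below shows one way that recovery could work. It assumes the relation $p^{*}_k = l_k + \beta\,\Delta p_k$ between the intersection probability and the interval bounds, which this excerpt does not state explicitly, so treat it as an assumption rather than the paper's exact parameterization. The function name `recover_intervals` is hypothetical.

```python
def recover_intervals(p_star, delta_p, beta):
    """Rebuild class-wise [lower, upper] intervals from a student output.

    Assumption: p*_k = lower_k + beta * delta_p_k, so that
    lower_k = p*_k - beta * delta_p_k and upper_k = lower_k + delta_p_k.
    """
    lower = [p - beta * d for p, d in zip(p_star, delta_p)]
    upper = [l + d for l, d in zip(lower, delta_p)]
    return lower, upper


# Made-up student output for C = 3 classes.
p_star = [0.6, 0.2, 0.2]   # intersection probability, used for class prediction
delta_p = [0.2, 0.2, 0.2]  # interval length per class
beta = 0.5                 # weight factor in [0, 1]

lower, upper = recover_intervals(p_star, delta_p, beta)
predicted_class = max(range(len(p_star)), key=lambda k: p_star[k])
```

Under this assumption the credal set is the set of all probability vectors whose components lie within the recovered intervals, while class prediction needs only `p_star`.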