Learning Credal Ensembles via Distributionally Robust Optimization

Kaizheng Wang, Ghifari Adam Faza, Fabio Cuzzolin, Siu Lun Chau, David Moens, Hans Hallez

TL;DR

CreDRO reframes epistemic uncertainty as model disagreement caused by varying relaxations of the i.i.d. assumption between training and test data, and learns an ensemble under distributionally robust optimization to capture this uncertainty. It then converts ensemble predictions into class-wise probability intervals forming a box credal set, whose uncertainty is quantified via the difference between upper and lower Shannon entropy. Empirically, CreDRO outperforms state-of-the-art credal methods and deep ensembles on OOD detection, corrupted-data robustness, and medical selective classification, while remaining efficient and scalable. This approach advances robust uncertainty quantification by integrating DRO-driven diversity with tractable credal predictions, enabling reliable decision-making under distributional shifts.
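
The inference step described above can be illustrated with a small sketch. It assumes the class-wise intervals are the per-class minimum and maximum of the members' softmax outputs, which this summary does not state explicitly. The function names are hypothetical, and the lower-entropy routine is a brute-force vertex search that is practical only for a handful of classes.

```python
import itertools
import numpy as np

def entropy(p):
    """Shannon entropy in nats, treating 0 * log(0) as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))

def box_credal_set(member_probs):
    """(M, K) softmax outputs -> per-class lower and upper bounds.
    Assumption: bounds are the min and max across ensemble members."""
    member_probs = np.asarray(member_probs, dtype=float)
    return member_probs.min(axis=0), member_probs.max(axis=0)

def upper_entropy(lower, upper, iters=100):
    """Maximum entropy over {p : lower <= p <= upper, sum(p) = 1}.
    The maximizer has the form clip(c, lower, upper); bisect on c."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if np.clip(c, lower, upper).sum() < 1.0:
            lo = c
        else:
            hi = c
    return entropy(np.clip(0.5 * (lo + hi), lower, upper))

def lower_entropy(lower, upper, tol=1e-12):
    """Minimum entropy over the same set. Entropy is concave, so the
    minimum sits at a vertex, where at most one coordinate lies strictly
    between its bounds. Brute force: exponential in the class count."""
    K = len(lower)
    best = np.inf
    for free in range(K):
        others = [k for k in range(K) if k != free]
        for choice in itertools.product((0, 1), repeat=K - 1):
            p = np.empty(K)
            for k, take_upper in zip(others, choice):
                p[k] = upper[k] if take_upper else lower[k]
            p[free] = 1.0 - p[others].sum()
            if lower[free] - tol <= p[free] <= upper[free] + tol:
                best = min(best, entropy(np.clip(p, 0.0, 1.0)))
    return best

# Three hypothetical ensemble members, three classes.
members = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.6, 0.1, 0.3]]
lower, upper = box_credal_set(members)
eu = upper_entropy(lower, upper) - lower_entropy(lower, upper)
print(lower, upper, round(eu, 4))
```

When all members agree, the intervals collapse to points and the entropy gap is zero; wider disagreement widens the box and increases the gap.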

Abstract

Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.
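
The abstract does not spell out the DRO objective. As a rough illustration only, the sketch below assumes a top-fraction (CVaR-style) worst-case loss in which each ensemble member averages the loss over its hardest `delta` fraction of a batch. The names `worst_fraction_loss` and `deltas` are hypothetical, and the paper's actual formulation may differ.

```python
import numpy as np

def cross_entropy_per_sample(logits, labels):
    """Per-sample cross-entropy from raw logits, computed stably."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels]

def worst_fraction_loss(losses, delta):
    """Mean of the ceil(delta * n) largest per-sample losses.
    delta = 1.0 is the ordinary mean loss (the i.i.d. assumption);
    smaller delta emphasizes hard samples (an assumed stand-in for
    a stronger relaxation of that assumption)."""
    ordered = np.sort(np.asarray(losses, dtype=float))[::-1]
    k = max(1, int(np.ceil(delta * len(ordered))))
    return float(ordered[:k].mean())

# Hypothetical setup: one delta per ensemble member.
deltas = [1.0, 0.75, 0.5]
batch_losses = np.array([0.1, 0.2, 0.3, 0.4])
member_objectives = [worst_fraction_loss(batch_losses, d) for d in deltas]
print(member_objectives)
```

Under this assumption, members trained with different `delta` values optimize different objectives on the same data, which is one way disagreement beyond initialization randomness could arise.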

Paper Structure (26 sections, 18 equations, 12 figures, 11 tables, 1 algorithm)

This paper contains 26 sections, 18 equations, 12 figures, 11 tables, 1 algorithm.

Figures (12)

  • Figure 1: CreDRO Concept. ① Training: An ensemble is trained using distributionally robust optimization, with members weighted differently to simulate varying degrees of train-test distribution shift (see the Training Process subsection). ② Inference: Softmax probabilities are converted into class-wise probability intervals, creating a box credal set (see the subsection on mapping to credal predictions).
  • Figure 2: AUROC (%) for OOD detection using EU, across methods and ensemble sizes.
  • Figure 3: Kernel density plots of EU estimates on ID and OOD data for different methods (ensemble size: $M=5$; first-run results).
  • Figure 4: OOD detection score comparison across increasing levels of corruption. Results are averaged over $15$ runs.
  • Figure 5: AR (top) and normalized AR (bottom) curves, along with the average AUC values. Note that the accuracy is set to $100\%$ when the rejection rate reaches $1.0$.
  • ...and 7 more figures