Set-based v.s. Distribution-based Representations of Epistemic Uncertainty: A Comparative Study

Kaizheng Wang, Yunjia Wang, Fabio Cuzzolin, David Moens, Hans Hallez, Siu Lun Chau

TL;DR

This paper presents a controlled comparative study that enables principled, like-for-like evaluation of set-based and distribution-based representations of epistemic uncertainty. It shows that meaningful comparison between these seemingly non-comparable frameworks is both feasible and informative, and it provides insights into how the choice of second-order representation affects practical uncertainty-aware performance.

Abstract

Epistemic uncertainty in neural networks is commonly modeled using two second-order paradigms: distribution-based representations, which rely on posterior parameter distributions, and set-based representations based on credal sets (convex sets of probability distributions). These frameworks are often regarded as fundamentally non-comparable due to differing semantics, assumptions, and evaluation practices, leaving their relative merits unclear. Empirical comparisons are further confounded by variations in the underlying predictive models. To clarify this issue, we present a controlled comparative study enabling principled, like-for-like evaluation of the two paradigms. Both representations are constructed from the same finite collection of predictive distributions generated by a shared neural network, isolating representational effects from predictive accuracy. Our study evaluates each representation through the lens of 3 uncertainty measures across 8 benchmarks, including selective prediction and out-of-distribution detection, spanning 6 underlying predictive models and 10 independent runs per configuration. Our results show that meaningful comparison between these seemingly non-comparable frameworks is both feasible and informative, providing insights into how second-order representation choices impact practical uncertainty-aware performance.
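To make the shared-construction idea concrete, here is a minimal sketch of how one finite collection of predictive distributions could be read in both ways. It is an illustration under stated assumptions, not the authors' implementation: the function names (`distribution_based`, `credal_based`) are hypothetical, the members are assumed to be equally weighted, and the measures shown (an entropy-based decomposition and the widest per-class probability interval) are illustrative stand-ins, since the abstract does not name the three measures used in the study.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def distribution_based(preds):
    """Read the M predictions as equally weighted samples (assumption)."""
    m = len(preds)
    n_classes = len(preds[0])
    mean = [sum(p[k] for p in preds) / m for k in range(n_classes)]
    total = entropy(mean)                            # entropy of the averaged prediction
    aleatoric = sum(entropy(p) for p in preds) / m   # average entropy of the members
    return {
        "mean": mean,
        "total": total,
        "aleatoric": aleatoric,
        "epistemic": total - aleatoric,              # mutual information, >= 0 by Jensen
    }


def credal_based(preds):
    """Read the convex hull of the same M predictions as a credal set."""
    n_classes = len(preds[0])
    # Per-class probabilities are linear in the distribution, so their extremes
    # over the convex hull are attained at the original members.
    lower = [min(p[k] for p in preds) for k in range(n_classes)]
    upper = [max(p[k] for p in preds) for k in range(n_classes)]
    return {
        "lower": lower,
        "upper": upper,
        "max_width": max(u - l for l, u in zip(lower, upper)),
    }


# Hypothetical predictions from three ensemble members for one input.
preds = [[0.7, 0.2, 0.1], [0.5, 0.3, 0.2], [0.6, 0.1, 0.3]]
print(distribution_based(preds))
print(credal_based(preds))
```

Both functions consume exactly the same `preds`, so any difference in downstream behaviour comes from the representation and the measure applied to it rather than from the predictive model.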

Paper Structure

This paper contains 16 sections, 17 equations, 10 figures, 9 tables, 3 algorithms.

Figures (10)

  • Figure 1: Illustration of our comparative study framework.
  • Figure 2: Statistical significance plots on different selective prediction (a, b) and OOD detection (c, d) benchmarks across different underlying predictive models. A cell is shaded if the measure in the $i$-th row is statistically significantly better than that in the $j$-th column according to a pairwise one-sided Wilcoxon signed-rank test at the 5% significance level. Intra-representation comparisons are shown in blue (distribution-based measures) and orange (credal-based measures), while inter-representation comparisons are shown in green.
  • Figure A.1: An example whole-slide image (node 2 of patient 017) with ground-truth annotations.
  • Figure A.2: Examples from the SeaShip dataset vs. instances from Seaship-C at severity level 3.
  • Figure A.3: Statistical significance plots on different selective prediction (a-d) benchmarks across different underlying predictive models. A cell is shaded if the measure in the $i$-th row is statistically significantly better than that in the $j$-th column according to a pairwise one-sided Wilcoxon signed-rank test at the 5% significance level. Intra-representation comparisons are shown in blue (distribution-based measures) and orange (credal-based measures), while inter-representation comparisons are shown in green.
  • ...and 5 more figures
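
The significance plots in Figure 2 and Figure A.3 rely on a pairwise one-sided Wilcoxon signed-rank test at the 5% level. The sketch below shows one way such a test could be computed exactly for a small number of paired runs. It is an assumption-laden illustration: the function name `wilcoxon_greater` and the example scores are hypothetical, zero differences are dropped and ties receive average ranks by choice, and the paper's own implementation is not described in this section. In practice, `scipy.stats.wilcoxon` with `alternative="greater"` offers the same kind of test.

```python
def wilcoxon_greater(x, y):
    """Exact one-sided p-value for H1: paired scores in x tend to exceed those in y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    if not diffs:
        return 1.0
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    # Use doubled average ranks so that tied ranks remain integers.
    ranks2 = [0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg2 = (i + 1) + (j + 1)  # twice the mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks2[order[k]] = avg2
        i = j + 1
    w_plus2 = sum(r for r, d in zip(ranks2, diffs) if d > 0)
    # counts[s] = number of sign assignments whose positive doubled-rank sum is s.
    total = sum(ranks2)
    counts = [0] * (total + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    # Under H0 every sign assignment is equally likely.
    return sum(counts[w_plus2:]) / 2 ** len(diffs)


# Hypothetical scores for two measures over 10 paired runs.
measure_i = [0.91, 0.89, 0.93, 0.90, 0.92, 0.94, 0.88, 0.95, 0.90, 0.93]
measure_j = [0.88, 0.87, 0.90, 0.91, 0.89, 0.90, 0.86, 0.91, 0.89, 0.90]
p_value = wilcoxon_greater(measure_i, measure_j)
print("shade cell (i, j):", p_value < 0.05)
```

Running the test for every ordered pair of measures and shading cell (i, j) whenever the p-value falls below 0.05 reproduces the layout the captions describe.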