Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?

Maohao Shen; J. Jon Ryu; Soumya Ghosh; Yuheng Bu; Prasanna Sattigeri; Subhro Das; Gregory W. Wornell

TL;DR

This investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.

Abstract

This paper questions the effectiveness of a modern predictive uncertainty quantification approach called evidential deep learning (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite the perceived strong empirical performance of EDL methods on downstream tasks, a line of recent studies by Bengs et al. identifies limitations of the existing methods and concludes that their learned epistemic uncertainties are unreliable, e.g., in that they do not vanish even with infinite data. Building on and sharpening this analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that EDL methods are better interpreted as out-of-distribution (OOD) detection algorithms based on energy-based models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness on real-world datasets. From these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.
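To make point 2) concrete, here is a minimal sketch. It assumes one common EDL parameterization, in which the Dirichlet concentration is the exponential of the network's logits, $\alpha_c(x) = \exp(f_c(x))$. This is an assumption: other EDL variants use softplus or ReLU evidence, sometimes plus one. Under it, the total evidence $\alpha_0(x) = \sum_c \exp(f_c(x))$ equals $\exp(-E(x))$, where $E(x) = -\log \sum_c \exp(f_c(x))$ is the free energy used by energy-based OOD detectors. Thresholding $\alpha_0$ and thresholding the energy therefore rank inputs identically. The helper names and logits below are hypothetical, illustrative values, not results from the paper.

```python
import math

def free_energy(logits):
    """Free energy E(x) = -log sum_c exp(f_c(x)), computed stably."""
    m = max(logits)
    return -(m + math.log(sum(math.exp(z - m) for z in logits)))

def total_evidence(logits):
    """Dirichlet precision alpha_0, ASSUMING alpha_c = exp(f_c)."""
    return sum(math.exp(z) for z in logits)

# Hypothetical logits for three inputs (illustrative values only).
batch = {
    "confident": [6.0, 0.5, -1.0],
    "ambiguous": [2.0, 1.9, 1.8],
    "far_ood": [-1.0, -1.2, -0.8],
}

for name, logits in batch.items():
    a0 = total_evidence(logits)
    e = free_energy(logits)
    # Under the exp parameterization, alpha_0 and exp(-E) coincide.
    print(f"{name:10s} alpha_0={a0:10.4f} exp(-E)={math.exp(-e):10.4f} E={e:8.4f}")

# Rank inputs from most to least "in-distribution" under each score.
by_evidence = sorted(batch, key=lambda k: total_evidence(batch[k]), reverse=True)
by_energy = sorted(batch, key=lambda k: free_energy(batch[k]))
print(by_evidence == by_energy)
```

With a different evidence activation, the identity $\alpha_0 = \exp(-E)$ no longer holds exactly. The sketch only shows why total evidence behaves like an energy score in this particular case.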

Paper Structure (45 sections, 4 theorems, 38 equations, 14 figures, 4 tables)

Introduction
Related Work
Problem Setting and Preliminaries
New Taxonomy for EDL Methods
Criterion 1. Parametric Form of Meta Distribution.
Criterion 2. Objective Function.
Rethinking the Success of EDL Methods
What Is the "Optimal" Meta Distribution Characterized By The EDL Objectives?
EDL Methods Are EBM-Based OOD Detector Rather Than Uncertainty Quantifier
Are EDL Methods Robust for OOD Detection?
EDL Methods Will Benefit from Incorporating Model Uncertainty
Comprehensive Empirical Evaluation
Concluding Remarks
In-Depth Review of Recent Critiques on EDL Methods
Review of (Bengs et al., 2022) "Pitfalls of Epistemic Uncertainty Quantification Through Loss Minimisation"
...and 30 more sections

Key Result

Theorem 4.1: Unifying EDL Objectives for Classification

Let $p(\boldsymbol{\pi})=\mathsf{Dir}(\boldsymbol{\pi};\mathds{1}_C)$.
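The statement above is cut off in this summary, so the following does not reproduce Theorem 4.1. It is a hedged illustration of how a uniform Dirichlet prior $\mathsf{Dir}(\boldsymbol{\pi};\mathds{1}_C)$ can pin down an optimal meta distribution. Assume, for illustration only, the following per-input objective, which may not match the paper's unified form. A candidate $q = \mathsf{Dir}(\boldsymbol{\pi};\boldsymbol{\alpha})$ is scored at one input $x$ with true class probabilities $\boldsymbol{\eta}(x)$ by

$\mathbb{E}_{y\sim\boldsymbol{\eta}}\mathbb{E}_{\boldsymbol{\pi}\sim q}[-\log \pi_y] + \lambda\,\mathrm{KL}(q \,\|\, \mathsf{Dir}(\boldsymbol{\pi};\mathds{1}_C))$.

The Gibbs variational principle then gives $q^\star \propto \mathsf{Dir}(\boldsymbol{\pi};\mathds{1}_C)\prod_c \pi_c^{\eta_c/\lambda}$, i.e. $\boldsymbol{\alpha}^\star = \mathds{1}_C + \boldsymbol{\eta}(x)/\lambda$. This depends on $\lambda$ but not on how many samples were observed. The sketch checks the claim numerically. The names `objective`, `eta`, and `lam` are hypothetical.

```python
import math
import random

def digamma(x):
    """Digamma psi(x) for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def kl_dirichlet(alpha, beta):
    """Closed-form KL(Dir(alpha) || Dir(beta))."""
    a0, b0 = sum(alpha), sum(beta)
    return (math.lgamma(a0) - sum(math.lgamma(a) for a in alpha)
            - math.lgamma(b0) + sum(math.lgamma(b) for b in beta)
            + sum((a - b) * (digamma(a) - digamma(a0)) for a, b in zip(alpha, beta)))

def objective(alpha, eta, lam):
    """ASSUMED per-input objective:
    E_{y~eta} E_{pi~Dir(alpha)}[-log pi_y] + lam * KL(Dir(alpha) || Dir(1_C))."""
    a0 = sum(alpha)
    # E_{pi~Dir(alpha)}[-log pi_y] = psi(alpha_0) - psi(alpha_y)
    expected_ce = sum(e * (digamma(a0) - digamma(a)) for e, a in zip(eta, alpha))
    return expected_ce + lam * kl_dirichlet(alpha, [1.0] * len(alpha))

eta = [0.7, 0.2, 0.1]  # hypothetical true class probabilities at one input x
rng = random.Random(0)
for lam in (0.1, 1.0):
    # Closed-form candidate from the Gibbs variational principle.
    alpha_star = [1.0 + e / lam for e in eta]
    best = objective(alpha_star, eta, lam)
    # Random multiplicative perturbations should never achieve a lower value.
    trials = [[a * rng.uniform(0.5, 2.0) for a in alpha_star] for _ in range(500)]
    never_better = all(objective(t, eta, lam) > best for t in trials)
    print(f"lambda={lam}: alpha*={[round(a, 3) for a in alpha_star]}, optimal={never_better}")
```

Under the same assumption, the excess objective of any Dirichlet equals $\lambda\,\mathrm{KL}(\mathsf{Dir}(\boldsymbol{\alpha}) \,\|\, \mathsf{Dir}(\boldsymbol{\alpha}^\star))$. That is a stronger check than random search, and the assertions below use it.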

Figures (14)

Figure 1: Behavior of Uncertainties Learned by EDL methods on Real Data. (a) EDL methods learn spurious epistemic uncertainty: the uncertainty does not vanish as the number of observed samples grows, contrary to the fundamental definition of epistemic uncertainty. (b) Instead of a constant, EDL methods learn model-dependent aleatoric uncertainty that varies with the hyper-parameter $\lambda$, contrary to the fundamental definition of aleatoric uncertainty. Similar behavior holds for 2D Gaussian data (see the corresponding Gaussian-data figure in the appendix).
Figure 2: OOD Detection Performance vs. Hyper-parameter $\lambda$ on CIFAR10. The $x$-axis represents increasing values of $\lambda$, and the $y$-axis represents the average AUROC score over OOD detection tasks. The uncertainty quantification performance of EDL methods is sensitive to the hyper-parameter $\lambda$ and generally benefits from a small $\lambda$.
Figure 3: Comparison of Different EDL Methods on OOD Detection. Distillation-based methods, including the newly proposed Bootstrap-Distill method, demonstrate a clear advantage over other classical EDL methods. Similar behavior holds for the selective classification task.
Figure 4: Comparison of Different EDL Methods on Selective Classification. Distillation-based methods, including the newly proposed Bootstrap-Distill method, demonstrate a clear advantage over other classical EDL methods.
Figure 5: Behavior of Uncertainties Learned by EDL methods on Toy Data. (a) EDL methods learn spurious epistemic uncertainty, wherein uncertainty does not vanish with an increasing number of observed samples, contrary to the fundamental definition of epistemic uncertainty. (b) Instead of a constant, EDL methods learn model-dependent aleatoric uncertainty that depends on hyper-parameter $\lambda$, contrary to the fundamental definition of aleatoric uncertainty.
...and 9 more figures
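Figures 1 and 5 refer to the split of a Dirichlet meta distribution's predictive uncertainty into aleatoric and epistemic parts. Below is a minimal sketch of the widely used mutual-information form of that decomposition:

- total uncertainty is the entropy of the mean prediction;
- aleatoric uncertainty is the expected entropy $\mathbb{E}_{\boldsymbol{\pi}\sim\mathsf{Dir}(\boldsymbol{\alpha})}[H(\boldsymbol{\pi})] = -\sum_c \frac{\alpha_c}{\alpha_0}\big(\psi(\alpha_c+1)-\psi(\alpha_0+1)\big)$;
- epistemic uncertainty is their difference.

To mimic the $\lambda$-dependence described in the captions, the sketch assumes that the learned concentration is $\boldsymbol{\alpha} = \mathds{1}_C + \boldsymbol{\eta}/\lambda$ for a fixed, hypothetical class-probability vector $\boldsymbol{\eta}$. That mapping is an assumption made here, not something stated in this summary. Because $\boldsymbol{\alpha}$ then depends only on $\boldsymbol{\eta}$ and $\lambda$, neither component shrinks as more samples are observed, while the aleatoric part changes with $\lambda$.

```python
import math

def digamma(x):
    """Digamma psi(x) for x > 0: upward recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(pc * math.log(pc) for pc in p if pc > 0)

def dirichlet_uncertainties(alpha):
    """Return (total, aleatoric, epistemic) for Dir(alpha)."""
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    total = entropy(mean)  # entropy of the mean predictive distribution
    # Expected entropy E_{pi~Dir(alpha)}[H(pi)]
    aleatoric = -sum((a / a0) * (digamma(a + 1) - digamma(a0 + 1)) for a in alpha)
    epistemic = total - aleatoric  # mutual information between y and pi
    return total, aleatoric, epistemic

eta = [0.7, 0.2, 0.1]  # hypothetical class probabilities at one input
results = {}
for lam in (1.0, 0.1, 0.01):
    alpha = [1.0 + e / lam for e in eta]  # ASSUMED lambda-dependent concentration
    results[lam] = dirichlet_uncertainties(alpha)
    total, alea, epi = results[lam]
    print(f"lambda={lam:<5} total={total:.4f} aleatoric={alea:.4f} epistemic={epi:.4f}")
```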

Theorems & Definitions (7)

Theorem 4.1: Unifying EDL Objectives for Classification
Theorem 5.1
Example 5.2: Categorical likelihood
Lemma D.1
proof
Theorem E.1
proof
