Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks

Paul Hofman, Yusuf Sale, Eyke Hüllermeier

TL;DR

This work introduces a loss-based framework for uncertainty quantification by decomposing strictly proper scoring rules into a divergence $D_\ell(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})$ and an entropy term $H_\ell(\boldsymbol{\theta})$, enabling a spectrum of total, aleatoric, and epistemic uncertainties that can be instantiated with different losses. It argues that aligning the uncertainty measure with the downstream task loss is crucial, and formalizes this alignment in the context of selective prediction via an area-under-the-loss-rejection-curve result. Empirically, the approach demonstrates that mutual information (log-loss) is strongest for OoD detection, while a zero-one-based epistemic uncertainty measure excels in active learning, and that total uncertainty is the appropriate rejection criterion for selective prediction when task loss is considered. The findings suggest there is no universal uncertainty measure; instead, practitioners should tailor the uncertainty construct to the specific predictive task to maximize performance.

Abstract

We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules, a specific type of loss function, into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one-loss consistently outperforms other uncertainty measures.
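As a rough illustration of how such a decomposition could be instantiated, the sketch below computes total, aleatoric, and epistemic uncertainty for the log-loss and the zero-one-loss. It assumes that the second-order distribution is approximated by a finite ensemble of class-probability vectors and that the prediction is the ensemble mean. The function names and the closed forms used here are illustrative assumptions, not code or formulas quoted from the paper. Under the log-loss the epistemic term coincides with the familiar mutual information.

```python
import math

# Assumption: `ensemble` is a list of class-probability vectors, one per ensemble
# member, standing in for samples from a second-order distribution.
# All function names below are hypothetical.

def mean_prediction(ensemble):
    """Average the member distributions into one first-order prediction."""
    num_classes = len(ensemble[0])
    return [sum(m[c] for m in ensemble) / len(ensemble) for c in range(num_classes)]

def entropy(p):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def log_loss_uncertainties(ensemble):
    """Log-loss instantiation: total = entropy of the mean prediction,
    aleatoric = mean member entropy, epistemic = their difference
    (the mutual information)."""
    total = entropy(mean_prediction(ensemble))
    aleatoric = sum(entropy(m) for m in ensemble) / len(ensemble)
    return total, aleatoric, total - aleatoric

def zero_one_uncertainties(ensemble):
    """Zero-one-loss instantiation (assumed form): total = 1 - max of the mean
    prediction, aleatoric = mean of (1 - max) over members, epistemic = difference."""
    mean = mean_prediction(ensemble)
    total = 1.0 - max(mean)
    aleatoric = sum(1.0 - max(m) for m in ensemble) / len(ensemble)
    return total, aleatoric, total - aleatoric

if __name__ == "__main__":
    disagreeing = [[0.9, 0.1], [0.1, 0.9]]  # members are confident but contradict each other
    print(log_loss_uncertainties(disagreeing))
    print(zero_one_uncertainties(disagreeing))
```

In both instantiations the epistemic term vanishes when all ensemble members agree, and it grows as the members become individually confident but mutually contradictory.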

Paper Structure

This paper contains 41 sections, 1 theorem, 18 equations, 8 figures, 9 tables.

Key Result

Proposition 4.1

Let $\hat{\bm{\theta}} \in \Delta_K$ be a (first-order) prediction and $\ell \in \mathcal{L}(\Delta_K, \mathcal{Y})$. Then the expected AULC is minimized by ordering test instances in non-decreasing order of their (instance-wise) expected loss $\mathbb{E}_{y \sim \theta}\bigl[\ell(\hat{\bm{\theta}}, y)\bigr]$.
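The ordering claim in Proposition 4.1 can be checked numerically on a toy example. The sketch below assumes a simple discrete version of the AULC (the average, over all retention sizes k, of the mean loss on the first k retained instances). The paper's exact definition may differ, and the function names are hypothetical. A brute-force search over all orderings of a handful of expected losses then recovers the non-decreasing order as the minimizer.

```python
from itertools import permutations

def aulc(losses_in_retention_order):
    """Discrete area under the loss-rejection curve (assumed definition).

    The list is ordered from the instance retained longest (rejected last)
    to the instance rejected first. For each retention size k we take the
    mean loss over the first k instances, then average over all k.
    """
    n = len(losses_in_retention_order)
    running_sum = 0.0
    area = 0.0
    for k, loss in enumerate(losses_in_retention_order, start=1):
        running_sum += loss
        area += running_sum / k
    return area / n

def best_ordering_by_brute_force(expected_losses):
    """Try every ordering and return the one with the smallest AULC.
    Only feasible for a handful of instances; used here purely as a check."""
    return min(permutations(expected_losses), key=aulc)

if __name__ == "__main__":
    expected_losses = [0.4, 0.1, 0.9, 0.2]  # made-up instance-wise expected losses
    print(best_ordering_by_brute_force(expected_losses))
    print(aulc(sorted(expected_losses)))
```

The intuition is that sorting in non-decreasing order minimizes every prefix sum at once, so every point on the curve is as low as it can be.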

Figures (8)

  • Figure 1: Selective Prediction with different task losses using the total uncertainty component as the rejection criterion. The line shows the mean and the shaded area represents the standard deviation over three runs.
  • Figure 2: Active Learning with different datasets using the epistemic uncertainty component to query new instances. The model is evaluated using the zero-one-loss on the test instances. The line shows the mean and the shaded area represents the standard deviation over three runs.
  • Figure 3: Selective Prediction with different task losses using the aleatoric uncertainty (top row) and epistemic uncertainty (bottom row) component as the rejection criterion. The line shows the mean and the shaded area represents the standard deviation over three runs.
  • Figure 4: Selective Prediction with different task losses using the total uncertainty (top row), aleatoric uncertainty (middle row) and epistemic uncertainty (bottom row) component as the rejection criterion. The line shows the mean and the shaded area represents the standard deviation over three runs.
  • Figure 5: Selective Prediction with different task losses using the aleatoric uncertainty (top row) and epistemic uncertainty (bottom row) component as the rejection criterion. The line shows the mean and the shaded area represents the standard deviation over three runs.
  • ...and 3 more figures

Theorems & Definitions (3)

  • Definition 3.1
  • Proposition 4.1
  • Proof of Proposition 4.1