Uncertainty Quantification with Proper Scoring Rules: Adjusting Measures to Prediction Tasks

Paul Hofman; Yusuf Sale; Eyke Hüllermeier

TL;DR

This work introduces a loss-based framework for uncertainty quantification by decomposing strictly proper scoring rules into a divergence $D_\ell(\hat{\boldsymbol{\theta}}, \boldsymbol{\theta})$ and an entropy term $H_\ell(\boldsymbol{\theta})$, yielding measures of total, aleatoric, and epistemic uncertainty that can be instantiated with different losses. It argues that aligning the uncertainty measure with the downstream task loss is crucial, and formalizes this alignment for selective prediction via a result on the area under the loss-rejection curve. Empirically, mutual information (the log-loss instantiation of epistemic uncertainty) is strongest for out-of-distribution (OoD) detection, a zero-one-based epistemic uncertainty measure performs best in active learning, and total uncertainty instantiated with the task loss is the appropriate rejection criterion for selective prediction. The findings suggest there is no universal uncertainty measure; instead, practitioners should tailor the uncertainty measure to the specific predictive task.
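To make the decomposition concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the second-order distribution over $\boldsymbol{\theta}$ is approximated by a finite ensemble of predicted probability vectors, that the reference prediction $\hat{\boldsymbol{\theta}}$ is the ensemble mean, and that the log, Brier, and zero-one losses take their standard entropy and divergence forms. The function names `entropy`, `divergence`, and `uncertainties` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an implementation detail, not from the paper


def entropy(theta, loss):
    """Generalized entropy H_l(theta); class probabilities lie on the last axis."""
    if loss == "log":
        return -np.sum(theta * np.log(theta + EPS), axis=-1)  # Shannon entropy
    if loss == "brier":
        return 1.0 - np.sum(theta ** 2, axis=-1)
    if loss == "zero_one":
        return 1.0 - np.max(theta, axis=-1)
    raise ValueError(f"unknown loss: {loss}")


def divergence(theta_hat, theta, loss):
    """Divergence D_l(theta_hat, theta): excess expected loss of predicting
    theta_hat (shape (K,)) when theta (shape (M, K)) is the true distribution."""
    if loss == "log":
        # Kullback-Leibler divergence KL(theta || theta_hat)
        return np.sum(theta * (np.log(theta + EPS) - np.log(theta_hat + EPS)), axis=-1)
    if loss == "brier":
        return np.sum((theta_hat - theta) ** 2, axis=-1)
    if loss == "zero_one":
        predicted_class = np.argmax(theta_hat)
        return np.max(theta, axis=-1) - theta[..., predicted_class]
    raise ValueError(f"unknown loss: {loss}")


def uncertainties(members, loss):
    """Return (total, aleatoric, epistemic) for an ensemble of shape (M, K).

    Assumption: the ensemble mean serves as the reference prediction.
    """
    members = np.asarray(members, dtype=float)
    theta_bar = members.mean(axis=0)
    total = entropy(theta_bar, loss)                       # H_l of the mean prediction
    aleatoric = entropy(members, loss).mean()              # expected entropy
    epistemic = divergence(theta_bar, members, loss).mean()  # expected divergence
    return float(total), float(aleatoric), float(epistemic)


if __name__ == "__main__":
    ensemble = [[0.9, 0.1], [0.1, 0.9]]  # two members that disagree strongly
    for name in ("log", "brier", "zero_one"):
        print(name, uncertainties(ensemble, name))
```

Under these assumptions the three quantities are additive (total equals aleatoric plus epistemic) for each loss, and the log-loss epistemic term coincides with mutual information.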

Abstract

We address the problem of uncertainty quantification and propose measures of total, aleatoric, and epistemic uncertainty based on a known decomposition of (strictly) proper scoring rules, a specific type of loss function, into a divergence and an entropy component. This leads to a flexible framework for uncertainty quantification that can be instantiated with different losses (scoring rules), which makes it possible to tailor uncertainty quantification to the use case at hand. We show that this flexibility is indeed advantageous. In particular, we analyze the task of selective prediction and show that the scoring rule should ideally match the task loss. In addition, we perform experiments on two other common tasks. For out-of-distribution detection, our results confirm that a widely used measure of epistemic uncertainty, mutual information, performs best. Moreover, in the setting of active learning, our measure of epistemic uncertainty based on the zero-one-loss consistently outperforms other uncertainty measures.
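The selective prediction setting can be illustrated with a loss-rejection curve: instances are rejected in order of decreasing uncertainty, and the mean task loss of the retained instances is tracked. The sketch below is a hedged illustration only. The exact curve and area definitions used in the paper are not given in this section, so the per-instance rejection grid, the trapezoidal area, and the names `loss_rejection_curve` and `area_under_curve` are assumptions.

```python
import numpy as np


def loss_rejection_curve(losses, uncertainty):
    """Mean loss on retained instances after rejecting the k most uncertain ones.

    Returns (fractions, curve), where fractions[k] = k / n is the rejected share
    and curve[k] is the mean loss of the n - k most certain instances.
    """
    losses = np.asarray(losses, dtype=float)
    order = np.argsort(uncertainty, kind="stable")  # most certain first
    sorted_losses = losses[order]
    n = len(losses)
    # cumulative_mean[j] is the mean loss over the j + 1 most certain instances
    cumulative_mean = np.cumsum(sorted_losses) / np.arange(1, n + 1)
    curve = cumulative_mean[::-1]  # k rejected -> n - k retained
    fractions = np.arange(n) / n
    return fractions, curve


def area_under_curve(fractions, curve):
    """Trapezoidal area under the loss-rejection curve (lower is better)."""
    widths = np.diff(fractions)
    heights = (curve[1:] + curve[:-1]) / 2.0
    return float(np.sum(widths * heights))


if __name__ == "__main__":
    task_losses = [0.0, 0.0, 1.0, 1.0]   # e.g. zero-one loss per instance
    informative = [0.1, 0.2, 0.8, 0.9]   # high uncertainty where the loss is high
    misleading = [0.9, 0.8, 0.2, 0.1]    # high uncertainty where the loss is low
    for scores in (informative, misleading):
        print(area_under_curve(*loss_rejection_curve(task_losses, scores)))
```

An uncertainty measure that ranks high-loss instances as most uncertain yields a smaller area, which is the sense in which a measure matched to the task loss is preferable as a rejection criterion.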
