Measuring Orthogonality as the Blind-Spot of Uncertainty Disentanglement

Ivo Pascal de Jong, Andreea Ioana Sburlea, Matthia Sabatelli, Matias Valdenegro-Toro

TL;DR

This work tackles the blind spot in uncertainty disentanglement by insisting on orthogonal separation between aleatoric ($U_a$) and epistemic ($U_e$) uncertainties and introducing Uncertainty Disentanglement Error (UDE) to quantify adherence to both consistency and orthogonality. It analyzes Gaussian Logits and Information Theoretic disentangling, revealing that total uncertainty formulations can mask leakage between sources, and demonstrates that orthogonality is not guaranteed even for state-of-the-art methods. Through controlled experiments manipulating dataset size and label noise across multiple domains and Bayesian UQ approaches, the authors show that Information Theoretic disentangling often yields better consistency and partial orthogonality, particularly for $U_e$, but fails to achieve full orthogonality for $U_a$, especially on large-scale data like ImageNet-1k. The paper proposes UDE as a practical metric for evaluating disentanglement quality and highlights that training regime (from scratch vs pretrained) substantially affects results, emphasizing caution when applying disentangled uncertainties in high-stakes decisions.

Abstract

Aleatoric (data) and epistemic (knowledge) uncertainty are textbook components of Uncertainty Quantification. Jointly estimating these components has been shown to be problematic and non-trivial. As a result, there are multiple ways to disentangle these uncertainties, but current methods to evaluate them are insufficient. We propose that aleatoric and epistemic uncertainty estimates should be orthogonally disentangled - meaning that each uncertainty is not affected by the other - a necessary condition that is often not met. We prove that orthogonality and consistency are necessary and sufficient criteria for disentanglement, and construct Uncertainty Disentanglement Error as a metric to measure these criteria. Further empirical evaluation shows that finetuned models give different orthogonality results than models trained from scratch and that UDE can be optimized for through dropout rate. We demonstrate that a Deep Ensemble trained from scratch on ImageNet-1k with Information Theoretic disentangling achieves consistent and orthogonal estimates of epistemic uncertainty, but estimates of aleatoric uncertainty still fail on orthogonality.
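As a rough illustration of what Information Theoretic disentangling computes, the sketch below uses the commonly cited entropy-based decomposition: total uncertainty is the entropy of the averaged prediction, aleatoric uncertainty is the average entropy of the individual predictions, and epistemic uncertainty is their difference (the mutual information). That the paper uses exactly this form is an assumption, and the function names and toy inputs are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def information_theoretic_disentangle(member_probs):
    """Split uncertainty for a single input.

    member_probs: list of class-probability vectors, one per ensemble
    member (or per MC-Dropout forward pass).
    Returns (total, aleatoric, epistemic).
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    # Average the predictive distributions over members.
    mean_probs = [
        sum(m[c] for m in member_probs) / n_members for c in range(n_classes)
    ]
    total = entropy(mean_probs)                                    # H[E[p]]
    aleatoric = sum(entropy(m) for m in member_probs) / n_members  # E[H[p]]
    epistemic = total - aleatoric                                  # mutual information
    return total, aleatoric, epistemic


# Toy case 1: every member predicts the same 50/50 split (noisy data).
agree = [[0.5, 0.5], [0.5, 0.5]]
# Toy case 2: members are individually confident but disagree.
disagree = [[1.0, 0.0], [0.0, 1.0]]

total_a, ua_a, ue_a = information_theoretic_disentangle(agree)
total_d, ua_d, ue_d = information_theoretic_disentangle(disagree)
```

Both toy inputs have the same total uncertainty, but the first attributes it entirely to the aleatoric term and the second entirely to the epistemic term. This is the kind of split whose orthogonality the paper sets out to measure.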

Paper Structure

This paper contains 48 sections, 6 theorems, 14 equations, 17 figures, and 6 tables.

Key Result

Theorem 3.1

Evaluating only Consistency (eq:cond1, eq:cond2) is insufficient for disentanglement, as these conditions can be satisfied by a non-disentangled estimator.
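A toy sketch can make this claim concrete. It assumes, following the expected behavior described for the experimental setup, that consistency means $U_e$ falls as dataset size grows and $U_a$ rises as label noise grows, and that orthogonality means each estimate stays flat under the other manipulation. The ground-truth functions, the estimator, and all names below are hypothetical and are not taken from the paper's proof.

```python
def true_uncertainties(dataset_size, label_noise):
    """Hypothetical ground truth: aleatoric tracks label noise,
    epistemic shrinks as the dataset grows."""
    return label_noise, 1.0 / dataset_size  # (aleatoric, epistemic)


def trap_estimator(dataset_size, label_noise):
    """Non-disentangled estimator: reports the TOTAL uncertainty
    as both its aleatoric and its epistemic estimate."""
    aleatoric, epistemic = true_uncertainties(dataset_size, label_noise)
    total = aleatoric + epistemic
    return total, total  # (u_a estimate, u_e estimate)


def strictly_increasing(xs):
    return all(a < b for a, b in zip(xs, xs[1:]))


def strictly_decreasing(xs):
    return all(a > b for a, b in zip(xs, xs[1:]))


def constant(xs, tol=1e-9):
    return max(xs) - min(xs) <= tol


sizes = [100, 1000, 10000]
noises = [0.0, 0.2, 0.4]

# Experiment 1: vary dataset size at fixed label noise.
ua_by_size = [trap_estimator(n, 0.2)[0] for n in sizes]
ue_by_size = [trap_estimator(n, 0.2)[1] for n in sizes]
# Experiment 2: vary label noise at fixed dataset size.
ua_by_noise = [trap_estimator(1000, p)[0] for p in noises]
ue_by_noise = [trap_estimator(1000, p)[1] for p in noises]

# Consistency: each estimate moves in the expected direction.
consistent = strictly_decreasing(ue_by_size) and strictly_increasing(ua_by_noise)
# Orthogonality: each estimate ignores the other manipulation.
orthogonal = constant(ua_by_size) and constant(ue_by_noise)

print("consistent:", consistent)
print("orthogonal:", orthogonal)
```

Under these assumptions the estimator passes the consistency check while failing the orthogonality check, which is the gap that a consistency-only evaluation would miss.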

Figures (17)

  • Figure 1: Diagram of Gaussian Logits disentangling.
  • Figure 2: Expected behavior for our proposed experimental setup. Experiment 1: As dataset size increases, epistemic uncertainty $U_e$ decreases while aleatoric uncertainty $U_a$ remains stable on average. The change in $U_e$ is captured by the change in accuracy when this is caused by dataset size. Experiment 2: With increasing label noise, $U_a$ rises while $U_e$ remains relatively stable, reflecting the model’s awareness of inherent data noise. Change in $U_a$ is captured by accuracy when this is caused by label noise.
  • Figure 3: Changing dataset size on CIFAR-10 for different UQ methods and different disentanglement approaches. As the dataset increases (x-axis), accuracy (right y-axis) increases as well. This should result in decreased epistemic uncertainty $u_e$ (left y-axis), but this does not always happen. Aleatoric uncertainty $u_a$ (left y-axis) should stay constant, but it usually increases. The shaded areas indicate two standard deviations.
  • Figure 4: Changing label noise on CIFAR-10 for different UQ methods and disentanglement approaches. As more labels get shuffled (x-axis), the accuracy goes down (right y-axis). This should increase the aleatoric uncertainty $u_a$ (left y-axis), and have minimal effect on the epistemic uncertainty $u_e$ (left y-axis). The shaded areas indicate two standard deviations.
  • Figure 5: Aleatoric $u_a$ and epistemic uncertainty $u_e$ with (a) changing dataset sizes or (b) changing label noise for the Two Moons dataset with MC-Dropout. The lighter areas represent higher uncertainty. By visualizing the uncertainty for the whole feature space, we can gain intuition about uncertainty outside the dataset. Gaussian Logits gives qualitatively different results than Information Theoretic.
  • ...and 12 more figures

Theorems & Definitions (9)

  • Theorem 3.1: The Total Uncertainty Trap
  • Theorem 3.2: Necessity of Correlation
  • Theorem 3.3: Fundamental Disentanglement
  • Theorem 1.1: The Total Uncertainty Trap
  • proof
  • Theorem 1.2: Necessity and Sufficiency
  • proof
  • Theorem 1.3: Correlation is Necessary but Not Sufficient
  • proof