
Credal Concept Bottleneck Models: Structural Separation of Epistemic and Aleatoric Uncertainty

Tanmoy Mukherjee, Marius Kloft, Pierre Marquis, Zied Bouraoui

TL;DR

This work addresses the challenge of disentangling epistemic uncertainty (EU) from aleatoric uncertainty (AU) in predictive models. It introduces Credal Concept Bottleneck Models (Credal CBMs), which represent uncertainty as ellipsoidal credal sets parameterized in logit space: EU is determined by the size of the credal set and AU by the noise within it. Structural separation is enforced by a frozen encoder and disjoint gradient signals across three heads. A Credal ELBO with a Hausdorff KL regularizer keeps the credal set well-behaved while enabling gradient isolation, and the paper gives theoretical guarantees of gradient separation and decorrelation. Empirically, the method achieves near-zero EU–AU correlation on CEBaB, GoEmotions, MAQA*, and AmbigQA*, improves the alignment of EU with prediction error and of AU with ground-truth ambiguity, and supports quadrant-based routing for downstream tasks. The approach offers a design principle for trustworthy AI: decisions such as data collection, human review, or abstention can be targeted to the source of uncertainty. The paper also outlines practical considerations and limitations for deployment.

Abstract

Decomposing predictive uncertainty into epistemic (model ignorance) and aleatoric (data ambiguity) components is central to reliable decision making, yet most methods estimate both from the same predictive distribution. Recent empirical and theoretical results show these estimates are typically strongly correlated, so changes in predictive spread simultaneously affect both components and blur their semantics. We propose a credal-set formulation in which uncertainty is represented as a set of predictive distributions, so that epistemic and aleatoric uncertainty correspond to distinct geometric properties: the size of the set versus the noise within its elements. We instantiate this idea in a Variational Credal Concept Bottleneck Model with two disjoint uncertainty heads trained by disjoint objectives and non-overlapping gradient paths, yielding separation by construction rather than post hoc decomposition. Across multi-annotator benchmarks, our approach reduces the correlation between epistemic and aleatoric uncertainty by over an order of magnitude compared to standard methods, while improving the alignment of epistemic uncertainty with prediction error and aleatoric uncertainty with ground-truth ambiguity.
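For contrast with the credal-set formulation, the following is a minimal sketch of the kind of single-distribution decomposition the abstract refers to: the common entropy-based split computed from a set of sampled predictive distributions (for example ensemble members). It is not the paper's method. The function names and the use of natural logarithms are assumptions made for illustration.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def entropy_decomposition(member_probs):
    """Split total predictive entropy into aleatoric and epistemic parts.

    member_probs: list of categorical distributions over the same classes,
    one per ensemble member or Monte Carlo sample (hypothetical input format).
    """
    n = len(member_probs)
    k = len(member_probs[0])
    # Averaged predictive distribution over all members.
    mean_p = [sum(p[j] for p in member_probs) / n for j in range(k)]
    total = entropy(mean_p)                               # H[mean p]
    aleatoric = sum(entropy(p) for p in member_probs) / n  # mean of H[p]
    epistemic = total - aleatoric                         # mutual information
    return total, aleatoric, epistemic


# Members that fully disagree: all uncertainty is attributed to the epistemic part.
total, au, eu = entropy_decomposition([[1.0, 0.0], [0.0, 1.0]])
print(total, au, eu)
```

Both terms here are functions of the same sampled distributions, which is the coupling the abstract describes: a change in predictive spread moves the aleatoric and epistemic estimates together.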

Paper Structure

This paper contains 109 sections, 3 theorems, 30 equations, 9 figures, 19 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $\mathcal{L} = \mathcal{L}_{\mathrm{task}} + \lambda_c \mathcal{L}_{\mathrm{concept}} + \lambda_e \mathcal{L}_{\mathrm{epi}} + \lambda_a \mathcal{L}_{\mathrm{ale}} + \lambda_o \mathcal{L}_{\mathrm{orth}}$ with a frozen encoder, orthogonal projections, and stop-gradient in $\mathcal{L}_{\mathrm{epi}}$ (Eq. …). […] That is, aleatoric parameters receive gradients only from $\mathcal{L}_{\mathrm{ale}}$, while epistem…
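The gradient isolation described above can be illustrated with a toy numerical check. This is a sketch under strong assumptions, not the paper's architecture: each head is a linear map on fixed (frozen) features, the two heads have disjoint parameter lists, and the target passed to the epistemic loss is a plain constant, which is one simple way to mimic a stop-gradient (what the paper actually detaches is not shown in this excerpt). All names (`theta_epi`, `theta_ale`, `loss_epi`, `loss_ale`, `finite_diff`) are hypothetical.

```python
def loss_epi(theta_epi, z, err_target):
    # Epistemic head on frozen features z; err_target is a constant,
    # standing in for a stop-gradient quantity.
    pred = sum(w * x for w, x in zip(theta_epi, z))
    return (pred - err_target) ** 2


def loss_ale(theta_ale, z, ambiguity_target):
    # Aleatoric head on the same frozen features, with its own parameters.
    pred = sum(w * x for w, x in zip(theta_ale, z))
    return (pred - ambiguity_target) ** 2


def total_loss(theta_epi, theta_ale, z, err_target, ambiguity_target,
               lam_e=1.0, lam_a=1.0):
    # Only the two uncertainty terms of the full objective are modelled here.
    return (lam_e * loss_epi(theta_epi, z, err_target)
            + lam_a * loss_ale(theta_ale, z, ambiguity_target))


def finite_diff(f, params, i, eps=1e-6):
    # Central-difference estimate of df/dparams[i].
    up = list(params)
    down = list(params)
    up[i] += eps
    down[i] -= eps
    return (f(up) - f(down)) / (2 * eps)


z = [0.5, -1.0, 2.0]            # frozen encoder output (toy values)
theta_epi = [0.1, 0.2, 0.3]
theta_ale = [-0.3, 0.4, 0.1]

# Gradient of the full toy objective w.r.t. an aleatoric parameter ...
g_total = finite_diff(
    lambda p: total_loss(theta_epi, p, z, 0.7, 0.2, lam_a=2.0), theta_ale, 0)
# ... matches the gradient of the weighted aleatoric term alone.
g_ale_only = 2.0 * finite_diff(lambda p: loss_ale(p, z, 0.2), theta_ale, 0)
# The aleatoric loss does not move when an epistemic parameter is perturbed.
g_cross = finite_diff(lambda p: loss_ale(theta_ale, z, 0.2), theta_epi, 0)
print(g_total, g_ale_only, g_cross)
```

In an autodiff framework the same check would compare per-loss gradients on each head's parameters. The finite-difference version is used here only to keep the sketch dependency-free.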

Figures (9)

  • Figure 1: Why standard decomposition fails. Aleatoric uncertainty reflects where $p^*$ lies on the simplex: ambiguous cases cluster near the center (left). Epistemic uncertainty reflects how far $p$ deviates from $p^*$ (middle). These are geometrically independent properties, yet methods deriving both from $p$ produce estimates that fall along a diagonal (right, red), the "algebraic trap." Credal CBM's structural separation recovers the independence (right, blue).
  • Figure 2: Credal CBM architecture. For "Is this movie good?" ($\mathbb{H}[p^*]=1.09$), the credal set lies in the simplex interior and $\sigma_{\mathrm{ale}}=1.05$ reflects annotator disagreement. For factual questions like "What is 2+2?" ($\mathbb{H}[p^*]=0.12$), the credal set would be near a vertex with $\sigma_{\mathrm{ale}}=0.15$.
  • Figure 3: Empirical validation of Theorem 3.1. (a) Training loss convergence. (b) EU–AU correlation over training: baselines maintain strong coupling throughout (red, $\rho > 0.7$), while Credal CBM achieves decorrelation within 10–20 epochs (blue/purple), stabilizing near $\rho \approx 0$. The rapid initial drop reflects structural separation taking effect once heads begin learning. (c) AU–Entropy correlation: with ground-truth supervision on MAQA*, aleatoric uncertainty tracks annotator entropy ($\rho = 0.78$), while EU–AU decorrelation is preserved.
  • Figure 4: Quadrant-based routing enables actionable uncertainty. (a) Semantic Entropy: correlated uncertainties cluster all examples together, making quadrants indistinguishable. (b) Credal CBM: decorrelated uncertainties separate examples by uncertainty type. The separation between Review (ambiguous but predictable) and Data (clear but unknown) demonstrates that the decomposition is actionable, whereas baselines cannot distinguish these cases.
  • Figure 5: $\beta$ sensitivity (CEBaB). (a) Decorrelation: $\rho(U_{\mathrm{epi}}, U_{\mathrm{ale}})$ is minimized for intermediate $\beta$ (green). Too small a $\beta$ leads to posterior collapse and coupled uncertainties; too large a $\beta$ causes underfitting. (b) Aleatoric validity peaks at moderate $\beta$. (c) Task accuracy remains stable across the optimal range.
  • ...and 4 more figures
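
The quadrant-based routing in Figure 4 and the EU–AU correlation tracked in Figure 3 can be sketched as follows. The "review" and "data" labels follow the Figure 4 caption. The "accept" and "abstain" labels, the threshold values, and the choice of Pearson correlation for $\rho$ are assumptions made for illustration and are not taken from the paper.

```python
import math


def route(eu, au, eu_threshold=0.5, au_threshold=0.5):
    """Map an (epistemic, aleatoric) uncertainty pair to a routing decision.

    Thresholds are hypothetical; in practice they would be tuned per dataset.
    """
    high_eu = eu >= eu_threshold
    high_au = au >= au_threshold
    if high_eu and high_au:
        return "abstain"   # assumed label: ambiguous and unknown
    if high_eu:
        return "data"      # clear but unknown: collect more data
    if high_au:
        return "review"    # ambiguous but predictable: send to human review
    return "accept"        # assumed label: clear and known


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Toy uncertainty scores for four inputs (illustrative values only).
eus = [0.9, 0.1, 0.1, 0.9]
aus = [0.1, 0.9, 0.1, 0.9]
decisions = [route(eu, au) for eu, au in zip(eus, aus)]
print(decisions)
print(pearson(eus, aus))
```

Routing of this kind is only informative when the two scores are not strongly correlated: if EU and AU rise and fall together, most inputs land in the same two quadrants and the "review" and "data" cases cannot be told apart.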

Theorems & Definitions (11)

  • Definition 2.1: Structural Separation
  • Theorem 3.1: Gradient Separation
  • proof
  • Remark 3.2: Optional Decorrelation Penalty
  • Corollary 3.3: Asymptotic Decorrelation
  • proof: Proof Sketch
  • Remark 3.4
  • Proposition 4.1: Closed-Form Hausdorff KL
  • proof
  • proof
  • ...and 1 more