Variance-Gated Ensembles: An Epistemic-Aware Framework for Uncertainty Estimation

H. Martin Gillis; Isaac Xu; Thomas Trappenberg

TL;DR

Variance-Gated Ensembles (VGE) is an intuitive, differentiable framework that injects epistemic sensitivity via a signal-to-noise gate computed from ensemble statistics. It provides a practical and scalable approach to epistemic-aware uncertainty estimation in ensemble models.

Abstract

Machine learning applications require fast and reliable per-sample uncertainty estimation. A common approach is to use predictive distributions from Bayesian or approximation methods and additively decompose uncertainty into aleatoric (i.e., data-related) and epistemic (i.e., model-related) components. However, additive decomposition has recently been questioned, with evidence that it breaks down when using finite-ensemble sampling and/or mismatched predictive distributions. This paper introduces Variance-Gated Ensembles (VGE), an intuitive, differentiable framework that injects epistemic sensitivity via a signal-to-noise gate computed from ensemble statistics. VGE provides: (i) a Variance-Gated Margin Uncertainty (VGMU) score that couples decision margins with ensemble predictive variance; and (ii) a Variance-Gated Normalization (VGN) layer that generalizes the variance-gated uncertainty mechanism to training via per-class, learnable normalization of ensemble member probabilities. We derive closed-form vector-Jacobian products enabling end-to-end training through ensemble sample mean and variance. VGE matches or exceeds state-of-the-art information-theoretic baselines while remaining computationally efficient. As a result, VGE provides a practical and scalable approach to epistemic-aware uncertainty estimation in ensemble models. An open-source implementation is available at: https://github.com/nextdevai/vge.

Paper Structure (54 sections, 12 theorems, 46 equations, 14 figures, 14 tables)

This paper contains 54 sections, 12 theorems, 46 equations, 14 figures, 14 tables.

Introduction
Related Work
Framework Definition and Setup
Ensemble Statistics and Variance Gate Definition
Geometric Interpretation
Variance-Gated Margin Uncertainty
Variance-Gated Uncertainty Decomposition
Analytical Gradients for Variance-Gated Normalization
Full Gradient Decomposition
(a) Forward computation
(b) Backpropagation path
Experiments
Rank Consistency and Alignment
Uncertainty Mass Concentration
Margin-Variance Geometry
...and 39 more sections

Key Result

Proposition 3.1

For the exponential gate $\boldsymbol{\Gamma} = 1 - e^{-\mathbf{\bar{p}}/(\mathbf{k}\mathbf{s})}$, the per-class derivative with respect to mean confidence is positive, $\partial\boldsymbol{\Gamma}/\partial\mathbf{\bar{p}} > 0$, and scales inversely with predictive spread $\mathbf{s}$.
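As a rough illustration of the proposition, the sketch below computes a per-class exponential gate from ensemble member probabilities and compares an analytic derivative against a finite difference. It rests on several assumptions that this summary does not confirm: the exponent is read as $\bar{p}/(k \cdot s)$ applied per class, the spread $s$ is taken to be the sample standard deviation across members, $k$ is a single positive scalar, and a small floor `eps` guards against zero spread. Under that reading the derivative is $e^{-\bar{p}/(ks)}/(ks)$, which is positive whenever $k$ and $s$ are positive and carries a $1/(ks)$ prefactor. The function names are hypothetical and are not taken from the authors' implementation.

```python
import math
from statistics import mean, stdev


def gate_value(p_bar, s, k=1.0):
    """Exponential gate 1 - exp(-p_bar / (k * s)) for one class."""
    return 1.0 - math.exp(-p_bar / (k * s))


def gate_grad_mean(p_bar, s, k=1.0):
    """Analytic derivative of the gate with respect to p_bar."""
    return math.exp(-p_bar / (k * s)) / (k * s)


def variance_gate(member_probs, k=1.0, eps=1e-12):
    """Per-class gate from a list of M >= 2 probability vectors.

    Assumptions (not from the source): spread is the sample standard
    deviation across members, k is a shared scalar, and eps floors the
    spread so that identical member predictions do not divide by zero.
    """
    gates = []
    for class_probs in zip(*member_probs):  # iterate over classes
        p_bar = mean(class_probs)
        s = max(stdev(class_probs), eps)
        gates.append(gate_value(p_bar, s, k))
    return gates


# Three hypothetical ensemble members, three classes.
members = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.8, 0.1, 0.1],
]
gates = variance_gate(members)

# Central finite difference versus the analytic derivative at one point.
h = 1e-6
numeric = (gate_value(0.6 + h, 0.1) - gate_value(0.6 - h, 0.1)) / (2 * h)
analytic = gate_grad_mean(0.6, 0.1)
```

With these inputs, classes on which the members agree closely relative to their mean are gated near 1, while classes with a low signal-to-noise ratio receive a smaller gate value.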

Figures (14)

Figure 1: Forward and backward passes of VGN. Panel (a) displays the forward computation, in which ensemble predictions are modulated by a shared variance gate and combined into a mixture distribution. Panel (b) displays the backpropagation path, showing how gradients propagate through the normalization layer and shared gate via ensemble mean and predictive spread. See below for further step-by-step discussion.
Figure 2: Rank consistency between VGMU and EPJS on CIFAR-10 (a) and CIFAR-100 (b) for the LLE models. The diagonal denotes perfect agreement between uncertainty rankings.
Figure 3: AUC$_c$ curves for CIFAR-10/100. The diagonal (AUC$_c = 0.5$) corresponds to no concentration on difficult samples.
Figure 4: Margin-variance geometry for CIFAR-100 with $M=5$. Each point represents a test sample; color indicates VGMU value (yellow = high uncertainty, blue = low). DE and DE-VGN show high variance even at large margins; LLE and LLE-VGN show moderate variance; MCD and MCD-LLE concentrate samples in the high-margin (confident), low-variance (certain) region.
Figure 5: Learned per-class $\mathbf{k}$ values for VGN models on CIFAR-10. DE-VGN learns higher values ($\bar{k} \approx 4.1$) than LLE-VGN ($\bar{k} \approx 0.8$), reflecting adaptation to ensemble diversity.
...and 9 more figures

Theorems & Definitions (33)

Proposition 3.1: Sensitivity to sample mean confidence $\mathbf{\bar{p}}$
proof
Remark 3.1.1
Proposition 3.2: Sensitivity to predictive spread $\mathbf{s}$
proof
Remark 3.2.1
Proposition 3.3: Sensitivity to scalar $\mathbf{k}$
proof
Remark 3.3.1
Proposition 4.1
...and 23 more
