General Uncertainty Estimation with Delta Variances

Simon Schmitt, John Shawe-Taylor, Hado van Hasselt

TL;DR

The paper tackles uncertainty introduced by limited data in large neural networks by introducing Delta Variances, a gradient-based, architecture-free framework for epistemic uncertainty. Delta Variances approximate the posterior or leave-one-out variance of quantities of interest (QoIs) using a simple quadratic form $\Delta_{u(z)}^\top \Sigma \Delta_{u(z)}$, where $\Delta_{u(z)}$ is the gradient of the QoI with respect to the parameters and $\Sigma$ is a covariance surrogate (often $\frac{1}{N}F^{-1}$). The authors connect Bayesian, frequentist, adversarial, and out-of-distribution perspectives, provide theoretical motivations, and show that special cases recover known methods such as the Delta Method and the Laplace approximation. Empirically, they validate Delta Variances on the GraphCast weather forecasting system, achieving competitive uncertainty estimates at substantially lower inference cost than ensembling, and demonstrate extensions to implicit QoIs and a learned $\Sigma$. The work offers a practical, scalable approach to quantifying epistemic uncertainty in complex predictive systems and highlights its adaptability to a range of QoIs and iterative algorithms.

Abstract

Decision makers may suffer from uncertainty induced by limited data. This may be mitigated by accounting for epistemic uncertainty, which is, however, challenging to estimate efficiently for large neural networks. To this end we investigate Delta Variances, a family of algorithms for epistemic uncertainty quantification that is computationally efficient and convenient to implement. It can be applied to neural networks and to more general functions composed of neural networks. As an example we consider a weather simulator with a neural-network-based step function inside; here Delta Variances empirically obtain competitive results at the cost of a single gradient computation. The approach is convenient as it requires no changes to the neural network architecture or training procedure. We discuss multiple ways to derive Delta Variances theoretically, noting that special cases recover popular techniques, and present a unified perspective on multiple related methods. Finally, we observe that this general perspective gives rise to a natural extension and empirically show its benefit.
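As a minimal sketch of the quadratic form behind Delta Variances, the following pure-Python example evaluates it for a one-parameter Bernoulli survival model. The data, the function names `grad` and `delta_variance`, and the choice of model are illustrative assumptions, not the paper's code. The covariance surrogate is assumed to be $\frac{1}{N}F^{-1}$ with $F$ taken as the Fisher information of a single Bernoulli observation, and the gradient is approximated by a central finite difference where a real implementation would more likely use automatic differentiation.

```python
def grad(u, theta, eps=1e-6):
    """Central finite-difference gradient of a scalar QoI u at theta (a list)."""
    g = []
    for i in range(len(theta)):
        up = list(theta)
        up[i] += eps
        dn = list(theta)
        dn[i] -= eps
        g.append((u(up) - u(dn)) / (2 * eps))
    return g


def delta_variance(u, theta, sigma):
    """Quadratic form grad^T Sigma grad, with sigma given as a list of rows."""
    g = grad(u, theta)
    return sum(g[i] * sigma[i][j] * g[j]
               for i in range(len(g)) for j in range(len(g)))


# Hypothetical data: 1 = survived, 0 = did not survive.
data = [1] * 90 + [0] * 10
n = len(data)
theta_hat = [sum(data) / n]  # maximum-likelihood estimate of the survival rate

# Assumed covariance surrogate Sigma = (1/N) F^{-1}; for one Bernoulli
# observation the Fisher information is F = 1 / (theta * (1 - theta)).
fisher = 1.0 / (theta_hat[0] * (1.0 - theta_hat[0]))
sigma = [[1.0 / (n * fisher)]]

# QoI 1: the learned parameter itself. QoI 2: a nonlinear function of it.
var_f = delta_variance(lambda th: th[0], theta_hat, sigma)
var_u = delta_variance(lambda th: th[0] ** 10, theta_hat, sigma)
print(var_f, var_u)
```

Because the QoI enters only through its gradient, swapping in another differentiable function of the parameters requires no change to the fitted model or to `sigma`.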

Paper Structure

This paper contains 60 sections, 9 theorems, 47 equations, 4 figures, 2 tables.

Key Result

Proposition 1

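For a normally distributed posterior with mean $\bar{\theta}$ and a covariance matrix $\Sigma$ proportional to $\frac{1}{N}$, the posterior variance of the quantity of interest is approximated, to leading order in $\frac{1}{N}$, by the Delta Variance:

$$\mathbb{V}_{\theta}\left[u_\theta(z)\right] \approx \Delta_{u(z)}^\top \Sigma\, \Delta_{u(z)},$$

where $\Delta_{u(z)} := \nabla_\theta u_\theta(z)|_{\theta=\bar{\theta}}$ as usual.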
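To illustrate how a Gaussian posterior with covariance proportional to $\frac{1}{N}$ interacts with a nonlinear quantity of interest, consider a hypothetical one-dimensional example that is not taken from the paper: $u_\theta = \theta^2$ with $\theta \sim \mathcal{N}(\bar{\theta}, \sigma^2/N)$. Here both the exact variance and the gradient-based quadratic form are available in closed form, using the Gaussian moments $\mathbb{E}[\theta^2] = \bar{\theta}^2 + \sigma^2/N$ and $\mathbb{E}[\theta^4] = \bar{\theta}^4 + 6\bar{\theta}^2\sigma^2/N + 3\sigma^4/N^2$:

$$
\begin{aligned}
\mathbb{V}\left[\theta^2\right] &= \mathbb{E}\left[\theta^4\right] - \left(\mathbb{E}\left[\theta^2\right]\right)^2 = \frac{4\bar{\theta}^2\sigma^2}{N} + \frac{2\sigma^4}{N^2}, \\
\Delta_{u}^\top \Sigma\, \Delta_{u} &= \left(2\bar{\theta}\right)^2 \frac{\sigma^2}{N} = \frac{4\bar{\theta}^2\sigma^2}{N}.
\end{aligned}
$$

In this example the two expressions differ only by the term $2\sigma^4/N^2$, which shrinks faster than the leading $\frac{1}{N}$ term as the dataset grows.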

Figures (4)

  • Figure 1: We compare the computational overhead of training and evaluating different variance estimators. Delta Variances are favourable in terms of computational efficiency. They incur negligible training overhead while inference incurs the cost of a regular gradient pass making them more efficient than the alternatives considered. Monte-Carlo Dropout also incurs negligible training overhead, but requires $K$ independent evaluations for inference. Most expensive are Bootstrapped Ensembles requiring $K\times$ repeated computations.
  • Figure 2: Illustrative survival prediction example. Actual epistemic variance (red) vs. predicted variance using the Delta Variance (orange) or a 10-fold Bootstrap (blue) as the dataset size $N$ grows. Shaded confidence areas contain $95\%$ of the variance predictions. Bold lines are the median. Observe that the orange median line of the Delta Variance and the actual variance in red largely overlap. Top: variance of the learned function $f_\theta(x)=\theta$. Bottom: variance of the quantity of interest $u_\theta(x) := \theta^{10}$ evaluations. All methods yield reasonable results for $N>10$, with ensemble methods exhibiting higher variance. Generally the variance for $u_\theta$ is harder to estimate than for $f_\theta$.
  • Figure 3: Comparison of variance estimators in terms of their inference cost and prediction quality. The quantities of interest are based on the GraphCast (Lam et al., 2023) weather prediction system, which iterates a learned neural network dynamics model to form predictions. We evaluate the selected variance estimators on three criteria: log-likelihood, correlation with the prediction error, and AUC, akin to van Amersfoort et al. (2020). Lines indicate 2 standard errors. Delta Variances yield results similar to popular alternatives at lower computational cost. On average, ensembles achieve the highest quality and Delta Variances the lowest computational overhead. See the section on learning $\Sigma$ for the fine-tuned Delta Variance.
  • Figure 4: To investigate more intricate quantities of interest, we consider the mapping from a matrix $A_\theta$ to its eigenvalue $u_\theta=\lambda_i(A_\theta)$. This function is not explicit and computed using iterative algorithms, but we can use the implicit function approach to estimate the Delta Variance. Here $A_\theta$ is an illustrative finite-element problem with 11-dimensional parameters $\theta$ and 5 eigenvalues.
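The implicit-function idea in the Figure 4 caption can be sketched on a much smaller problem. The example below is an assumption-laden illustration, not the paper's finite-element setup: it uses a hypothetical symmetric 2x2 matrix with three parameters, computes the dominant eigenpair by power iteration, and obtains the gradient from the defining equations $A v = \lambda v$, $v^\top v = 1$, which for a simple eigenvalue of a symmetric matrix give $\partial\lambda/\partial\theta_k = v^\top (\partial A/\partial\theta_k)\, v$, so nothing is differentiated through the iterations. The names `top_eigenpair`, `build_matrix`, `eigenvalue_gradient` and the diagonal covariance `sigma_diag` are made up for this sketch.

```python
import math


def top_eigenpair(a, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric 2x2 matrix."""
    v = [1.0, 1.0]
    for _ in range(iters):
        w = [a[0][0] * v[0] + a[0][1] * v[1],
             a[1][0] * v[0] + a[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    av = [a[0][0] * v[0] + a[0][1] * v[1],
          a[1][0] * v[0] + a[1][1] * v[1]]
    lam = v[0] * av[0] + v[1] * av[1]  # Rayleigh quotient with unit-norm v
    return lam, v


def build_matrix(theta):
    """Hypothetical parameterisation A_theta = [[t0, t2], [t2, t1]]."""
    return [[theta[0], theta[2]], [theta[2], theta[1]]]


def eigenvalue_gradient(theta):
    """Gradient of the top eigenvalue via v^T (dA/dtheta_k) v."""
    lam, v = top_eigenpair(build_matrix(theta))
    # dA/dt0 = [[1,0],[0,0]], dA/dt1 = [[0,0],[0,1]], dA/dt2 = [[0,1],[1,0]]
    return lam, [v[0] ** 2, v[1] ** 2, 2.0 * v[0] * v[1]]


theta_bar = [2.0, 1.0, 0.5]
# Assumed diagonal covariance surrogate, chosen only for illustration.
sigma_diag = [0.01, 0.01, 0.04]

lam, g = eigenvalue_gradient(theta_bar)
delta_var = sum(gk * sk * gk for gk, sk in zip(g, sigma_diag))
print(lam, g, delta_var)
```

The same pattern should carry over to larger symmetric problems as long as the eigenvalue of interest is simple, since the gradient only needs the converged eigenvector and the derivative of the matrix with respect to each parameter.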

Theorems & Definitions (28)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • proof
  • Definition 4
  • Definition 5
  • Proposition 2
  • proof
  • Definition 6
  • ...and 18 more