Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in Neural Networks

Diksha Bhandari; Jakiw Pidstrigach; Sebastian Reich

TL;DR

The paper develops two affine-invariant interacting particle systems (IPSs), built on extensions of the ensemble Kalman filter, to perform Bayesian inference for logistic regression and to quantify predictive uncertainty in neural networks. By deriving mean-field limits and proving quantitative convergence rates as the number of particles tends to infinity, the authors give a rigorous foundation for using these IPSs as scalable, derivative-free Bayesian samplers. Empirical results on binary and multiclass classification, including out-of-distribution data and CIFAR-10, show that last-layer Bayesian approximations yield well-calibrated uncertainty without sacrificing accuracy, and that the deterministic second-order sampler converges faster than Hamiltonian Monte Carlo (HMC). The work combines affine-invariant ensemble transforms with mean-field analysis to make neural network predictions more robust under distributional shift.

Abstract

We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior, and we prove quantitative convergence rates of these interacting particle systems to their mean-field limits as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in neural networks.
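As a rough illustration of what "quantifying predictive uncertainty" with a particle ensemble means for logistic regression, the sketch below averages the logistic prediction over an ensemble of weight vectors and compares it with a single point estimate. It is not the paper's sampler: the particles here are plain Gaussian draws around an assumed mean, and the names `ensemble_predict`, `mean_w`, `near` and `far` are hypothetical choices made for this example only.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ensemble_predict(particles, x):
    """Average the class-1 probability over all weight particles.

    Returns the ensemble-mean probability and the standard deviation
    of the per-particle probabilities.
    """
    probs = [sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x))) for w in particles]
    mean = sum(probs) / len(probs)
    var = sum((p - mean) ** 2 for p in probs) / len(probs)
    return mean, math.sqrt(var)


# Hypothetical stand-in ensemble: Gaussian draws around an assumed mean.
# In the paper's setting the particles would instead come from one of
# the two interacting particle systems.
rng = random.Random(0)
mean_w = [2.0, -1.0]
particles = [[rng.gauss(m, 0.5) for m in mean_w] for _ in range(200)]

near = [0.3, 0.4]    # input close to the origin
far = [30.0, 40.0]   # input far from the origin

p_near, s_near = ensemble_predict(particles, near)
p_far, s_far = ensemble_predict(particles, far)

# Point estimate that uses only the mean weight vector.
point_far = sigmoid(sum(m * x for m, x in zip(mean_w, far)))

print(f"near: ensemble p = {p_near:.3f}, spread = {s_near:.3f}")
print(f"far:  ensemble p = {p_far:.3f}, spread = {s_far:.3f}")
print(f"far:  point estimate p = {point_far:.6f}")
```

Under these assumptions the point estimate is almost fully confident far from the origin, while the ensemble average is noticeably less so because the particles disagree there. This is the qualitative behaviour that Bayesian last-layer approximations are meant to provide.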

Paper Structure (30 sections, 10 theorems, 131 equations, 7 figures, 2 tables)

Introduction
Classification and logistic regression
Literature review
Outline and our contribution
Dynamical system formulations
Homotopy using moment matching
Deterministic second-order dynamical sampler
Theoretical results on mean-field limits
Analysing the mean-field systems
Homotopy using moment matching
Deterministic second-order dynamical sampler
Statement of results
Algorithmic implementation
Homotopy using moment matching
Deterministic second-order dynamical sampler
...and 15 more sections

Key Result

Proposition 1

Assume that $\mu_s = \mathcal{N}(m_s, P_s)$ is Gaussian. Then the right-hand sides of equations (equ:stochastic_noise_pde) and (equ:deterministic_noise_pde) coincide.

Figures (7)

Figure 1: 2D binary classification data set
Figure 2: Binary classification on a toy data set using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. The background colour depicts the confidence in classification, while the black line represents the decision boundary obtained for the toy classification problem.
Figure 3: Zoomed-out versions of the results in Figure 2 for binary classification on a toy data set using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. The background colour depicts the confidence in classification.
Figure 4: Confidence of MLE, ensembles of neural networks, last-layer Laplace approximation, HMC, the moment matching method, and the deterministic second-order dynamical sampler as functions of $\delta$ over the test set. Thick blue lines and shaded regions correspond to means and ± standard deviations, respectively. Dashed black lines mark the desirable confidence for sufficiently large $\delta$.
Figure 6: Multi-class classification on a toy data set using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. The background colour depicts the confidence in classification obtained for the toy classification problem.
...and 2 more figures

Theorems & Definitions (25)

Remark 2.1
Remark 2.2
Remark 3.1
Proposition 1
proof
Remark 3.2
Proposition 2
Theorem 3.3
proof
Proposition 3
...and 15 more
