
Affine Invariant Ensemble Transform Methods to Improve Predictive Uncertainty in Neural Networks

Diksha Bhandari, Jakiw Pidstrigach, Sebastian Reich

TL;DR

The paper develops two affine-invariant interacting particle systems (IPSs), both extensions of the ensemble Kalman filter, to perform Bayesian inference for logistic regression and to quantify predictive uncertainty in neural networks. By deriving mean-field limits and proving quantitative convergence rates as the number of particles grows, the authors provide a rigorous foundation for using these IPSs as scalable, derivative-free Bayesian samplers. Empirical results on binary and multiclass classification, including out-of-distribution data and CIFAR-10, show that last-layer Bayesian approximations yield well-calibrated uncertainty without sacrificing accuracy, and that the deterministic second-order sampler converges faster than traditional Hamiltonian Monte Carlo (HMC). The work advances uncertainty quantification in neural networks by combining affine-invariant transforms with mean-field analysis, enabling robust predictions under distributional shifts.

Abstract

We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior, and we prove quantitative convergence rates of these interacting particle systems to their mean-field limits as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in neural networks.
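
As a rough illustration of how an ensemble of particles can be turned into a predictive probability for logistic regression, the sketch below averages the logistic output over all particles. It is a minimal sketch under stated assumptions, not the authors' samplers: the names `sigmoid` and `ensemble_predictive` are hypothetical, and the particles are drawn from a placeholder Gaussian rather than produced by either interacting particle system.

```python
import numpy as np

def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def ensemble_predictive(particles, features):
    """Average the class-1 probability over an ensemble of weight vectors.

    particles: (J, D) array, one weight vector per particle.
    features:  (N, D) array, one feature vector per input.
    Returns an (N,) array of predictive probabilities.
    """
    logits = features @ particles.T        # shape (N, J)
    return sigmoid(logits).mean(axis=1)    # Monte Carlo average over particles

rng = np.random.default_rng(0)

# Assumption: Gaussian samples stand in for the output of a posterior sampler.
mean = np.array([2.0, -1.0])
particles = mean + 0.5 * rng.standard_normal((200, 2))

near = np.array([[1.0, 0.0]])   # input close to the origin
far = 10.0 * near               # same direction, far from the origin

# Point estimate (single weight vector) versus ensemble average.
p_point_far = sigmoid(far @ mean)
p_ens_far = ensemble_predictive(particles, far)
print(p_point_far, p_ens_far)
```

In this toy setup the ensemble average at the far input is less extreme than the single-weight prediction, because the spread of the particles translates into a spread of logits before the average is taken.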

Paper Structure

This paper contains 30 sections, 10 theorems, 131 equations, 7 figures, 2 tables.

Key Result

Proposition 1

Assume that $\mu_s = \mathcal{N}(m_s, P_s)$ is Gaussian. Then the right-hand sides of equations (equ:stochastic_noise_pde) and (equ:deterministic_noise_pde) coincide.

Figures (7)

  • Figure 1: 2D binary classification data set
  • Figure 2: Binary classification on a toy dataset using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. Background colour depicts the confidence in classification, while the black line represents the decision boundary obtained for the toy classification problem.
  • Figure 3: Zoomed-out versions of the results in Figure 2 for binary classification on a toy dataset using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. Background colour depicts the confidence in classification.
  • Figure 4: Confidence of MLE, ensembles of neural networks, last-layer Laplace approximation, HMC, the moment matching method, and the deterministic second-order dynamical sampler as functions of $\delta$ over the test set. Thick blue lines and shaded regions correspond to means and ± one standard deviation, respectively. Dashed black lines indicate the desirable confidence for sufficiently large $\delta$.
  • Figure 6: Multi-class classification on a toy dataset using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. Background colour depicts the confidence in classification obtained for the toy classification problem.
  • ...and 2 more figures

Theorems & Definitions (25)

  • Remark 2.1
  • Remark 2.2
  • Remark 3.1
  • Proposition 1
  • proof
  • Remark 3.2
  • Proposition 2
  • Theorem 3.3
  • proof
  • Proposition 3
  • ...and 15 more