Stein Variational Newton Neural Network Ensembles

Klemens Flöge, Mohammed Abdul Moeed, Vincent Fortuin

TL;DR

This work proposes a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. The method integrates scalable modern Hessian approximations, achieving faster convergence and more accurate approximations of the posterior distribution.

Abstract

Deep neural network ensembles are powerful tools for uncertainty quantification, which have recently been re-interpreted from a Bayesian perspective. However, current methods inadequately leverage second-order information of the loss landscape, despite the recent availability of efficient Hessian approximations. We propose a novel approximate Bayesian inference method that modifies deep ensembles to incorporate Stein Variational Newton updates. Our approach uniquely integrates scalable modern Hessian approximations, achieving faster convergence and more accurate posterior distribution approximations. We validate the effectiveness of our method on diverse regression and classification tasks, demonstrating superior performance with a significantly reduced number of training epochs compared to existing ensemble-based methods, while enhancing uncertainty quantification and robustness against overfitting.
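
To make the idea of a curvature-informed particle update concrete, the following is a minimal sketch on a toy 2D Gaussian target, not the authors' algorithm. It computes the standard SVGD direction with an RBF kernel and then preconditions each particle's direction with a kernel-weighted curvature matrix, a simplified block-diagonal scheme in the spirit of Stein Variational Newton. The function names, the fixed bandwidth `h`, the step size `eps`, and the use of the exact negative Hessian of the toy target (instead of a scalable Hessian approximation for neural network weights) are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, h=1.0):
    """Return K[j, i] = k(x_j, x_i) and gradK[j, i] = gradient of k w.r.t. x_j."""
    diff = X[:, None, :] - X[None, :, :]          # diff[j, i] = x_j - x_i
    K = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * h))
    gradK = -diff / h * K[:, :, None]
    return K, gradK


def svgd_direction(X, grad_logp, h=1.0):
    """Standard SVGD direction: kernel-smoothed gradient plus repulsion."""
    n = X.shape[0]
    K, gradK = rbf_kernel(X, h)
    phi = (K.T @ grad_logp(X) + gradK.sum(axis=0)) / n
    return phi, K, gradK


def newton_preconditioned_step(X, grad_logp, neg_hessian, eps=0.3, h=1.0):
    """One simplified Newton-style step (illustrative, block-diagonal).

    For particle i, the curvature matrix is assumed to be
    H_i = mean_j[K[j, i]^2] * neg_hessian + mean_j[gradK[j, i] gradK[j, i]^T],
    and the SVGD direction phi_i is replaced by solve(H_i, phi_i).
    """
    n = X.shape[0]
    phi, K, gradK = svgd_direction(X, grad_logp, h)
    k2 = np.mean(K ** 2, axis=0)                              # shape (n,)
    outer = np.einsum("jia,jib->iab", gradK, gradK) / n       # shape (n, d, d)
    H = k2[:, None, None] * neg_hessian[None] + outer
    direction = np.linalg.solve(H, phi[..., None])[..., 0]    # batched solve
    return X + eps * direction


def run_toy(n_particles=20, n_steps=100, seed=0):
    """Move particles toward an ill-conditioned Gaussian N(mu, A^{-1})."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(2)
    A = np.diag([1.0, 100.0])                 # precision = negative Hessian of log p

    def grad_logp(X):
        return -(X - mu) @ A                  # A is symmetric

    X0 = rng.normal(size=(n_particles, 2)) + 3.0   # start away from the mode
    X = X0.copy()
    for _ in range(n_steps):
        X = newton_preconditioned_step(X, grad_logp, A)
    return X0, X, mu
```

Calling `run_toy()` returns the initial particles, the final particles, and the target mean. On this toy problem the preconditioning rescales the step per direction, so a single step size can be used even though the two coordinates differ in curvature by a factor of 100; plain SVGD with the same step size would need a much smaller `eps` to remain stable in the stiff direction.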

Paper Structure

This paper contains 20 sections, 17 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Conceptual overview of the SVN method. The green curvature-informed SVN updates are much higher quality and require fewer steps than the corresponding blue SVGD ones.
  • Figure 2: Overview of the Hessian approximations used in our SVN algorithm.
  • Figure 3: Synthetic regression example for Ensemble, SVGD, and SVN methods. The training data is marked with black dots, and the true function is represented with a dashed line. The predictive mean of the neural network ensemble is shown in dark blue, with the standard deviations highlighted in light blue. SVN best captures the underlying data distribution.
  • Figure 4: Test negative log-likelihood on UCI regression datasets. We truncated the plot for the power dataset because of WGD's inferior performance. Our proposed SVN method outperforms the ensemble, WGD, and SVGD on all datasets except for naval.
  • Figure 5: Comparison of validation negative log-likelihood computed at the end of every epoch for the first $20$ epochs of training on Yacht, Energy, and Wine datasets. While SVN's initial performance is considerably worse than the other methods, it outperforms both within a few epochs.