VIKING: Deep variational inference with stochastic projections

Samuel G. Fadel, Hrittik Roy, Nicholas Krämer, Yevgen Zainchkovskyy, Stas Syrota, Alejandro Valverde Mahou, Carl Henrik Ek, Søren Hauberg

TL;DR

VIKING proposes a geometry-aware variational posterior for overparametrized neural networks. It decomposes parameter uncertainty into two subspaces defined by the empirical Fisher–Rao metric: its kernel, along which the network's predictions on the training data are retained, and its image, along which they change. The approximate posterior is a Gaussian with a low-rank kernel component and a complementary image component, and its ELBO is optimized with matrix-free kernel projections and a stochastic alternating projection scheme. Across toy, image-classification, scalability, out-of-distribution (OOD) detection, and generative modelling tasks, the authors report predictive performance and calibration that are competitive with or better than the baselines, which they take as evidence that respecting the geometry of reparametrizations yields tangible benefits for Bayesian deep learning. The projections make the method computationally heavier; to scale to modern architectures, the authors rely on matrix-free methods and CG with reorthogonalization, leaving room for further optimization.

Abstract

Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space. These represent functional changes inside and outside the support of training data. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization that tunes easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when constructing inference mechanisms that reflect the geometry of reparametrizations.
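To make the two subspaces concrete, here is a minimal sketch of a matrix-free kernel projection. It is an illustration under stated assumptions, not the authors' implementation: it assumes the metric has the Gauss-Newton form J^T J for a Jacobian J with more parameters than training outputs, so that its kernel is the null space of J. The names `project_kernel`, `cg_solve`, `matvec`, and `rmatvec` are hypothetical, the Jacobian is a small random matrix rather than that of a real network, and plain conjugate gradients (without reorthogonalization) stands in for the paper's solver.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(J, v):
    """Compute J @ v for J stored as a list of rows."""
    return [dot(row, v) for row in J]

def rmatvec(J, u):
    """Compute J.T @ u without forming the transpose."""
    out = [0.0] * len(J[0])
    for row, ui in zip(J, u):
        for k, a in enumerate(row):
            out[k] += a * ui
    return out

def cg_solve(apply_A, b, tol=1e-10, max_iter=100):
    """Conjugate gradients for A x = b, A symmetric positive definite.

    Only products with A are needed, so A is never stored.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # search direction
    rs = dot(r, r)
    stop = (tol ** 2) * dot(b, b)
    for _ in range(max_iter):
        if rs <= stop:
            break
        Ap = apply_A(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def project_kernel(J, v):
    """Return P_ker v = v - J^T (J J^T)^{-1} J v (assumes J has full row rank)."""
    a = cg_solve(lambda u: matvec(J, rmatvec(J, u)), matvec(J, v))
    correction = rmatvec(J, a)
    return [vi - ci for vi, ci in zip(v, correction)]

# Toy setting (assumption): 3 training outputs, 8 parameters.
rng = random.Random(0)
N, D = 3, 8
J = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(N)]

v = [rng.gauss(0.0, 1.0) for _ in range(D)]   # isotropic Gaussian sample
v_ker = project_kernel(J, v)                   # kernel component
v_im = [vi - ki for vi, ki in zip(v, v_ker)]   # image component

# To first order, the kernel component does not move the training
# predictions, while the image component carries the full change J v.
max_ker_change = max(abs(c) for c in matvec(J, v_ker))
max_im_change = max(abs(c) for c in matvec(J, v_im))
```

In this linearized picture, `max_ker_change` is zero up to solver tolerance, whereas `max_im_change` matches the change produced by the unprojected sample. The linear system solved by `cg_solve` has one unknown per training output, not per parameter, which is why only Jacobian-vector and transposed-Jacobian-vector products are needed.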

Paper Structure

This paper contains 44 sections, 20 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: Panels a and b show isotropic Gaussian samples that have been projected onto the kernel and image of the empirical Fisher–Rao metric associated with a neural network trained on the shown data. The kernel samples retain the predictions of the neural network, while the image samples do not. Panel c shows the learned weight-variances of IVON trained on CIFAR-10. These variances are near-identical across weights, suggesting that an isotropic approximate posterior has been learned.
  • Figure 2: Performance of $\gamma$ values.
  • Figure 3: Post-hoc tuning of $\sigma_{\ker}$ and $\sigma_{\mathrm{im}}$.
  • Figure 4: Warmup by pretraining.
  • Figure 5: A toy regression example on a sinusoid curve with 10 data points. Top: The curves show the training points in black, with the mean fit as a red line and 100 posterior predictive samples as blue lines. Bottom: The standard deviation of the predictions over each point in the horizontal axis.
  • ...and 2 more figures