VIKING: Deep variational inference with stochastic projections

Samuel G. Fadel; Hrittik Roy; Nicholas Krämer; Yevgen Zainchkovskyy; Stas Syrota; Alejandro Valverde Mahou; Carl Henrik Ek; Søren Hauberg

TL;DR

VIKING proposes a geometry-aware variational posterior for overparametrized neural networks. It decomposes parameter uncertainty into two linear subspaces, a kernel and an image subspace, which separate functional changes inside and outside the support of the training data and are aligned with the Fisher–Rao metric. The method defines a two-space Gaussian posterior with a low-rank kernel component and a complementary image component, and optimizes a tractable ELBO using matrix-free kernel projections and a stochastic alternating projection scheme. Across toy, image-classification, scalability, out-of-distribution (OOD) detection, and generative modelling tasks, VIKING achieves competitive or superior predictive performance and calibration compared with baselines, indicating that accounting for the geometry of reparametrizations yields tangible benefits for Bayesian deep learning. The projections make the method computationally heavier than the baselines; the authors rely on matrix-free methods and conjugate gradients (CG) with reorthogonalization to scale to modern architectures, which suggests practical applicability pending further optimization.
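
To make "matrix-free kernel projection" concrete, here is a minimal sketch of the textbook orthogonal projection onto the null space of a matrix, P v = v - Jᵀ (J Jᵀ)⁻¹ J v, where the inner linear system is solved with conjugate gradients using only matrix-vector products. It is not the authors' implementation. It assumes that the kernel is the null space of a training-data Jacobian J, and it uses a tiny explicit J purely for illustration. All names (`J`, `jvp`, `vjp`, `project_to_kernel`) are hypothetical; the plain CG below omits the reorthogonalization mentioned above.

```python
import random

# Toy stand-in for a Jacobian: 2 outputs on training data, 4 parameters.
# Assumption for illustration only; a real network would expose J solely
# through Jacobian-vector and vector-Jacobian products from autodiff.
J = [[1.0, 2.0, 0.0, -1.0],
     [0.0, 1.0, 3.0, 1.0]]

def jvp(v):
    """Compute J @ v without ever inverting or factorizing J."""
    return [sum(a * b for a, b in zip(row, v)) for row in J]

def vjp(u):
    """Compute J^T @ u."""
    return [sum(J[i][j] * u[i] for i in range(len(J))) for j in range(len(J[0]))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def conjugate_gradient(matvec, b, tol=1e-24, max_iter=100):
    """Solve A x = b for symmetric positive definite A, given only matvec.

    Stops once the squared residual norm drops to tol or below.
    """
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol:
            break
        Ap = matvec(p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def project_to_kernel(v):
    """Return v - J^T (J J^T)^{-1} J v, the orthogonal projection onto ker(J)."""
    u = conjugate_gradient(lambda w: jvp(vjp(w)), jvp(v))
    correction = vjp(u)
    return [vi - ci for vi, ci in zip(v, correction)]

random.seed(0)
v = [random.gauss(0.0, 1.0) for _ in range(4)]
v_ker = project_to_kernel(v)                    # component in ker(J)
v_img = [a - b for a, b in zip(v, v_ker)]       # complementary component
residual = jvp(v_ker)                           # should be numerically zero
```

In this sketch `v_ker` leaves the linearized outputs unchanged (`J @ v_ker` is zero up to round-off), while `v_img` is the orthogonal remainder, so any sample `v` splits into the two components that a two-subspace posterior would treat separately.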

Abstract

Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions and uncertainties, the practical reality has been the opposite, with unstable training, poor predictive power, and subpar calibration. Building upon recent work on reparametrizations of neural networks, we propose a simple variational family that considers two independent linear subspaces of the parameter space. These represent functional changes inside and outside the support of training data. This allows us to build a fully-correlated approximate posterior reflecting the overparametrization that tunes easy-to-interpret hyperparameters. We develop scalable numerical routines that maximize the associated evidence lower bound (ELBO) and sample from the approximate posterior. Empirically, we observe state-of-the-art performance across tasks, models, and datasets compared to a wide array of baseline methods. Our results show that approximate Bayesian inference applied to deep neural networks is far from a lost cause when constructing inference mechanisms that reflect the geometry of reparametrizations.
