Bayesian Hypernetworks

David Krueger; Chin-Wei Huang; Riashat Islam; Ryan Turner; Alexandre Lacoste; Aaron Courville

TL;DR

The paper introduces Bayesian hypernetworks (BHNs), an approach to Bayesian deep learning that uses invertible hypernetworks to transform simple noise into samples from a rich, potentially multimodal approximate posterior over primary-network parameters. By combining invertible generative models with a weight-normalization-based parametrization, BHNs enable efficient sampling and tractable entropy estimation within variational inference, and scale to large networks. Empirical results across classification, active learning, anomaly detection, and adversarial robustness show that BHNs can match or exceed strong baselines and yield better-calibrated uncertainty estimates. The work suggests that expressive posterior approximations with correlations between parameters can improve safety and reliability in practical deep learning tasks.

Abstract

We study Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork $h$ is a neural network which learns to transform a simple noise distribution, $p(\boldsymbol{\epsilon}) = \mathcal{N}(\mathbf{0}, \mathbf{I})$, to a distribution $q(\boldsymbol{\theta}) := q(h(\boldsymbol{\epsilon}))$ over the parameters $\boldsymbol{\theta}$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\boldsymbol{\theta} \mid \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\boldsymbol{\theta})$. In practice, Bayesian hypernets can provide a better defense against adversarial examples than dropout, and also exhibit competitive performance on a suite of tasks which evaluate model uncertainty, including regularization, active learning, and anomaly detection.
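
The role of invertibility can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes a single RealNVP-style affine coupling layer as the invertible map $h$, a linear-Gaussian primary model, and a standard normal prior, and all function and variable names (`coupling_hypernet`, `elbo_estimate`, `W_s`, `W_t`) are hypothetical. Because $h$ is invertible with a cheap Jacobian determinant, $\log q(\boldsymbol{\theta}) = \log p(\boldsymbol{\epsilon}) - \log\lvert\det \partial h / \partial \boldsymbol{\epsilon}\rvert$ is available for every sample, so the lower bound can be estimated by Monte Carlo.

```python
import numpy as np

def coupling_hypernet(eps, W_s, W_t):
    """One affine coupling layer: an invertible map h from noise to parameters.

    Returns theta = h(eps) and log|det dh/d eps| for each sample.
    """
    d = eps.shape[1] // 2
    eps1, eps2 = eps[:, :d], eps[:, d:]
    s = np.tanh(eps1 @ W_s)   # log-scale, depends only on the first half
    t = eps1 @ W_t            # shift, depends only on the first half
    theta = np.concatenate([eps1, eps2 * np.exp(s) + t], axis=1)
    return theta, s.sum(axis=1)

def coupling_inverse(theta, W_s, W_t):
    """Exact inverse of coupling_hypernet, recovering eps from theta."""
    d = theta.shape[1] // 2
    th1, th2 = theta[:, :d], theta[:, d:]
    s = np.tanh(th1 @ W_s)
    t = th1 @ W_t
    return np.concatenate([th1, (th2 - t) * np.exp(-s)], axis=1)

def log_standard_normal(x):
    """Row-wise log density of N(0, I)."""
    return -0.5 * np.sum(x**2, axis=1) - 0.5 * x.shape[1] * np.log(2 * np.pi)

def elbo_estimate(X, y, W_s, W_t, n_samples, rng, noise_std=1.0):
    """Monte Carlo estimate of E_q[log p(D|theta) + log p(theta) - log q(theta)]."""
    eps = rng.standard_normal((n_samples, X.shape[1]))
    theta, log_det = coupling_hypernet(eps, W_s, W_t)
    # Change of variables: log q(theta) = log p(eps) - log|det dh/d eps|
    log_q = log_standard_normal(eps) - log_det
    log_prior = log_standard_normal(theta)        # assumed N(0, I) prior
    resid = y[None, :] - theta @ X.T              # shape (n_samples, n_data)
    log_lik = np.sum(
        -0.5 * (resid / noise_std) ** 2
        - np.log(noise_std)
        - 0.5 * np.log(2 * np.pi),
        axis=1,
    )
    return np.mean(log_lik + log_prior - log_q)

# Toy usage: 4 primary-network parameters (weights of a linear model).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0])
W_s = 0.1 * rng.standard_normal((2, 2))
W_t = 0.1 * rng.standard_normal((2, 2))
print(elbo_estimate(X, y, W_s, W_t, n_samples=64, rng=rng))
```

In a real setting the hypernetwork weights (`W_s` and `W_t` here) would be trained by gradient ascent on this estimate using an automatic differentiation library, and several such layers would typically be stacked so that every parameter is transformed.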
