Generative Posterior Networks for Approximately Bayesian Epistemic Uncertainty Estimation

Melrose Roderick, Felix Berkenkamp, Fatemeh Sheikholeslami, Zico Kolter

TL;DR

Generative Posterior Networks (GPNs) address epistemic uncertainty under distributional shift by learning a generative model of the posterior over outputs. Building on Randomized MAP sampling (RMS), GPNs train a single network to map anchor samples from a prior over outputs to maximum a posteriori (MAP) estimates, so that posterior samples can be drawn without retraining multiple networks. The authors prove consistency under Gaussian approximations and report improved out-of-distribution (OOD) detection, tighter uncertainty quantification, and better scalability than ensembles and kernel-based methods across regression and classification tasks. By leveraging unlabeled data, GPNs offer a practical, efficient, approximately Bayesian posterior estimator applicable to safety-critical systems, and a principled framework for uncertainty-aware decisions in high-dimensional settings where labeled data are scarce.

Abstract

In many real-world problems, there is a limited set of training data, but an abundance of unlabeled data. We propose a new method, Generative Posterior Networks (GPNs), that uses unlabeled data to estimate epistemic uncertainty in high-dimensional problems. A GPN is a generative model that, given a prior distribution over functions, approximates the posterior distribution directly by regularizing the network towards samples from the prior. We prove theoretically that our method indeed approximates the Bayesian posterior and show empirically that it improves epistemic uncertainty estimation and scalability over competing methods.
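
The idea of regularizing towards samples from the prior can be illustrated in a setting where everything is available in closed form. The block below is a minimal sketch, not the authors' GPN implementation: it assumes a Bayesian linear model with a Gaussian prior and Gaussian noise, and it uses the sample-then-optimize variant of randomized MAP sampling that perturbs the targets as well as the anchor, which is known to be exact in this linear-Gaussian case. The paper's own objective and network architecture may differ, and all names (`randomized_map_samples`, `exact_posterior`, `noise_var`, the synthetic data) are hypothetical.

```python
import numpy as np


def randomized_map_samples(X, y, mu_prior, Sigma_prior, noise_var, n_samples, rng):
    """Draw parameter samples by solving one anchored MAP problem per sample.

    Assumption: linear model y = X @ theta + noise, Gaussian prior on theta.
    Each sample minimizes
        ||y_pert - X theta||^2 / noise_var
        + (theta - anchor)^T Sigma_prior^{-1} (theta - anchor),
    where anchor ~ prior and y_pert = y + fresh Gaussian noise.
    """
    Sigma_inv = np.linalg.inv(Sigma_prior)
    A = X.T @ X / noise_var + Sigma_inv  # precision of the regularized problem
    samples = []
    for _ in range(n_samples):
        anchor = rng.multivariate_normal(mu_prior, Sigma_prior)
        y_pert = y + rng.normal(0.0, np.sqrt(noise_var), size=y.shape)
        b = X.T @ y_pert / noise_var + Sigma_inv @ anchor
        samples.append(np.linalg.solve(A, b))  # closed-form minimizer
    return np.array(samples)


def exact_posterior(X, y, mu_prior, Sigma_prior, noise_var):
    """Closed-form Gaussian posterior for the same linear model."""
    Sigma_inv = np.linalg.inv(Sigma_prior)
    cov = np.linalg.inv(X.T @ X / noise_var + Sigma_inv)
    mean = cov @ (X.T @ y / noise_var + Sigma_inv @ mu_prior)
    return mean, cov


# Hypothetical synthetic data, used only to exercise the sketch.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
theta_true = np.array([1.0, -2.0, 0.5])
noise_var = 0.25
y = X @ theta_true + rng.normal(0.0, np.sqrt(noise_var), size=20)
mu_prior = np.zeros(3)
Sigma_prior = np.eye(3)

samples = randomized_map_samples(X, y, mu_prior, Sigma_prior, noise_var, 5000, rng)
post_mean, post_cov = exact_posterior(X, y, mu_prior, Sigma_prior, noise_var)

# The empirical moments of the anchored solutions should match the posterior.
print(np.abs(samples.mean(axis=0) - post_mean).max())
print(np.abs(np.cov(samples, rowvar=False) - post_cov).max())
```

In this sketch every posterior sample requires its own optimization. As described in the abstract, a GPN instead learns a generative model that produces such samples directly, which is where the claimed scalability benefit would come from.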

Paper Structure (28 sections, 1 theorem, 29 equations, 5 figures, 5 tables)

Key Result

Theorem 4.1

Let $f$ be some neural network parameterized by ${\bm{\theta}} \sim \mathcal{N}({\bm{\mu}}_{\text{prior}}, {\mathbf{\Sigma}}_{\text{prior}})$ such that, for any inputs $\mathbf{x}$, the outputs $\mathbf{y} = f(\mathbf{x}; {\bm{\theta}})$ are jointly Gaussian. Let ${\mathbf{x}_{\text{sample}}}$ be so

Figures (5)

  • Figure 1: Samples from a GPN using a 2D embedding trained on a simple sine function. Samples from the embedding are shown on top, with the corresponding posterior samples underneath. Black 'x's represent observed data points.
  • Figure 2: Predicted posterior distributions of different methods using the same observed data.
  • Figure 3: OOD detection AUC vs. training time on the Superconductor and CIFAR-10 datasets for parameter and output-regularized ensembles and our method (GPN).
  • Figure 4: Boxplots of 100 posterior samples from every method for 2 test images, one from the in-distribution dataset and one from the OOD dataset.
  • Figure 5: ROC curves for out-of-distribution prediction based on sample variance from 100 samples.
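
The captions for Figures 3 and 5 refer to OOD detection scored by the variance of posterior samples. The block below is a hedged sketch of how such a score and its ROC AUC could be computed; it is not the authors' evaluation code. The array layout `(n_samples, n_inputs, n_outputs)`, the function names, and the synthetic Gaussian "posterior samples" are all assumptions made for illustration, and the printed number says nothing about the paper's results.

```python
import numpy as np


def variance_scores(samples):
    """Per-input uncertainty score from posterior samples.

    Assumed layout: samples has shape (n_samples, n_inputs, n_outputs).
    The score is the sample variance, averaged over output dimensions.
    """
    return samples.var(axis=0).mean(axis=-1)


def roc_auc(scores_in, scores_ood):
    """ROC AUC for separating OOD (positive) from in-distribution inputs.

    Computed as the probability that a random OOD input receives a higher
    score than a random in-distribution input, counting ties as one half.
    """
    diff = np.asarray(scores_ood)[:, None] - np.asarray(scores_in)[None, :]
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())


# Hypothetical stand-in for 100 posterior samples per input:
# low spread on in-distribution inputs, high spread on OOD inputs.
rng = np.random.default_rng(0)
samples_in = rng.normal(0.0, 0.1, size=(100, 50, 1))
samples_ood = rng.normal(0.0, 1.0, size=(100, 50, 1))

auc = roc_auc(variance_scores(samples_in), variance_scores(samples_ood))
print(auc)
```

The pairwise comparison keeps the sketch dependency-free. For large evaluation sets, a rank-based implementation would avoid the quadratic memory cost.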

Theorems & Definitions (2)

  • Theorem 4.1
  • proof