Log-Concave Coupling for Sampling Neural Net Posteriors

Curtis McDonald; Andrew R Barron

TL;DR

This work addresses the challenge of sampling multimodal Bayesian posteriors for a single-hidden-layer neural network. It introduces an auxiliary variable $\xi$ to couple the posterior so that the reverse conditional $p(w|\xi)$ becomes log-concave, enabling efficient MCMC, and analyzes the marginal $p(\xi)$ under Gaussian and $\ell_1$-ball priors to establish log-concavity under precise conditions. Building on these properties, the authors propose Greedy Bayes, a recursive scheme that constructs posterior means via a sequence of log-concave-conditioned densities, with Langevin-type sampling leveraged for efficiency and potential information-theoretic risk bounds. The framework offers a scalable, theoretically grounded approach to Bayesian neural network posterior sampling, with clear guidance on priors and sampling methods and connections to reverse diffusion techniques for score-based sampling.

Abstract

In this work, we present a sampling algorithm for single hidden layer neural networks. This algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling of the Bayesian posterior for neuron weight vectors $w$ of dimension $d$ is challenging because of its multimodality. Our algorithm to tackle this problem is based on a coupling of the posterior density for $w$ with an auxiliary random variable $\xi$. The resulting reverse conditional $w|\xi$ of neuron weights given the auxiliary random variable is shown to be log-concave. In the construction of the posterior distributions we provide some freedom in the choice of the prior. In particular, for Gaussian priors on $w$ with suitably small variance, the resulting marginal density of the auxiliary variable $\xi$ is proven to be strictly log-concave for all dimensions $d$. For a uniform prior on the unit $\ell_1$ ball, evidence is given that the density of $\xi$ is again strictly log-concave for sufficiently large $d$. The score of the marginal density of the auxiliary random variable $\xi$ is determined by an expectation over $w|\xi$ and thus can be computed by various rapidly mixing Markov chain Monte Carlo methods. Moreover, the computation of the score of $\xi$ permits methods of sampling $\xi$ by a stochastic diffusion (Langevin dynamics) with drift function built from this score. With such dynamics, information-theoretic methods pioneered by Bakry and Emery show that accurate sampling of $\xi$ is obtained rapidly when its density is indeed strictly log-concave. After that, one more draw from $w|\xi$ produces neuron weights $w$ whose marginal distribution is the desired posterior.
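The two-stage procedure described in the abstract (Langevin dynamics on $\xi$ with a score obtained from an expectation over $w|\xi$, followed by one draw from $w|\xi$) can be illustrated with a minimal sketch. This is not the paper's model: it assumes a toy linear-Gaussian coupling $\xi = Aw + \text{noise}$ with standard normal noise and a Gaussian prior $w \sim N(0, \sigma_0^2 I)$, so that $w|\xi$ is Gaussian and its mean is available in closed form. The matrix `A`, the step size, and all function names are hypothetical choices made for illustration. For a neural network posterior, the conditional mean would instead be estimated by MCMC over the log-concave density $p(w|\xi)$.

```python
import numpy as np

def conditional_mean_w(xi, A, sigma0):
    """E[w | xi] in the toy Gaussian model (closed form here, MCMC in general)."""
    d = A.shape[1]
    precision = np.eye(d) / sigma0**2 + A.T @ A
    return np.linalg.solve(precision, A.T @ xi)

def score_xi(xi, A, sigma0):
    """Score of the marginal of xi, written as an expectation over w | xi.

    For xi = A w + N(0, I) noise: grad log p(xi) = E[A w | xi] - xi.
    """
    return A @ conditional_mean_w(xi, A, sigma0) - xi

def langevin_sample_xi(A, sigma0, step=0.05, n_steps=2000, rng=None):
    """Unadjusted Langevin iteration on xi with drift given by the score."""
    rng = np.random.default_rng(0) if rng is None else rng
    n = A.shape[0]
    xi = np.zeros(n)
    for _ in range(n_steps):
        noise = rng.standard_normal(n)
        xi = xi + step * score_xi(xi, A, sigma0) + np.sqrt(2.0 * step) * noise
    return xi

def sample_w_given_xi(xi, A, sigma0, rng):
    """One exact draw from the Gaussian conditional w | xi of the toy model."""
    d = A.shape[1]
    precision = np.eye(d) / sigma0**2 + A.T @ A
    mean = np.linalg.solve(precision, A.T @ xi)
    chol = np.linalg.cholesky(precision)  # precision = chol @ chol.T
    z = rng.standard_normal(d)
    # Solving chol.T @ u = z gives u with covariance equal to inverse precision.
    return mean + np.linalg.solve(chol.T, z)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.standard_normal((5, 3))  # hypothetical coupling matrix
    xi = langevin_sample_xi(A, sigma0=0.5, rng=rng)
    w = sample_w_given_xi(xi, A, sigma0=0.5, rng=rng)
    print("xi sample:", xi)
    print("w sample:", w)
```

In this toy setting the marginal of $\xi$ is itself Gaussian, so the sketch only shows the control flow: estimate the score from $w|\xi$, take Langevin steps in $\xi$, then draw $w$ once from $w|\xi$. The discretization used here is the plain unadjusted Langevin scheme, chosen for brevity.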

Paper Structure (11 sections, 4 theorems, 34 equations)

Introduction
Model Parameters and Auxiliary Random Variable Distribution
The Log Concavity of Densities $p(w|\xi)$ and $p(\xi)$
Reverse Conditional Density $p(w|\xi)$
Marginal Density $p(\xi)$
Gaussian Prior and Data Matrix Eigenvalues
Bounded Data Entries and Uniform Prior over $\ell_{1}$ Ball
Connections with Reverse Diffusion
MCMC Sampling for Log Concave Target Distributions
Greedy Bayes for Neural Networks
Future Work

Key Result

Lemma 1

The conditional covariance matrix of the density $p(w|\xi)$ under the Gaussian prior is dominated by the covariance matrix of the prior. Equivalently, for any direction $v$, the variance of $z = v\cdot w$ under $p(w|\xi)$ is less than $\sigma_{0}^{2}\|v\|^{2}$.
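The equivalence between the two statements of the lemma can be written out as follows. This sketch assumes the Gaussian prior has covariance $\sigma_{0}^{2} I$, which is what the bound $\sigma_{0}^{2}\|v\|^{2}$ suggests, and it reads "dominated" in the positive semidefinite (Loewner) order:

$$
\mathrm{Cov}[w \mid \xi] \preceq \sigma_{0}^{2} I
\quad\Longleftrightarrow\quad
\mathrm{Var}[v\cdot w \mid \xi] = v^{\top}\,\mathrm{Cov}[w \mid \xi]\,v \;\le\; \sigma_{0}^{2}\, v^{\top} v = \sigma_{0}^{2}\|v\|^{2}
\quad \text{for all } v \in \mathbb{R}^{d}.
$$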

Theorems & Definitions (10)

Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
Conjecture 1
Lemma 4
proof
Remark 1
