Thermodynamic Bayesian Inference

Maxwell Aifer; Samuel Duffield; Kaelan Donatella; Denis Melanson; Phoebe Klett; Zach Belateche; Gavin Crooks; Antonio J. Martinez; Patrick J. Coles

TL;DR

The authors propose electronic analog devices that sample from Bayesian posteriors by physically realizing Langevin dynamics. They show that, under reasonable assumptions, the posteriors of a Gaussian-Gaussian model and of Bayesian logistic regression can be sampled in time scaling with ln(d), where d is the dimension.

Abstract

A fully Bayesian treatment of complicated predictive models (such as deep neural networks) would enable rigorous uncertainty quantification and the automation of higher-level tasks including model selection. However, the intractability of sampling Bayesian posteriors over many parameters inhibits the use of Bayesian methods where they are most needed. Thermodynamic computing has emerged as a paradigm for accelerating operations used in machine learning, such as matrix inversion, and is based on the mapping of Langevin equations to the dynamics of noisy physical systems. Hence, it is natural to consider the implementation of Langevin sampling algorithms on thermodynamic devices. In this work we propose electronic analog devices that sample from Bayesian posteriors by realizing Langevin dynamics physically. Circuit designs are given for sampling the posterior of a Gaussian-Gaussian model and for Bayesian logistic regression, and are validated by simulations. It is shown, under reasonable assumptions, that the Bayesian posteriors for these models can be sampled in time scaling with $\ln(d)$, where $d$ is the dimension. For the Gaussian-Gaussian model, the energy cost is shown to scale with $d \ln(d)$. These results highlight the potential for fast, energy-efficient Bayesian inference using thermodynamic computing.
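To make the idea of Langevin posterior sampling concrete, the following is a minimal digital sketch, not the authors' circuit or code. It assumes a Gaussian-Gaussian model of the form $\theta \sim \mathcal{N}(\mu_\pi, \Sigma_\pi)$, $y \mid \theta \sim \mathcal{N}(\theta, \Sigma_\ell)$ (this parameterization is an assumption, since the abstract does not spell it out), sets $\beta = 1$, and replaces the continuous physical dynamics with an Euler-Maruyama discretization of $d\theta = -\nabla U(\theta)\,dt + \sqrt{2}\,dW$. The function names `gaussian_posterior` and `langevin_samples` are hypothetical.

```python
import numpy as np


def gaussian_posterior(mu_prior, cov_prior, cov_lik, y):
    """Closed-form posterior for the assumed model y | theta ~ N(theta, cov_lik)."""
    prec_prior = np.linalg.inv(cov_prior)
    prec_lik = np.linalg.inv(cov_lik)
    cov_post = np.linalg.inv(prec_prior + prec_lik)
    mu_post = cov_post @ (prec_prior @ mu_prior + prec_lik @ y)
    return mu_post, cov_post


def langevin_samples(mu_prior, cov_prior, cov_lik, y,
                     dt=1e-2, n_steps=100_000, burn_in=5_000, seed=0):
    """Euler-Maruyama simulation of overdamped Langevin dynamics (beta = 1).

    The potential is U(theta) = -ln prior(theta) - ln likelihood(y | theta),
    whose gradient is linear in theta for this Gaussian model.
    """
    rng = np.random.default_rng(seed)
    prec_prior = np.linalg.inv(cov_prior)
    prec_lik = np.linalg.inv(cov_lik)
    # grad U(theta) = A @ theta - b
    A = prec_prior + prec_lik
    b = prec_prior @ mu_prior + prec_lik @ y

    d = mu_prior.size
    noise = rng.standard_normal((n_steps, d))
    theta = mu_prior.astype(float).copy()  # start from the prior mean
    samples = np.empty((n_steps - burn_in, d))
    for k in range(n_steps):
        theta = theta - dt * (A @ theta - b) + np.sqrt(2.0 * dt) * noise[k]
        if k >= burn_in:
            samples[k - burn_in] = theta
    return samples


if __name__ == "__main__":
    mu_prior = np.zeros(2)
    cov_prior = np.array([[1.0, 0.3], [0.3, 1.0]])
    cov_lik = np.array([[0.5, -0.1], [-0.1, 0.8]])
    y = np.array([1.0, -0.5])

    mu_post, cov_post = gaussian_posterior(mu_prior, cov_prior, cov_lik, y)
    samples = langevin_samples(mu_prior, cov_prior, cov_lik, y)
    print("exact mean    :", mu_post)
    print("sampled mean  :", samples.mean(axis=0))
    print("exact cov     :\n", cov_post)
    print("sampled cov   :\n", np.cov(samples.T))
```

The discretization introduces a small step-size bias that a physical device evolving in continuous time would not have; the sketch is only meant to show that the stationary distribution of the dynamics matches the Bayesian update.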

Paper Structure (28 sections, 120 equations, 8 figures)

Introduction
Thermodynamic Bayesian Inference
  Gaussian-Gaussian model
  Bayesian linear regression and Kalman filtering
  Bayesian logistic regression
Complexity
  Time Complexity
    Gaussian-Gaussian model
    Logistic Regression
  Energy Complexity
Experiments
  Gaussian-Gaussian model
  Bayesian linear regression
  Bayesian logistic regression
Conclusion
...and 13 more sections

Figures (8)

Figure 1: Overview of Thermodynamic Bayesian Inference. (a) Time-evolution of the probability density $p_t(\theta)$ under the Fokker-Planck equation for a time-dependent potential energy $U_t(\theta)$. Initially the potential corresponds to the prior $\beta U_0(\theta) = -\ln p_\theta(\theta)$, and a logistic likelihood term is gradually introduced via a quadratic ramp-up until $t=1$, with $\beta U_1(\theta) = -\ln p_\theta(\theta) - \ln p_{y|\theta}(y|\theta)$, after which the potential does not change. The probability density is initially a Gaussian prior (blue) and approaches the Bayesian posterior (black) over time. This is the posterior for logistic regression with a single data point. (b) Thermodynamic quantities during Fokker-Planck evolution. The free energy $\beta F = \beta \braket{U} - S$ (solid black) is computed at all times as well as $F_\text{eq}$ (dotted black), the equilibrium free energy associated with potential $U_t$. Their difference $F - F_\text{eq}$ can be seen as the degree to which the system is out of equilibrium, and equals the KL divergence to the equilibrium distribution. The latter quantity approaching zero at the end of the protocol signifies that $p_t(\theta)$ approaches the true posterior. The convention for the signs of work and heat is $dE = dW - dQ$. The change in free energy over time (dotted black) lower bounds the work (orange), and the gap between the two is the dissipated work. Heat continues to flow after time $t=1$ (the end of the control protocol), while work is done only for $t<1$. (c) Thermodynamic interpretation of the Bayesian inference protocol. The parameter $\theta$ and the data points $y_1 \dots y_N$ are physical degrees of freedom, where $y_1\dots y_N$ are fixed and $\theta$ is free to vary. Initially $\theta$ is decoupled from $y_1\dots y_N$, but a coupling is turned on which requires work to be done by a work reservoir. The system exchanges heat with a heat reservoir, coming to thermal equilibrium in the posterior distribution $p_{\theta|y}$.
Figure 2: Circuit schematic for the Gaussian-Gaussian model posterior sampling device. Two resistor networks are coupled via inductors, with the currents through the inductors given by Eq. \ref{eq:gaussian-circuit-sde}.
Figure 3: SPICE simulations of proposed Gaussian-Gaussian circuit. The gray points represent the simulated circuit's sampled inductor currents (normalized). The dashed black and solid blue ellipses represent the empirical sample covariance matrix and the target posterior covariance matrix from a Gaussian Bayesian update, respectively. The red and green ellipses represent the prior and likelihood covariance matrices, respectively.
Figure 4: Convergence in Wasserstein distance for simulations of Thermodynamic Bayesian Inference. Here, we consider the Wasserstein distance between simulated thermodynamic samples and the true Gaussian posterior as a function of the number of samples (sampling time). All results are simulated exactly with thermox \cite{duffield2024thermox} and averaged over 50 random seeds with one standard deviation shown. Panel (a): Gaussian-Gaussian model with zero prior mean and covariances sampled from a Wishart distribution. Panel (b): Bayesian linear regression with randomly sampled data and design matrix.
Figure 5: Bayesian logistic regression on a two-moons dataset. Panel (a): Contour plot of the probability for a point in the ($x_1, x_2$) plane to belong to class 1 (orange points). The dataset is also shown, where class 0 (blue points) and class 1 (orange points) are arranged in two intersecting moons. This corresponds to a logistic regression problem with $d=2$ features. Panel (b): Kernel Stein discrepancy (KSD) of samples as a function of time with an ideal thermodynamic sampler for varying dimensions. Inset: extracted scaling of the crossing time as a function of dimension for a chosen crossing value of the KSD, similarly to other experiments. The color of each point matches the color of the vertical dashed line in the main figure that indicates the crossing time for the corresponding dimension. The KSD is averaged over five different runs, and for each time $1000$ samples are collected.
...and 3 more figures
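
The Figure 1 caption states that the gap between the free energy and its equilibrium value equals the KL divergence to the equilibrium distribution. A short derivation of that standard identity is sketched below, assuming the equilibrium density is the Boltzmann distribution $p_\text{eq}(\theta) = e^{-\beta U_t(\theta)}/Z_t$ with $\beta F_\text{eq} = -\ln Z_t$ and the entropy is $S = -\int p_t \ln p_t \, d\theta$. The symbols $p_\text{eq}$ and $Z_t$ are introduced here for illustration and do not appear in the caption. Note that the identity holds for $\beta (F - F_\text{eq})$, which coincides with $F - F_\text{eq}$ in units where $\beta = 1$.

```latex
\begin{align}
D_\mathrm{KL}\!\left(p_t \,\|\, p_\text{eq}\right)
  &= \int p_t(\theta) \ln p_t(\theta)\, d\theta
     - \int p_t(\theta) \ln p_\text{eq}(\theta)\, d\theta \\
  &= -S + \int p_t(\theta) \left[\beta U_t(\theta) + \ln Z_t\right] d\theta \\
  &= \beta \langle U_t \rangle - S + \ln Z_t \\
  &= \beta F - \beta F_\text{eq} .
\end{align}
```

Since the KL divergence is non-negative and vanishes only when $p_t = p_\text{eq}$, the gap closing at the end of the protocol is equivalent to $p_t(\theta)$ reaching the posterior.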
