Deep unfolding of MCMC kernels: scalable, modular & explainable GANs for high-dimensional posterior sampling

Jonathan Spence; Tobías I. Liaudat; Konstantinos Zygalakis; Marcelo Pereyra


TL;DR

This work introduces a novel approach to GAN architecture design by applying deep unfolding to Langevin MCMC algorithms. Deep unfolding maps fixed-step iterative algorithms onto modular neural networks, yielding architectures that are both flexible and amenable to interpretation.

Abstract

Markov chain Monte Carlo (MCMC) methods are fundamental to Bayesian computation, but can be computationally intensive, especially in high-dimensional settings. Push-forward generative models, such as generative adversarial networks (GANs), variational auto-encoders and normalising flows, offer a computationally efficient alternative for posterior sampling. However, push-forward models are opaque because they lack the modularity of Bayes' theorem, which leads to poor generalisation with respect to changes in the likelihood function. In this work, we introduce a novel approach to GAN architecture design by applying deep unfolding to Langevin MCMC algorithms. This paradigm maps fixed-step iterative algorithms onto modular neural networks, yielding architectures that are both flexible and amenable to interpretation. Crucially, our design allows key model parameters to be specified at inference time, offering robustness to changes in the likelihood parameters. We train these unfolded samplers end-to-end using a supervised regularised Wasserstein GAN framework for posterior sampling. Through extensive Bayesian imaging experiments, we demonstrate that our proposed approach achieves high sampling accuracy and excellent computational efficiency, while retaining the physics consistency, adaptability and interpretability of classical MCMC strategies.
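As a rough illustration of what unfolding a Langevin kernel means, the sketch below runs a fixed number of unadjusted Langevin (ULA) steps and treats each step as one layer with its own step size. This is a minimal sketch under our own assumptions, not the authors' architecture: the linear Gaussian likelihood, the hypothetical `gaussian_prior_score` (a placeholder for a learned prior network with parameters θ) and the per-layer step sizes (standing in for trainable layer parameters) are illustrative only, and the adversarial training stage is omitted. The forward operator `A` and noise level `sigma` are call-time arguments, mirroring the claim that likelihood parameters can be specified at inference time.

```python
import numpy as np


def gaussian_prior_score(x, theta):
    # Hypothetical stand-in for a learned prior network with parameters theta:
    # the score of a zero-mean Gaussian prior N(0, theta^{-1} I) is -theta * x.
    return -theta * x


def unfolded_ula(y, A, sigma, step_sizes, theta, L0, rng):
    """Run len(step_sizes) unfolded ULA layers; return outputs of layers l >= L0.

    A (forward operator) and sigma (noise std) are supplied at inference time,
    so the same unfolded 'network' can be reused for a different likelihood.
    step_sizes holds one (hypothetical) trainable step size per layer.
    """
    if L0 >= len(step_sizes):
        raise ValueError("L0 must be smaller than the number of layers")
    x = A.T @ y  # simple back-projection initialisation
    outputs = []
    for layer, gamma in enumerate(step_sizes):
        # Gradient of log p(y|x) = -||y - A x||^2 / (2 sigma^2) + const.
        grad_lik = A.T @ (y - A @ x) / sigma**2
        drift = grad_lik + gaussian_prior_score(x, theta)
        noise = rng.standard_normal(x.shape)
        # One ULA step: x <- x + gamma * grad log posterior + sqrt(2 gamma) * noise.
        x = x + gamma * drift + np.sqrt(2.0 * gamma) * noise
        if layer >= L0:
            outputs.append(x.copy())
    return np.stack(outputs)


rng = np.random.default_rng(0)
samples = unfolded_ula(np.array([1.0, -1.0]), np.eye(2), sigma=0.5,
                       step_sizes=[0.05] * 20, theta=1.0, L0=10, rng=rng)
print(samples.shape)  # (10, 2)
```

With `A = np.eye(2)`, `sigma = 0.5` and the unit Gaussian placeholder prior, the exact posterior mean is `0.8 * y`. Averaging the returned samples over many layers should approach it, since for a Gaussian target the ULA discretisation bias inflates the variance but leaves the stationary mean unchanged.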


Paper Structure (38 sections, 36 equations, 10 figures, 5 tables, 1 algorithm)


Introduction
Background, Notation and Related Works
Setup and notation
Data-Driven Priors
Optimization Schemes and Deep Unfolding
Markov Chain Monte Carlo
Deep Generative Modelling
Conditional Generative Adversarial Networks
Proposed Methodology
Architecture
Regularised Adversarial Training
Regularised Training with Perceptual Qualities
Numerical Experiments
Image Deblurring on MNIST
Task
...and 23 more sections

Figures (10)

Figure 1: (a): Illustration of an unfolded optimization architecture. The parameters $\theta, \vartheta_{0}, \dots, \vartheta_{L}$ are trained end-to-end such that the output $\hat{x}_L$ of the final layer is a close approximation of $\mathbb{E}[x|y]$. (b): Illustration of the proposed unfolded MCMC architecture. The parameters $\theta, \vartheta_{0}, \dots, \vartheta_{L}$ are trained end-to-end such that, over a training dataset, the output samples from layers $\ell\ge L_0$ are approximately distributed according to $p(x|y)$.
Figure 2: Convergence comparison of U-SGS and conventional SGS on small truncated chains.
Figure 3: Qualitative comparison for motion deblurring applied to the MNIST dataset. For each method, we display posterior samples from the trained model $p_\theta(x|y)$ for three observations from the test dataset. In addition, the residual error and an estimate of the three principal eigen-directions from a PCA expansion are shown. For each eigenvector, the Pearson correlation to the residual is shown in the upper-left corner.
Figure 4: Qualitative comparison of example reconstructions on ground-truth images from the PROBES dataset. The left column displays the ground truth, the pseudo-inverse $A^\dagger y$, and the sampled mask ${\mathbf{m}}$ applied in the Fourier domain. We compare the posterior mean alongside the residual and the predicted residual (approximated via the standard deviation of 8 posterior samples) on a $32\times 32$ scale. In the upper-left corner of each predicted residual, we report the Pearson correlation to the true residual.
Figure 5: Qualitative comparison for motion deblurring applied to the MNIST dataset with out-of-distribution kernels ${\mathbf{k}}\sim \mathcal{GP}(19, \lambda_\text{Matern} = 0.5, \sigma_\text{Matern}=0.4)$.
...and 5 more figures
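The captions of Figures 3 and 4 rely on two diagnostics computed from a handful of posterior samples. The sketch below is our own hedged reading of them, assuming samples stored as an array of shape `(n_samples, H, W)`: the predicted residual is taken as the pixel-wise standard deviation and compared with the absolute error of the posterior mean via the Pearson correlation, and the principal eigen-directions come from an SVD of the centred samples (equivalent to PCA of the sample covariance). Correlating against the absolute residual, and reporting the absolute correlation for eigenvectors (whose sign is arbitrary), are assumptions; the function names are hypothetical.

```python
import numpy as np


def pearson(a, b):
    # Pearson correlation between two arrays of equal size (flattened).
    # Undefined (division by zero) if either array is constant.
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def uncertainty_diagnostics(samples, x_true, n_components=3):
    """samples: (n_samples, H, W) posterior draws; x_true: (H, W) ground truth."""
    mean = samples.mean(axis=0)
    residual = x_true - mean          # signed error of the posterior mean
    predicted = samples.std(axis=0)   # pixel-wise std used as predicted |residual|
    corr_std = pearson(np.abs(residual), predicted)

    # Principal eigen-directions of the sample covariance via SVD of centred samples;
    # n_components must not exceed min(n_samples, H * W).
    centred = (samples - mean).reshape(samples.shape[0], -1)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    eig_dirs = vt[:n_components].reshape(-1, *x_true.shape)
    # Eigenvectors are defined only up to sign, so report |correlation| (assumption).
    corr_eig = [abs(pearson(v, residual)) for v in eig_dirs]
    return mean, predicted, corr_std, eig_dirs, corr_eig
```

With 8 samples, as in Figure 4, at most 8 non-trivial directions exist (7 after centring), so only the leading few components are meaningful.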
