Variational Bayesian Imaging with an Efficient Surrogate Score-based Prior

Berthy T. Feng; Katherine L. Bouman

TL;DR

This work tackles the challenge of principled Bayesian imaging with score-based priors in ill-posed inverse problems. It replaces the costly exact log-probability $\log p_\theta^{\text{SDE}}(\mathbf{x})$ with a computable evidence lower bound $b_\theta^{\text{SDE}}(\mathbf{x})$, enabling efficient variational inference for high-dimensional images. Empirically, the surrogate yields a speedup of at least two orders of magnitude and reduced memory use, while delivering posterior estimates and reconstructions that are competitive with or better than diffusion-based baselines across accelerated MRI and black-hole VLBI imaging. The approach demonstrates scalable, principled posterior estimation using score-based priors, with broad implications for scientific and medical imaging where uncertainty quantification is essential. Overall, the surrogate enables practical deployment of high-capacity diffusion priors within a Bayesian framework, accelerating development and enabling high-fidelity posterior sampling for large-scale imaging tasks.

Abstract

We propose a surrogate function for efficient yet principled use of score-based priors in Bayesian imaging. We consider ill-posed inverse imaging problems in which one aims for a clean image posterior given incomplete or noisy measurements. Since the measurements do not uniquely determine a true image, a prior is needed to constrain the solution space. Recent work turned score-based diffusion models into principled priors for solving ill-posed imaging problems by appealing to an ODE-based log-probability function. However, evaluating the ODE is computationally inefficient and inhibits posterior estimation of high-dimensional images. Our proposed surrogate prior is based on the evidence lower bound of a score-based diffusion model. We demonstrate the surrogate prior on variational inference for efficient approximate posterior sampling of large images. Compared to the exact prior in previous work, our surrogate accelerates optimization of the variational image distribution by at least two orders of magnitude. We also find that our principled approach gives more accurate posterior estimation than non-variational diffusion-based approaches that involve hyperparameter-tuning at inference. Our work establishes a practical path forward for using score-based diffusion models as general-purpose image priors.
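
As a brief sketch of why a lower bound on the log-prior remains usable for variational inference: the symbols $q_\phi$, $p_\theta^{\text{SDE}}$, and $b_\theta^{\text{SDE}}$ follow the notation used elsewhere on this page, while the measurement symbol $\mathbf{y}$ and the likelihood $p(\mathbf{y}\mid\mathbf{x})$ are notational assumptions introduced here. With $p_\theta^{\text{SDE}}$ as the prior, the KL divergence from the variational distribution to the posterior expands as

$$
D_{\mathrm{KL}}\big(q_\phi(\mathbf{x}) \,\|\, p(\mathbf{x}\mid\mathbf{y})\big) = \mathbb{E}_{\mathbf{x}\sim q_\phi}\big[-\log p(\mathbf{y}\mid\mathbf{x}) - \log p_\theta^{\text{SDE}}(\mathbf{x}) + \log q_\phi(\mathbf{x})\big] + \log p(\mathbf{y}).
$$

Because $b_\theta^{\text{SDE}}(\mathbf{x}) \leq \log p_\theta^{\text{SDE}}(\mathbf{x})$ for every $\mathbf{x}$, substituting the surrogate can only increase the right-hand side:

$$
D_{\mathrm{KL}}\big(q_\phi(\mathbf{x}) \,\|\, p(\mathbf{x}\mid\mathbf{y})\big) \leq \mathbb{E}_{\mathbf{x}\sim q_\phi}\big[-\log p(\mathbf{y}\mid\mathbf{x}) - b_\theta^{\text{SDE}}(\mathbf{x}) + \log q_\phi(\mathbf{x})\big] + \log p(\mathbf{y}).
$$

Under these assumptions, minimizing the surrogate objective over $\phi$ minimizes an upper bound on the KL divergence; the term $\log p(\mathbf{y})$ does not depend on $\phi$ and can be dropped during optimization.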

Paper Structure (65 sections, 18 equations, 7 figures, 2 tables)

Introduction
Related work
Bayesian inverse imaging
Diffusion models for inverse problems
Score-based priors
Background
Score-based diffusion models
Sampling with a reverse-time SDE
Image probabilities
Probability flow ODE
Equivalence of $p_\theta^\text{SDE}$ and $p_\theta^\text{ODE}$
Evidence lower bound
Method
Approximating the posterior with VI
Implementation details
...and 50 more sections

Figures (7)

Figure 1: Computational efficiency of proposed surrogate prior vs. exact prior. For each image size, we estimated a posterior of images conditioned on $4\times$-accelerated MRI measurements of a knee image, using a Gaussian distribution with diagonal covariance as the variational distribution. The hardware was 4x NVIDIA RTX A6000. The surrogate prior allows for variational inference of image sizes that are prohibitively large for the exact prior. For image sizes supported by the exact prior, the surrogate improved total optimization time by over $120\times$ while using less memory and scaling better with image size. "Image-Restoration Quality" verifies that optimization with the surrogate was done fairly, as the PSNR and SSIM of the converged posterior (averaged over 128 samples) are at least as high as with the exact prior.
Figure 2: High-dimensional Bayesian inference with a surrogate score-based prior. Here we show posterior samples for accelerated MRI of $256\times 256$ knee images, approximated via variational inference with a surrogate score-based prior. The first row shows reconstruction from $16\times$-reduced MRI measurements. The second row shows reconstruction given more $\kappa$-space measurements, i.e., $4\times$-reduced MRI. Bayesian imaging at this image resolution is computationally infeasible with the previous ODE-based approach [feng2023score]. Our proposed surrogate enables efficient yet principled inference with diffusion-model priors, resulting in inferred posteriors where the true image is within three standard deviations of the posterior mean for 96% and 99% of the pixels for $16\times$- and $4\times$-acceleration, respectively.
Figure 3: Estimated posteriors under surrogate vs. exact prior. For each task, the variational distribution is a RealNVP, and the score model is the same between both prior functions. (a) Both prior functions help recover the correct (Gaussian) posterior. The score-based prior was trained on samples from a known Gaussian distribution (originally fit to $16\times 16$ face images), and the measurements are the lowest 6.25% spatial frequencies of a test image from the prior. Since the prior and likelihood are both Gaussian, we know the ground-truth Gaussian posterior. (b) Estimated posteriors for (i) denoising a CelebA image and (ii) denoising a CIFAR-10 image. Std. dev. is averaged across the three color channels. The score-based prior was trained on CelebA in (i) and CIFAR-10 in (ii). Both prior functions result in comparable image quality; visual differences appear mostly in the image background.
Figure 4: $b_\theta^\text{SDE}(\mathbf{x})$ vs. $\log p_\theta^\text{ODE}(\mathbf{x})$ for samples $\mathbf{x}\sim q_\phi$ as optimization of $\phi$ progresses. The task is from Fig. 3(i). For each plot, we took 128 samples $\mathbf{x}\sim q_\phi$ and performed 20 estimates each of $\log p_\theta^\text{ODE}(\mathbf{x})$ and $b_\theta^\text{SDE}(\mathbf{x})$ (approximated with $N_t=2048$ for reduced variance). The density map is a KDE plot of all $128\cdot 20=2560$ values; the 128 scatter points represent the mean estimate for each $\mathbf{x}$. The black line indicates perfect agreement between $b_\theta^\text{SDE}(\mathbf{x})$ and $\log p_\theta^\text{ODE}(\mathbf{x})$. We expect all points to lie below this black line for $b_\theta^\text{SDE}$ to be a lower bound. We find that $b_\theta^\text{SDE}(\mathbf{x})\leq \log p_\theta^\text{ODE}(\mathbf{x})$ (up to variance error), but the optimization progresses differently depending on the prior. Gradients under the surrogate push $q_\phi(\mathbf{x})$ along the black line to increase $b_\theta^\text{SDE}(\mathbf{x})$ without exceeding $\log p_\theta^\text{ODE}(\mathbf{x})$. Optimization under the exact prior proceeds more freely, although it eventually achieves higher $b_\theta^\text{SDE}(\mathbf{x})$ at convergence. This visualization may help explain differences in the posterior estimated with the surrogate vs. exact prior.
Figure 5: Comparing our VI approach with a surrogate score-based prior to baselines on a bimodal posterior. In this example, the prior is a bimodal mixture-of-Gaussians, and the likelihood is Gaussian, making the posterior a bimodal mixture-of-Gaussians (shown in "True"). Assuming access to the true prior score function, we tested how well each method recovers the true posterior. Diffusion-based methods depend on hand-tuned measurement weights ("meas. weight"). Even the measurement weight giving the best KL divergence ("oracle") does not rival using our hyperparameter-free VI approach ("DPI + surr."). Note that this "oracle" weight would not be accessible in practice, as it is determined by comparing to the ground-truth posterior. Diffusion-based baselines either (1) incorrectly place equal weight on both posterior modes or (2) miss one of the modes. DPI with either the surrogate or the exact score-based prior recovers the relative weights of both modes. (KL vs. meas. weight) Regardless of hyperparameters, diffusion-based methods do not reach our KL divergence.
...and 2 more figures
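
The Figure 5 caption relies on the fact that a mixture-of-Gaussians prior combined with a Gaussian likelihood gives a mixture-of-Gaussians posterior whose mode weights are generally unequal. The following is a minimal one-dimensional sketch of that closed form, assuming the measurement model $y = x + \text{noise}$; the function names and all numbers are hypothetical illustrations, not the paper's experimental setup.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_posterior(y, noise_var, weights, means, variances):
    """Closed-form posterior for y = x + noise, noise ~ N(0, noise_var),
    under a 1-D mixture-of-Gaussians prior on x.

    Returns (posterior_weights, posterior_means, posterior_variances).
    """
    post_w, post_m, post_v = [], [], []
    for w, m, v in zip(weights, means, variances):
        # Evidence of this component: marginally, y ~ N(m, v + noise_var).
        post_w.append(w * normal_pdf(y, m, v + noise_var))
        # Standard Gaussian conjugate update within this component.
        precision = 1.0 / v + 1.0 / noise_var
        post_v.append(1.0 / precision)
        post_m.append((m / v + y / noise_var) / precision)
    total = sum(post_w)
    post_w = [w / total for w in post_w]  # normalize so the weights sum to 1
    return post_w, post_m, post_v


# Hypothetical numbers chosen for illustration only.
weights, means, variances = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]
post_w, post_m, post_v = mixture_posterior(
    y=0.5, noise_var=4.0, weights=weights, means=means, variances=variances
)
```

With these illustrative values the prior weights are equal, but the measurement lies closer to the second mode, so the posterior assigns it the larger weight (roughly 0.6). A method that places equal weight on both posterior modes would therefore be wrong here even though both modes are found.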
