Sample-efficient evidence estimation of score based priors for model selection

Frederic Wang, Katherine L. Bouman

TL;DR

The proposed estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.

Abstract

The choice of prior is central to solving ill-posed imaging inverse problems, making it essential to select one consistent with the measurements $y$ to avoid severe bias. In Bayesian inverse problems, this could be achieved by evaluating the model evidence $p(y \mid M)$ under different models $M$ that specify the prior and then selecting the one with the highest value. Diffusion models are the state-of-the-art approach to solving inverse problems with a data-driven prior; however, directly computing the model evidence with respect to a diffusion prior is intractable. Furthermore, most existing model evidence estimators require either many pointwise evaluations of the unnormalized prior density or an accurate clean prior score. We propose DiME, an estimator of the model evidence of a diffusion prior that integrates over the time-marginals of posterior sampling methods. Our method leverages the large number of intermediate samples naturally obtained during the reverse diffusion sampling process to obtain an accurate estimate of the model evidence using only a handful of posterior samples (e.g., 20). We also demonstrate how to implement our estimator in tandem with recent diffusion posterior sampling methods. Empirically, our estimator matches the model evidence when it can be computed analytically, and it is able to both select the correct diffusion model prior and diagnose prior misfit under different highly ill-conditioned, non-linear inverse problems, including a real-world black hole imaging problem.
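To make the selection rule concrete, the sketch below covers only the special case in which the evidence has a closed form: a linear forward model $y = Ax + n$ with Gaussian noise and a Gaussian prior, for which $p(y \mid M) = \mathcal{N}(y; A\mu, A\Sigma A^\top + s^2 I)$. It is not the proposed estimator. The forward matrix `A`, the two priors `M_a` and `M_b`, and the noise level are illustrative assumptions that do not come from the paper.

```python
import numpy as np

def gaussian_log_evidence(y, A, mu, Sigma, noise_std):
    """Closed-form log p(y | M) for y = A x + n with
    x ~ N(mu, Sigma) (the prior under model M) and n ~ N(0, noise_std^2 I)."""
    m = y.shape[0]
    # Marginal covariance of y after integrating out x.
    cov = A @ Sigma @ A.T + noise_std**2 * np.eye(m)
    resid = y - A @ mu
    # Cholesky factor gives a stable Mahalanobis term and log-determinant.
    L = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(L, resid)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (alpha @ alpha + log_det + m * np.log(2.0 * np.pi))

# Hypothetical toy problem: 8 unknowns, 4 measurements (ill-posed).
rng = np.random.default_rng(0)
d, m, noise_std = 8, 4, 0.1
A = rng.standard_normal((m, d))

# Two candidate priors (assumed for illustration only).
priors = {
    "M_a": (np.zeros(d), np.eye(d)),
    "M_b": (3.0 * np.ones(d), np.eye(d)),
}

# Simulate a measurement whose ground truth is drawn from M_b.
x_true = priors["M_b"][0] + rng.standard_normal(d)
y = A @ x_true + noise_std * rng.standard_normal(m)

# Evaluate the evidence under each model and keep the largest.
log_ev = {
    name: gaussian_log_evidence(y, A, mu, Sigma, noise_std)
    for name, (mu, Sigma) in priors.items()
}
best = max(log_ev, key=log_ev.get)
print(log_ev, "selected:", best)
```

Closed-form cases like this one are the kind of setting in which an evidence estimator can be checked against the exact value. With a diffusion prior no such expression exists, which is the gap the abstract describes.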

Paper Structure (26 sections, 11 theorems, 41 equations, 11 figures, 3 tables, 2 algorithms)

Key Result

Proposition 1

Given a diffusion process ${\bm{x}}_t = a_t {\bm{x}}_0 + \sigma_t {\bm{z}}_t$ with ${\bm{z}}_t \sim \mathcal{N}(0, {\bm{I}})$, posterior marginals $p({\bm{x}}_t \mid {\bm{y}}) \propto p({\bm{x}}_t) \int p({\bm{y}} \mid {\bm{x}}_0) p({\bm{x}}_0 \mid {\bm{x}}_t) d{\bm{x}}_0$, and timesteps $0=t_0<\dots<t_N=T$, ... The log-likelihood term can be estimated with posterior samples ... For likelihoods with Gaussian noise ...
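As a rough illustration of the setup in Proposition 1, the sketch below draws from the forward marginal ${\bm{x}}_t = a_t {\bm{x}}_0 + \sigma_t {\bm{z}}_t$ and builds a grid $0=t_0<\dots<t_N=T$. It shows only the notation, not the evidence estimator itself. The function names, the uniform grid, and the example values of `a_t` and `sigma_t` are assumptions, since the proposition leaves the schedule unspecified.

```python
import numpy as np

def forward_marginal_sample(x0, a_t, sigma_t, rng):
    """Draw x_t = a_t * x0 + sigma_t * z with z ~ N(0, I)."""
    z = rng.standard_normal(x0.shape)
    return a_t * x0 + sigma_t * z

def make_timesteps(T, N):
    """Uniform grid 0 = t_0 < ... < t_N = T (an assumed discretization)."""
    return np.linspace(0.0, T, N + 1)

rng = np.random.default_rng(0)
x0 = np.array([1.0, -2.0, 0.5])   # hypothetical clean signal
a_t, sigma_t = 0.8, 0.6           # hypothetical schedule values at one t

# Empirical moments of many draws should approach
# mean a_t * x0 and per-coordinate variance sigma_t^2.
draws = np.stack(
    [forward_marginal_sample(x0, a_t, sigma_t, rng) for _ in range(5000)]
)
print("empirical mean:", draws.mean(axis=0))
print("empirical var: ", draws.var(axis=0))
print("timesteps:     ", make_timesteps(T=1.0, N=5))
```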

Figures (11)

  • Figure 1: Visualization of the unbiased estimators $\Theta_{high}, \Theta_{low}$ of the likelihood score $\nabla_{{\bm{x}}_t} \log p({\bm{y}} \mid {\bm{x}}_t)$ described in Lemma \ref{lemma:score-rep-high-noise} for both in-distribution (left) and out-of-distribution (right) measurements ${\bm{y}}$. At large diffusion time steps, the high noise estimator (top) uses the distance between $\tilde{{\bm{x}}}_0$ and $\mathbb{E}[{\bm{x}}_0 \mid {\bm{x}}_t]$. At small diffusion time steps, the low noise estimator (bottom) uses the likelihood score at $\tilde{{\bm{x}}}_0$. The estimator is larger (longer arrows) when ${\bm{y}}$ is out-of-distribution, as there is a greater KL divergence from the posterior to the prior; this results in DiME computing a lower evidence. Note that the scaling factors $\frac{a_t}{\sigma_t^2}$ and $\bm{\Sigma}_{{\bm{x}}_0 \mid {\bm{x}}_t}$ of the estimator are not shown.
  • Figure 2: Model evidence confusion matrix for Gaussian phase retrieval (left) and Fourier phase retrieval (right) for each (ground truth measurement, model) pair of MNIST digits. Our method selects the correct model in all cases. Posterior samples for the dotted matrix entries are shown below. For Gaussian phase retrieval, DiME estimates a higher model likelihood for visually similar digits, such as 4 and 9. For Fourier phase retrieval, the measurements are invariant to both translations and reflections, as seen in the posterior samples, so DiME estimates a high likelihood of model 9 given a measurement of a 6.
  • Figure 3: Model evidence estimates on real M87* observations across 5 different priors using exact DAPS (left) and Gaussian approximation DAPS (right). Our method concludes that, of these prior models, GRMHD is the most likely model. The violin plots show the distribution of evidences from 20 different posterior-sample paths from the DAPS sampling process, and the mean is the overall evidence estimate. Gaussian approximation DAPS gives highly accurate evidence estimates with 7x less compute than exact DAPS, but with slightly higher variance.
  • Figure 4: M87* reconstructions with the 5 priors using exact DAPS (left) and Gaussian approximation DAPS (middle). While all posterior samples have a data fit of around reduced $\chi^2 \approx 1$, they have varying path evidence estimates (shown in the top left of each image), which are negatively proportional to the KL divergence between that posterior sample and the prior. Right: Example unconditional samples from each prior.
  • Figure 5: Left: GRMHD model validation results on M87* observations, obtained by comparing to the evidence of in-distribution measurements ${\bm{y}}$. Our method shows that the evidence of the M87* observations has a $z$-score of about -0.81 relative to the evidence distribution of GRMHD measurements, indicating that M87* is statistically in-distribution for GRMHD. The evidence of simulated measurements of out-of-distribution (OOD) images is also shown, demonstrating that measurements from OOD images have OOD evidence. Right: Mean reconstruction and posterior samples of the OOD images with the highest and lowest distance to the GRMHD prior. The blurred ground truth image, i.e., the maximum resolution that can be obtained from the measurements, is also displayed.
  • ...and 6 more figures

Theorems & Definitions (19)

  • Proposition 1: Model evidence estimation along the standard marginals
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proof
  • Proposition 1: Model evidence estimation along the standard marginals
  • Proof
  • Lemma 3
  • Proof
  • Corollary 1
  • ...and 9 more