Calibrated Test-Time Guidance for Bayesian Inference

Daniel Geyfman, Felix Draxler, Jan Groeneveld, Hyunsoo Lee, Theofanis Karaletsos, Stephan Mandt

TL;DR

This work shows that common test-time guidance methods do not recover the correct posterior distribution, identifies the structural approximations responsible for this failure, and proposes consistent alternative estimators that enable calibrated sampling from the Bayesian posterior.

Abstract

Test-time guidance is a widely used mechanism for steering pretrained diffusion models toward outcomes specified by a reward function. Existing approaches, however, focus on maximizing reward rather than sampling from the true Bayesian posterior, leading to miscalibrated inference. In this work, we show that common test-time guidance methods do not recover the correct posterior distribution and identify the structural approximations responsible for this failure. We then propose consistent alternative estimators that enable calibrated sampling from the Bayesian posterior. We significantly outperform previous methods on a set of Bayesian inference tasks and match the state of the art in black hole image reconstruction.

Paper Structure

This paper contains 38 sections, 6 theorems, 58 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Theorem 4.1

Given a Lipschitz prior $p(x)$ and a twice-differentiable likelihood $p(y \mid x)$, the posterior mean approximation (eq:tweedie-estimator) is biased for some $0 < t < 1$ unless the likelihood $p(y \mid x)$ is constant in $x$.
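
To make the statement concrete, here is a minimal numerical sketch in a one-dimensional Gaussian toy problem. Everything in it is an assumption chosen for illustration and is not taken from the paper: a standard normal prior, a noising step $x_t = \alpha x + \sigma \epsilon$, a Gaussian likelihood $p(y \mid x) = \mathcal{N}(y; x, s^2)$, and all function names. In this setting the diffused likelihood $p(y \mid x_t) = \mathbb{E}[p(y \mid x) \mid x_t]$ has a closed form, so it can be compared directly with the value obtained by evaluating the likelihood at the denoising mean $\mathbb{E}[x \mid x_t]$.

```python
import numpy as np


def gaussian_pdf(y, mean, var):
    """Density of N(mean, var) evaluated at y (broadcasts over arrays)."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def denoising_posterior(x_t, alpha, sigma):
    """Mean and variance of p(x | x_t) for the assumed toy model:
    prior x ~ N(0, 1) and x_t = alpha * x + sigma * eps, eps ~ N(0, 1)."""
    denom = alpha ** 2 + sigma ** 2
    return alpha * x_t / denom, sigma ** 2 / denom


def exact_diffused_likelihood(y, x_t, alpha, sigma, lik_var):
    """Closed form of p(y | x_t) = E[p(y | x) | x_t] in the Gaussian toy model."""
    m, v = denoising_posterior(x_t, alpha, sigma)
    return gaussian_pdf(y, m, lik_var + v)


def posterior_mean_approximation(y, x_t, alpha, sigma, lik_var):
    """Likelihood evaluated at the denoising mean: p(y | E[x | x_t])."""
    m, _ = denoising_posterior(x_t, alpha, sigma)
    return gaussian_pdf(y, m, lik_var)


def monte_carlo_diffused_likelihood(y, x_t, alpha, sigma, lik_var, n, rng):
    """Average p(y | x) over exact samples x ~ p(x | x_t).
    Exact sampling is only possible here because the toy model is Gaussian."""
    m, v = denoising_posterior(x_t, alpha, sigma)
    x = m + np.sqrt(v) * rng.standard_normal(n)
    return gaussian_pdf(y, x, lik_var).mean()


# Illustrative values (assumptions, not settings from the paper).
y, x_t = 2.0, 0.5
alpha = sigma = np.sqrt(0.5)
lik_var = 0.25

rng = np.random.default_rng(0)
print("exact         :", exact_diffused_likelihood(y, x_t, alpha, sigma, lik_var))
print("posterior mean:", posterior_mean_approximation(y, x_t, alpha, sigma, lik_var))
print("Monte Carlo   :", monte_carlo_diffused_likelihood(y, x_t, alpha, sigma, lik_var, 200_000, rng))
```

In this toy model the two closed forms differ only in their variance, $s^2 + v$ versus $s^2$, where $v$ is the variance of $p(x \mid x_t)$. The gap therefore closes only when $v = 0$, whereas the sample average converges to the exact value as the number of samples grows. The sample average here is a generic Monte Carlo estimate and should not be read as the estimator proposed in the paper.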

Figures (7)

  • Figure 1: We present a test-time guidance scheme to sample from calibrated Bayesian posteriors. (Left) Our framework accurately samples the correct posterior (blue). The posterior mean (eq:tweedie-estimator), posterior Gaussian (eq:noisy-tweedie-estimator), and optimal control (eq:optimal-control) approximations to the diffused likelihood $p(y \mid x_t)$ yield uncalibrated samples (orange). (Right) Our framework can correctly sample from tempered posteriors $p(x \mid y, \gamma) \propto p(x) p(y \mid x)^\gamma$ (blue). Rescaling the noisy gradient by $\gamma$ leads to biased samples, even if the diffused likelihood $p(y \mid x_t)$ is accurately estimated (orange).
  • Figure 2: Approximations of diffused likelihoods. (Left) The posterior mean approximation (eq:tweedie-estimator) looks up the likelihood value at the mean of the diffusion posterior. (Center) Gaussian approximations to the posterior lead to inconsistent estimates that cannot be corrected by sampling more points. (Right) Our method relies on the true diffusion posterior $p(x_t \mid x)$, yielding arbitrary precision to determine the diffused likelihood $p(y \mid x_t)$ and its gradients.
  • Figure 3: Empirical performance on Bayesian inference tasks. Performance is measured in C2ST (friedman2004c2st; lower is better, $\downarrow$), comparing the distribution of guided samples to that of ground-truth samples. Our methods improve performance with more compute, while other test-time adaptation methods are limited by their approximations to diffused likelihood gradients.
  • Figure 4: Uncurated comparison of the proposed method with other test-time guidance methods on the black-hole imaging task proposed by Mizuno_2022. Despite computing no likelihood gradient, CBG is able to reconstruct the ground-truth samples well.
  • Figure 5: Variance comparison for gradient-free and gradient-based methods. In some regions, the gradient-based method has lower variance.
  • ...and 2 more figures
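
The tempering remark in the Figure 1 caption can be checked by hand in a one-dimensional Gaussian toy case. The setup and notation below are assumptions made for illustration and are not taken from the paper: $p(x \mid x_t) = \mathcal{N}(x; m, v)$ with $m$ linear in $x_t$, and $p(y \mid x) = \mathcal{N}(y; x, s^2)$. Noising the tempered posterior $p(x) p(y \mid x)^\gamma$ gives a guidance term of the form $\nabla_{x_t} \log \mathbb{E}[p(y \mid x)^\gamma \mid x_t]$, which generally differs from $\gamma \nabla_{x_t} \log \mathbb{E}[p(y \mid x) \mid x_t]$:

$$
\nabla_{x_t} \log \mathbb{E}\left[p(y \mid x)^\gamma \mid x_t\right] = \frac{\gamma \, (y - m)}{s^2 + \gamma v} \, \frac{\partial m}{\partial x_t},
\qquad
\gamma \, \nabla_{x_t} \log \mathbb{E}\left[p(y \mid x) \mid x_t\right] = \frac{\gamma \, (y - m)}{s^2 + v} \, \frac{\partial m}{\partial x_t}.
$$

In this toy case the two expressions agree only when $\gamma = 1$ or $v = 0$, so rescaling an exact diffused-likelihood gradient by $\gamma$ still targets the wrong distribution at intermediate noise levels.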

Theorems & Definitions (12)

  • Theorem 4.1
  • Theorem 4.2
  • Theorem 4.3
  • proof
  • proof
  • Lemma 1.1
  • proof
  • Lemma 1.2
  • proof
  • proof
  • ...and 2 more