Calibrated Test-Time Guidance for Bayesian Inference

Daniel Geyfman; Felix Draxler; Jan Groeneveld; Hyunsoo Lee; Theofanis Karaletsos; Stephan Mandt

TL;DR

This work shows that common test-time guidance methods do not recover the correct posterior distribution, identifies the structural approximations responsible for this failure, and proposes consistent alternative estimators that enable calibrated sampling from the Bayesian posterior.

Abstract

Test-time guidance is a widely used mechanism for steering pretrained diffusion models toward outcomes specified by a reward function. Existing approaches, however, focus on maximizing reward rather than sampling from the true Bayesian posterior, which leads to miscalibrated inference. In this work, we show that common test-time guidance methods do not recover the correct posterior distribution and identify the structural approximations responsible for this failure. We then propose consistent alternative estimators that enable calibrated sampling from the Bayesian posterior. We significantly outperform previous methods on a set of Bayesian inference tasks and match the state of the art in black hole image reconstruction.

Paper Structure (38 sections, 6 theorems, 58 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Background
Diffusion Models
Bayesian inference
Diffused likelihood approximation
Related Work
Shortcomings of Existing Estimators
Inconsistency of Diffused Likelihood Estimators
Bias of Guidance Scales
Calibrated Bayesian Guidance (CBG)
Differentiable Rewards
Non-differentiable Rewards
Comparing gradient-free and gradient-based methods
Experiments
Bayesian Inference Benchmark
...and 23 more sections

Key Result

Theorem 4.1

Given a Lipschitz prior $p(x)$ and a twice-differentiable likelihood $p(y \mid x)$, the posterior mean approximation (eq:tweedie-estimator) is biased for some $0 < t < 1$ unless the likelihood $p(y \mid x)$ is constant in $x$.
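
The statement can be illustrated in a conjugate Gaussian toy problem, where the diffused likelihood $p(y \mid x_t) = \int p(y \mid x)\, p(x \mid x_t)\, dx$ is available in closed form. The sketch below is not taken from the paper: it assumes a standard normal prior, a Gaussian likelihood with noise scale `s`, and a forward noising step $x_t = a x + \sigma \varepsilon$, where the names `a`, `sigma`, and `s` and all function names are hypothetical. It compares the exact diffused likelihood with the value obtained by evaluating the likelihood at the denoising posterior mean, and with a plain Monte Carlo average over denoising posterior samples.

```python
import numpy as np


def gaussian_pdf(z, mean, var):
    """Density of N(mean, var) evaluated at z."""
    return np.exp(-0.5 * (z - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def denoising_posterior(x_t, a, sigma):
    """Mean and variance of p(x | x_t) for x ~ N(0, 1), x_t = a*x + sigma*eps.

    Assumed toy setup, not the paper's general setting.
    """
    denom = a**2 + sigma**2
    return a * x_t / denom, sigma**2 / denom


def exact_diffused_likelihood(y, x_t, a, sigma, s):
    """p(y | x_t) = integral of N(y; x, s^2) against p(x | x_t), in closed form."""
    m, v = denoising_posterior(x_t, a, sigma)
    return gaussian_pdf(y, m, s**2 + v)


def posterior_mean_approximation(y, x_t, a, sigma, s):
    """Evaluate the likelihood at the denoising posterior mean E[x | x_t]."""
    m, _ = denoising_posterior(x_t, a, sigma)
    return gaussian_pdf(y, m, s**2)


def monte_carlo_diffused_likelihood(y, x_t, a, sigma, s, n, rng):
    """Average the likelihood over n samples drawn from p(x | x_t)."""
    m, v = denoising_posterior(x_t, a, sigma)
    x = m + np.sqrt(v) * rng.standard_normal(n)
    return gaussian_pdf(y, x, s**2).mean()
```

In this toy setting the posterior mean approximation drops the denoising posterior variance from the closed-form answer, so it matches the exact value only when that variance vanishes, whereas the Monte Carlo average approaches the exact value as `n` grows.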

Figures (7)

Figure 1: We present a test-time guidance scheme to sample from calibrated Bayesian posteriors. (Left) Our framework accurately samples the correct posterior (blue). Posterior mean (eq:tweedie-estimator), posterior Gaussian (eq:noisy-tweedie-estimator), and optimal control (eq:optimal-control) approximations to the diffused likelihood $p(y \mid x_t)$ yield uncalibrated samples (orange). (Right) Our framework can correctly sample from tempered posteriors $p(x \mid y, \gamma) \propto p(x) p(y \mid x)^\gamma$ (blue). Rescaling the noisy gradient by $\gamma$ leads to biased samples, even if the diffused likelihood $p(y \mid x_t)$ is accurately estimated (orange).
Figure 2: Approximations of diffused likelihoods. (Left) The posterior mean approximation (eq:tweedie-estimator) looks up the likelihood value at the mean of the diffusion posterior. (Center) Gaussian approximations to the posterior lead to inconsistent estimates that cannot be corrected by sampling more points. (Right) Our method relies on the true diffusion posterior $p(x_t \mid x)$, so the diffused likelihood $p(y \mid x_t)$ and its gradients can be estimated to arbitrary precision.
Figure 3: Empirical performance on Bayesian inference tasks. Performance is measured in C2ST [friedman2004c2st] (lower is better, $\downarrow$), comparing the distribution of guided samples to that of ground-truth samples. Our methods improve with more compute, while other test-time adaptation methods are limited by their approximations to diffused likelihood gradients.
Figure 4: Uncurated comparison of the proposed method with other test-time guidance methods on the black-hole imaging task proposed by [Mizuno_2022]. Despite computing no likelihood gradient, CBG reconstructs the ground-truth samples well.
Figure 5: Variance comparison for gradient-free and gradient-based methods. In some regions, the gradient-based method has lower variance.
...and 2 more figures
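
The tempering claim in the Figure 1 caption can be checked by hand in a Gaussian toy problem. The sketch below is an illustration under assumptions, not the paper's estimator: it takes a standard normal prior, a Gaussian likelihood with noise scale `s`, and a forward noising step $x_t = a x + \sigma \varepsilon$ (the names `a`, `sigma`, `s` and both function names are hypothetical). It also assumes that the guidance term for the tempered posterior is $\nabla_{x_t} \log \int p(y \mid x)^\gamma\, p(x \mid x_t)\, dx$, which follows from noising $p(x)\, p(y \mid x)^\gamma$ directly. That term is compared with $\gamma$ times the exact untempered score $\nabla_{x_t} \log p(y \mid x_t)$.

```python
import numpy as np


def denoising_posterior(x_t, a, sigma):
    """Mean and variance of p(x | x_t) for x ~ N(0, 1), x_t = a*x + sigma*eps.

    Assumed toy setup, not the paper's general setting.
    """
    denom = a**2 + sigma**2
    return a * x_t / denom, sigma**2 / denom


def tempered_score(y, x_t, a, sigma, s, gamma):
    """Gradient in x_t of log integral of p(y|x)^gamma against p(x | x_t).

    For a Gaussian likelihood, p(y|x)^gamma is proportional (in x) to
    N(y; x, s^2 / gamma), so the integral is proportional to
    N(y; m, s^2 / gamma + v) with m, v the denoising posterior moments.
    """
    m, v = denoising_posterior(x_t, a, sigma)
    dm_dxt = a / (a**2 + sigma**2)  # chain rule through the posterior mean
    return (y - m) / (s**2 / gamma + v) * dm_dxt


def rescaled_score(y, x_t, a, sigma, s, gamma):
    """gamma times the exact untempered score of log p(y | x_t)."""
    return gamma * tempered_score(y, x_t, a, sigma, s, 1.0)
```

In this toy case the two expressions reduce to $\gamma (y - m) / (s^2 + \gamma v)$ and $\gamma (y - m) / (s^2 + v)$ up to the shared chain-rule factor, where $m$ and $v$ are the denoising posterior mean and variance. They agree only when $\gamma = 1$ or $v = 0$, which is consistent with the caption's statement that rescaling is biased even when $p(y \mid x_t)$ is estimated accurately.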

Theorems & Definitions (12)

Theorem 4.1
Theorem 4.2
Theorem 4.3
proof
proof
Lemma 1.1
proof
Lemma 1.2
proof
proof
...and 2 more
