Consistency Model is an Effective Posterior Sample Approximation for Diffusion Inverse Solvers

Tongda Xu, Ziran Zhu, Jian Li, Dailan He, Yuanyuan Wang, Ming Sun, Ling Li, Hongwei Qin, Yan Wang, Jingjing Liu, Ya-Qin Zhang

TL;DR

The paper addresses posterior sampling in diffusion inverse solvers (DIS), where posterior-mean-based approximations can fall outside the support of natural images and thereby degrade neural operators. It proves that the PF-ODE solution $\Phi_0(X_t)$ serves as a valid posterior sample and uses a Consistency Model (CM), which is distilled from the PF-ODE, to obtain high-quality posterior samples for DIS. Building on CM, the authors propose a CM-based DIS framework (and a CM-inversion variant) that yields substantial gains in both constraint consistency and perceptual quality when the operator $f(\cdot)$ is neural, across segmentation, layout, captioning, and classification tasks. Empirically, CM-based DIS outperforms prior posterior-sample strategies while maintaining reasonable compute costs, and the approach remains applicable to non-neural operators as well. The work advances practical posterior sampling in diffusion-based inverse problems and suggests extensions to larger-scale or latent-diffusion settings.

Abstract

Diffusion Inverse Solvers (DIS) are designed to sample from the conditional distribution $p_\theta(X_0|y)$, given a predefined diffusion model $p_\theta(X_0)$, an operator $f(\cdot)$, and a measurement $y=f(x'_0)$ derived from an unknown image $x'_0$. Existing DIS estimate the conditional score function by evaluating $f(\cdot)$ on an approximated posterior sample drawn from $p_\theta(X_0|X_t)$. However, most prior approximations rely on the posterior mean, which may not lie in the support of the image distribution and may therefore diverge from the appearance of genuine images. Such out-of-support samples may significantly degrade the performance of the operator $f(\cdot)$, particularly when it is a neural network. In this paper, we introduce a novel approach to posterior approximation that is guaranteed to generate valid samples within the support of the image distribution and that is more compatible with neural network-based operators $f(\cdot)$. We first demonstrate that the solution of the Probability Flow Ordinary Differential Equation (PF-ODE) with initial value $x_t$ yields an effective sample from the posterior $p_\theta(X_0|X_t=x_t)$. Based on this observation, we adopt the Consistency Model (CM), which is distilled from the PF-ODE, for posterior sampling. Furthermore, we design a novel family of DIS using only CM. Through extensive experiments, we show that our proposed method for posterior sample approximation substantially enhances the effectiveness of DIS for neural network operators $f(\cdot)$ (e.g., in semantic segmentation). Additionally, our experiments demonstrate the effectiveness of the new CM-based inversion techniques. The source code is provided in the supplementary material.
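
The contrast between the posterior mean and the PF-ODE solution can be illustrated with a small numerical sketch. It is not the authors' implementation. It assumes a hypothetical one-dimensional data distribution with two narrow modes at -1 and +1, a variance-exploding forward process $x_t = x_0 + t\,\epsilon$, and the corresponding PF-ODE $dx/dt = -t\,\nabla_x \log p_t(x)$. The names `MUS`, `S`, `score`, `posterior_mean`, and `pf_ode_solution` are introduced here for illustration only, and the score is computed analytically in place of a learned network.

```python
import math

# Toy assumptions (not from the paper): two equally weighted narrow modes.
MUS = (-1.0, 1.0)  # mode centres of the data distribution
S = 0.05           # standard deviation of each mode


def score(x, t):
    """Analytic score of p_t = 0.5*N(-1, S^2 + t^2) + 0.5*N(+1, S^2 + t^2)."""
    v = S ** 2 + t ** 2
    logw = [-(x - m) ** 2 / (2 * v) for m in MUS]
    top = max(logw)                       # subtract the max for stability
    w = [math.exp(l - top) for l in logw]
    z = sum(w)
    return sum(wi / z * (m - x) for wi, m in zip(w, MUS)) / v


def posterior_mean(x_t, t):
    """Posterior mean E[x_0 | x_t] via Tweedie's formula for x_t = x_0 + t*eps."""
    return x_t + t ** 2 * score(x_t, t)


def pf_ode_solution(x_t, t, n_steps=2000):
    """Euler integration of dx/dt = -t * score(x, t) from time t down to 0."""
    x = x_t
    dt = t / n_steps
    for i in range(n_steps):
        t_cur = t - i * dt
        # Stepping backwards in time by dt: x <- x + (-dt) * (-t_cur * score).
        x += dt * t_cur * score(x, t_cur)
    return x


x_t, t = 0.3, 1.0
mean_est = posterior_mean(x_t, t)
ode_est = pf_ode_solution(x_t, t)
print(f"posterior mean:  {mean_est:.3f}")
print(f"PF-ODE solution: {ode_est:.3f}")
```

In this toy setting the posterior mean falls between the two modes, where the data distribution has almost no mass, while the PF-ODE endpoint lands inside the mode near +1. This mirrors, in one dimension, the out-of-support behaviour of posterior means described above.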

Paper Structure (28 sections, 2 theorems, 21 equations, 11 figures, 9 tables, 8 algorithms)

Key Result

Proposition 3.2

The solution of the PF-ODE has positive likelihood under the true posterior with high probability, i.e.,

Figures (11)

  • Figure 1: A visual comparison of DIS with posterior mean as approximation for posterior sample, and DIS with proposed CM approximation for posterior sample.
  • Figure 2: Different approximations of posterior sample, and their output after a segmentation $f(\cdot)$.
  • Figure 3: The PF-ODE's velocity field of a five GMM example.
  • Figure 4: A toy example with five GMM.
  • Figure 5: Visual results on neural network operators such as segmentation, caption and classification.
  • ...and 6 more figures

Theorems & Definitions (4)

  • Proposition 3.2
  • Lemma 3.3
  • proof
  • proof