Test-Time Anchoring for Discrete Diffusion Posterior Sampling

Litu Rout, Andreas Lugmayr, Yasamin Jafarian, Srivatsan Varadharajan, Constantine Caramanis, Sanjay Shakkottai, Ira Kemelmacher-Shlizerman

TL;DR

This work introduces Anchored Posterior Sampling (APS), a training-free posterior sampler for masked discrete diffusion models. APS combines two components: Quantized Expectation, which provides differentiable, gradient-like likelihood guidance in the discrete embedding space, and Anchored Remasking, which adaptively unmasks informative tokens early according to the posterior. Together they let APS reuse pretrained denoisers to solve inverse problems without task-specific retraining. The authors derive training-time and test-time variational bounds that justify the approach and report state-of-the-art performance among discrete samplers on linear and nonlinear image restoration tasks, as well as training-free stylization and text-guided editing. According to the results, APS matches or surpasses continuous diffusion baselines in many cases and offers substantial efficiency advantages at high resolutions, which positions discrete diffusion as a practical alternative for posterior sampling in vision tasks and beyond.
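
To make the idea of unmasking informative tokens early more concrete, here is a minimal sketch of confidence-ordered unmasking for a masked token sequence. This summary does not specify the scoring rule used by Anchored Remasking, so the confidence score (the largest per-token probability), the `MASK` id, and the function name `unmask_most_confident` are all assumptions made for illustration rather than the authors' procedure.

```python
import numpy as np

MASK = -1  # hypothetical id for the mask token (assumption, not from the paper)


def unmask_most_confident(tokens, probs, num_to_unmask):
    """Reveal the masked positions whose predictions are most confident.

    tokens: (N,) integer array, MASK marks positions that are still hidden.
    probs:  (N, K) per-position categorical probabilities from a denoiser.
    Returns a copy of `tokens` with up to `num_to_unmask` positions filled in.
    """
    tokens = np.asarray(tokens).copy()
    masked = np.flatnonzero(tokens == MASK)
    if masked.size == 0 or num_to_unmask <= 0:
        return tokens

    # Stand-in confidence score: the largest probability at each masked position.
    best_token = probs[masked].argmax(axis=1)
    confidence = probs[masked].max(axis=1)

    # Keep only the k most confident positions; the rest stay masked.
    k = min(num_to_unmask, masked.size)
    order = np.argsort(-confidence, kind="stable")[:k]
    tokens[masked[order]] = best_token[order]
    return tokens


tokens = np.array([MASK, 5, MASK, MASK])
probs = np.array([
    [0.50, 0.30, 0.20],
    [0.20, 0.20, 0.60],
    [0.10, 0.10, 0.80],
    [0.40, 0.35, 0.25],
])
step = unmask_most_confident(tokens, probs, num_to_unmask=2)
```

Positions that remain masked are revisited in later steps, so the decoding order adapts to whatever score is plugged in. Replacing the stand-in score with a measurement-aware one would be the natural place to bring in posterior information.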

Abstract

We study the problem of posterior sampling using pretrained discrete diffusion foundation models, aiming to recover images from noisy measurements without retraining task-specific models. While diffusion models have achieved remarkable success in generative modeling, most advances rely on continuous Gaussian diffusion. In contrast, discrete diffusion offers a unified framework for jointly modeling categorical data such as text and images. Beyond unification, discrete diffusion provides faster inference, finer control, and principled training-free Bayesian inference, making it particularly well-suited for posterior sampling. However, existing approaches to discrete diffusion posterior sampling face severe challenges: derivative-free guidance yields sparse signals, continuous relaxations limit applicability, and split Gibbs samplers suffer from the curse of dimensionality. To overcome these limitations, we introduce Anchored Posterior Sampling (APS) for masked diffusion foundation models, built on two key innovations -- quantized expectation for gradient-like guidance in discrete embedding space, and anchored remasking for adaptive decoding. Our approach achieves state-of-the-art performance among discrete diffusion samplers across linear and nonlinear inverse problems on the standard benchmarks. We further demonstrate the benefits of our approach in training-free stylization and text-guided editing.
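
As a rough illustration of guidance in discrete embedding space, the sketch below computes the expected codebook embedding under a token distribution and snaps it to the nearest codebook entry. It is one plausible reading of "quantized expectation" and not the authors' implementation. The function name, the array shapes, and the nearest-neighbour quantization rule are assumptions.

```python
import numpy as np


def quantized_expectation(logits, codebook):
    """Expected embedding per token, then nearest-codebook quantization.

    logits:   (N, K) unnormalized scores over K codebook entries per token.
    codebook: (K, D) embedding vectors.
    Returns (expected, quantized, indices).
    """
    # Numerically stable softmax over the K codebook entries.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    # Expectation of the embedding under the predicted distribution: (N, D).
    expected = probs @ codebook

    # Squared Euclidean distance from each expectation to every codebook entry.
    d2 = ((expected[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)
    quantized = codebook[indices]
    return expected, quantized, indices


# A uniform distribution over a 1-D codebook {0, 1, 10}: the mean is 11/3,
# whose nearest codebook entry is 1.
codebook = np.array([[0.0], [1.0], [10.0]])
logits = np.zeros((1, 3))
expected, quantized, indices = quantized_expectation(logits, codebook)
```

In an automatic-differentiation framework, a common way to pass gradients through the non-differentiable quantization step is a straight-through estimator, which uses the quantized value in the forward pass and the gradient of the expectation in the backward pass. Whether APS does exactly this is not stated in the text above.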

Paper Structure

This paper contains 35 sections, 4 theorems, 62 equations, 15 figures, 7 tables, 1 algorithm.

Key Result

Theorem 3.1

Given a sample ${\mathbf{x}}\sim q$, let $q(Z_{0:1}|{\mathbf{x}})$ denote the forward noising law of (eq-fwd). Then, for any measurement ${\mathbf{y}} \sim q(\cdot|{\mathbf{x}})$, the negative log-posterior is bounded by $-\log p_\varphi({\mathbf{x}}|{\mathbf{y}}) \le {\mathcal{L}}_{\mathrm{DDPS}}({

Figures (15)

  • Figure 1: We introduce Anchored Posterior Sampling (APS) for masked diffusion foundation models, built on two key innovations: (i) quantized expectation, which provides gradient-like guidance in discrete embedding space, and (ii) anchored remasking, which enables adaptive decoding. Our method supports a variety of linear and nonlinear image restoration tasks (left three columns), as well as mask-based garment styling and reference-guided style transfer (last column).
  • Figure 2: Qualitative results on FFHQ and ImageNet for SR ($4\times$) and Gaussian deblur. Compared to DPS and G2D2, APS yields better results with sharper texture and refined facial features. For instance, in the third row, APS reconstructs fine strands of the white and brown dog’s fur.
  • Figure 3: Qualitative results on FFHQ for linear (top row) and nonlinear (bottom row) inverse problems. APS and APS-L recover high-fidelity images from severely degraded inputs.
  • Figure 3: Quantitative results on stylization.
  • Figure 4: Qualitative results on stylization. We present four style--content combinations. For each case, our APS algorithm conditions on a single reference style image together with a text prompt to generate the stylized output images. The prompt follows the template: "Generate an image in [style] style. A [class], high detail, photorealistic." Here, [style] denotes the reference style (e.g., Celestial Artwork), and [class] corresponds to the label shown below (e.g., Carousel).
  • ...and 10 more figures

Theorems & Definitions (6)

  • Theorem 3.1: Discrete Diffusion Posterior Sampling (DDPS)
  • Theorem 3.2: Test-time Anchored Posterior Sampling (APS)
  • Theorem A.1: Discrete Diffusion Posterior Sampling (DDPS)
  • proof
  • Theorem A.2: Test-time Anchored Posterior Sampling
  • proof