Test-Time Anchoring for Discrete Diffusion Posterior Sampling
Litu Rout, Andreas Lugmayr, Yasamin Jafarian, Srivatsan Varadharajan, Constantine Caramanis, Sanjay Shakkottai, Ira Kemelmacher-Shlizerman
TL;DR
This work introduces Anchored Posterior Sampling (APS), a training-free posterior sampler for masked discrete diffusion models. APS combines two components: Quantized Expectation, which provides differentiable, gradient-like likelihood guidance in the discrete embedding space, and Anchored Remasking, which uses the posterior to adaptively unmask informative tokens early in decoding. Together they let APS reuse pretrained denoisers to solve inverse problems without task-specific retraining. The authors derive training and test-time variational bounds that justify the approach, and they report state-of-the-art performance among discrete samplers on linear and nonlinear image restoration tasks, as well as on training-free stylization and text-guided editing. According to the results, APS matches or surpasses continuous diffusion baselines in many cases and offers substantial efficiency advantages at high resolutions, which positions discrete diffusion as a practical alternative for posterior sampling in vision tasks and beyond.
Abstract
We study the problem of posterior sampling using pretrained discrete diffusion foundation models, aiming to recover images from noisy measurements without retraining task-specific models. While diffusion models have achieved remarkable success in generative modeling, most advances rely on continuous Gaussian diffusion. In contrast, discrete diffusion offers a unified framework for jointly modeling categorical data such as text and images. Beyond unification, discrete diffusion provides faster inference, finer control, and principled training-free Bayesian inference, making it particularly well-suited for posterior sampling. However, existing approaches to discrete diffusion posterior sampling face severe challenges: derivative-free guidance yields sparse signals, continuous relaxations limit applicability, and split Gibbs samplers suffer from the curse of dimensionality. To overcome these limitations, we introduce Anchored Posterior Sampling (APS) for masked diffusion foundation models, built on two key innovations: quantized expectation for gradient-like guidance in discrete embedding space, and anchored remasking for adaptive decoding. Our approach achieves state-of-the-art performance among discrete diffusion samplers across linear and nonlinear inverse problems on standard benchmarks. We further demonstrate the benefits of our approach in training-free stylization and text-guided editing.
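The two components named above can be illustrated with a toy sketch. It is one reading of the abstract, not the authors' implementation. It assumes that quantized expectation means averaging codebook embeddings under the denoiser's predicted token distribution and then snapping the result to the nearest codebook entry, and that anchored remasking means unmasking the most confident masked positions first. The function names, the tiny codebook, and the confidence-based ranking are all hypothetical, and the measurement likelihood and its gradient are left out.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def quantized_expectation(logits, codebook):
    """Return (expected embedding, index of the nearest codebook entry).

    Assumption: the expectation is taken over codebook embeddings under
    the predicted token distribution. That average is a continuous vector,
    so a guidance signal could in principle be computed on it before it
    is mapped back to a discrete token.
    """
    probs = softmax(logits)
    dim = len(codebook[0])
    expected = [
        sum(p * code[d] for p, code in zip(probs, codebook))
        for d in range(dim)
    ]

    def sq_dist(code):
        return sum((c - e) ** 2 for c, e in zip(code, expected))

    nearest = min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))
    return expected, nearest


def select_anchors(confidences, masked_positions, budget):
    """Pick the `budget` masked positions with the highest confidence.

    Assumption: "informative" is approximated here by a per-position
    confidence score; the paper's actual criterion may differ.
    """
    ranked = sorted(masked_positions, key=lambda i: confidences[i], reverse=True)
    return sorted(ranked[:budget])


# Toy example: three codebook entries in a 2-D embedding space.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
logits = [0.0, 4.0, 0.0]
expected, token = quantized_expectation(logits, codebook)

# Four masked positions, unmask the two most confident ones first.
anchors = select_anchors([0.9, 0.2, 0.7, 0.4], [0, 1, 2, 3], 2)
```

In this toy setup the distribution concentrates on the second codebook entry, so the expected embedding lies close to it and quantization returns that token. A gradient-based variant would need an autodiff framework and a measurement operator, which are outside the scope of this sketch.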
