Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models
Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, Nikolay Malkin
TL;DR
This work introduces outsourced diffusion sampling to perform efficient posterior inference in the latent noise spaces of generative models, casting the data-space posterior $p(\mathbf{x}\mid\mathbf{y})$ as a posterior over outsourced noise $p(\mathbf{z}\mid\mathbf{y})$ through $\mathbf{x} = f_\theta(\mathbf{z})$. By training diffusion samplers with off-policy trajectory balance objectives, the method amortizes posterior sampling across tasks and priors (VAEs, GANs, normalizing flows, CNFs, diffusion models), often yielding smoother posteriors in noise space and enabling rapid, gradient-free conditional sampling. The approach is validated across diverse domains, including CIFAR-10 class-conditional generation, high-resolution FFHQ conditioning, text-to-image RLHF, and protein structure diversity, where it outperforms or matches strong baselines (MCMC, adjoint matching) while offering substantial efficiency gains and scalability. The work demonstrates that diffusion-based amortization in noise space is a versatile, model-agnostic tool for constrained generation and Bayesian inference with large pretrained priors, with potential extensions to discrete problems and general probabilistic programs.
Abstract
Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous ('outsourced') Gaussian noise variable $\mathbf{z}$: $\mathbf{x}=f_\theta(\mathbf{z})$. In such a model (e.g., a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_\theta(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})r(\mathbf{x},\mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). These diffusion samplers are trained by reinforcement learning algorithms to enforce that the transformed samples $f_\theta(\mathbf{z})$ are distributed according to the posterior in the data space ($\mathbf{x}$). For many models and constraints, the posterior in noise space is smoother than in data space, making it more suitable for amortized inference. Our method enables conditional sampling under unconditional GAN, (H)VAE, and flow-based priors, comparing favorably with other inference methods. We demonstrate the proposed outsourced diffusion sampling in several experiments with large pretrained prior models: conditional image generation, reinforcement learning with human feedback, and protein structure generation.
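To make the noise-space view concrete, here is a minimal sketch of the target it implies: if $\mathbf{z}$ is drawn from the unnormalized density $\mathcal{N}(\mathbf{z};0,I)\,r(f_\theta(\mathbf{z}),\mathbf{y})$, then $f_\theta(\mathbf{z})$ follows $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})r(\mathbf{x},\mathbf{y})$. Everything in the example is an assumption made for illustration: the one-dimensional linear `f_theta`, the Gaussian-shaped constraint `log_r`, and the constants `A`, `B`, `SIGMA` are hypothetical stand-ins, not the models used in the paper. The sampler shown is plain self-normalized importance resampling from the prior, used here only because it fits in a few lines; it is not the trained diffusion sampler the paper proposes.

```python
import math
import random

random.seed(0)

# Hypothetical toy generator x = f_theta(z) = A * z + B (stand-in for a pretrained prior).
A, B = 2.0, 1.0
# Hypothetical width of the constraint r(x, y) = exp(-(x - y)^2 / (2 * SIGMA^2)).
SIGMA = 0.5


def f_theta(z):
    """Deterministic map from noise z to data x."""
    return A * z + B


def log_r(x, y):
    """Log of the toy constraint function r(x, y)."""
    return -0.5 * ((x - y) / SIGMA) ** 2


def log_noise_posterior(z, y):
    """Unnormalized log p(z | y) = log N(z; 0, 1) + log r(f_theta(z), y)."""
    log_prior = -0.5 * z * z  # standard normal log-density, up to a constant
    return log_prior + log_r(f_theta(z), y)


y = 3.0

# Proposal: the Gaussian noise prior itself. The importance weight is
# target / proposal, so the prior term cancels and only r remains.
zs = [random.gauss(0.0, 1.0) for _ in range(20000)]
log_w = [log_r(f_theta(z), y) for z in zs]
max_log_w = max(log_w)  # subtract the max for numerical stability
weights = [math.exp(lw - max_log_w) for lw in log_w]

# Resample noise according to the weights, then push it through f_theta.
z_post = random.choices(zs, weights=weights, k=2000)
x_post = [f_theta(z) for z in z_post]
post_mean = sum(x_post) / len(x_post)
```

In this linear-Gaussian toy case the exact posterior mean is available in closed form, $B + \frac{A^2}{A^2+\sigma^2}(y - B) \approx 2.88$, so `post_mean` can be checked against it. Importance resampling from the prior degrades quickly as the dimension grows or the constraint sharpens, which is the cost that an amortized sampler trained in noise space is meant to avoid.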
