Outsourced diffusion sampling: Efficient posterior inference in latent spaces of generative models

Siddarth Venkatraman, Mohsin Hasan, Minsu Kim, Luca Scimeca, Marcin Sendera, Yoshua Bengio, Glen Berseth, Nikolay Malkin

TL;DR

This work introduces outsourced diffusion sampling to perform efficient posterior inference in the latent noise spaces of generative models, casting the data-space posterior $p(\mathbf{x}\mid\mathbf{y})$ as a posterior over outsourced noise $p(\mathbf{z}\mid\mathbf{y})$ through $\mathbf{x} = f_\theta(\mathbf{z})$. By training diffusion samplers with off-policy trajectory balance objectives, the method amortizes posterior sampling across tasks and priors (VAEs, GANs, normalizing flows, CNFs, diffusion models), often yielding smoother posteriors in noise space and enabling rapid, gradient-free conditional sampling. The approach is validated across diverse domains, including CIFAR-10 class-conditional generation, high-resolution FFHQ conditioning, text-to-image RLHF, and protein structure diversity, where it outperforms or matches strong baselines (MCMC, adjoint matching) while offering substantial efficiency gains and scalability. The work demonstrates that diffusion-based amortization in noise space is a versatile, model-agnostic tool for constrained generation and Bayesian inference with large pretrained priors, with potential extensions to discrete problems and general probabilistic programs.
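The training objective named above can be illustrated on a single discretized trajectory. The sketch below computes the squared trajectory-balance residual, assuming an Euler-Maruyama forward policy started at $\mathbf{z}=\mathbf{0}$ on a uniform grid over $[0,1]$ with a constant diffusion coefficient, and a fixed Brownian-bridge backward process; these choices are common in diffusion-sampler work, but the paper's exact schedule and parametrization may differ. The function names and `sigma` are hypothetical. In the outsourced setting, `log_reward` would be the noise-space log-posterior $\log\mathcal{N}(\mathbf{z};\mathbf{0},\mathbf{I}) + \log r(f_\theta(\mathbf{z}),\mathbf{y})$; only the loss value is computed here, with gradients left to an autodiff framework.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    # Log-density of the isotropic Gaussian N(mean, var * I), summed over the last axis.
    d = x.shape[-1]
    return -0.5 * (np.sum((x - mean) ** 2, axis=-1) / var + d * np.log(2.0 * np.pi * var))


def trajectory_balance_loss(traj, drifts, log_Z, log_reward, sigma=1.0):
    """Squared trajectory-balance residual for one discretized diffusion trajectory.

    traj:       (N+1, d) array; traj[0] is the fixed start 0, traj[N] the terminal noise z.
    drifts:     (N, d) array; the policy drift u(z_n, t_n) evaluated along the path.
    log_Z:      scalar estimate of the log normalizing constant (learned in practice).
    log_reward: callable giving the unnormalized log target density in noise space.
    Time runs over [0, 1] on a uniform grid (an assumption of this sketch).
    """
    N = traj.shape[0] - 1
    t = np.linspace(0.0, 1.0, N + 1)
    dt = np.diff(t)
    # Forward (learned) policy: Euler-Maruyama step z_{n+1} ~ N(z_n + dt u_n, sigma^2 dt I).
    log_pf = sum(
        gaussian_logpdf(traj[n + 1], traj[n] + dt[n] * drifts[n], sigma**2 * dt[n])
        for n in range(N)
    )
    # Backward (fixed) policy: Brownian bridge pinned at 0 at t = 0.
    # The step from z_1 back to z_0 = 0 is deterministic and contributes nothing.
    log_pb = sum(
        gaussian_logpdf(
            traj[n],
            (t[n] / t[n + 1]) * traj[n + 1],
            sigma**2 * t[n] * dt[n] / t[n + 1],
        )
        for n in range(1, N)
    )
    residual = log_Z + log_pf - log_reward(traj[N]) - log_pb
    return residual**2
```

A quick sanity check: with zero drift the forward process is Brownian motion whose terminal marginal is $\mathcal{N}(\mathbf{0},\sigma^2\mathbf{I})$, so if `log_reward` is that Gaussian's log-density and `log_Z = 0`, the residual vanishes for every trajectory. Because the loss is defined per trajectory, the trajectories can come from any behavior policy, which is what allows off-policy training.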

Abstract

Any well-behaved generative model over a variable $\mathbf{x}$ can be expressed as a deterministic transformation of an exogenous ('outsourced') Gaussian noise variable $\mathbf{z}$: $\mathbf{x}=f_\theta(\mathbf{z})$. In such a model (e.g., a VAE, GAN, or continuous-time flow-based model), sampling of the target variable $\mathbf{x} \sim p_\theta(\mathbf{x})$ is straightforward, but sampling from a posterior distribution of the form $p(\mathbf{x}\mid\mathbf{y}) \propto p_\theta(\mathbf{x})r(\mathbf{x},\mathbf{y})$, where $r$ is a constraint function depending on an auxiliary variable $\mathbf{y}$, is generally intractable. We propose to amortize the cost of sampling from such posterior distributions with diffusion models that sample a distribution in the noise space ($\mathbf{z}$). These diffusion samplers are trained by reinforcement learning algorithms to enforce that the transformed samples $f_\theta(\mathbf{z})$ are distributed according to the posterior in the data space ($\mathbf{x}$). For many models and constraints, the posterior in noise space is smoother than in data space, making it more suitable for amortized inference. Our method enables conditional sampling under unconditional GAN, (H)VAE, and flow-based priors, comparing favorably with other inference methods. We demonstrate the proposed outsourced diffusion sampling in several experiments with large pretrained prior models: conditional image generation, reinforcement learning with human feedback, and protein structure generation.
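The reduction from the data-space posterior to a noise-space target can be written out in a few lines. The following is a short derivation using only the abstract's definitions, assuming $r \ge 0$ with $0 < \mathbb{E}_{\mathbf{x}\sim p_\theta}[r(\mathbf{x},\mathbf{y})] < \infty$; it does not require $f_\theta$ to be invertible. Define the outsourced posterior

$$p(\mathbf{z}\mid\mathbf{y}) \propto \mathcal{N}(\mathbf{z};\mathbf{0},\mathbf{I})\, r\big(f_\theta(\mathbf{z}),\mathbf{y}\big).$$

For any bounded measurable test function $g$, since $p_\theta(\mathbf{x})$ is the pushforward of $\mathcal{N}(\mathbf{0},\mathbf{I})$ under $f_\theta$,

$$\mathbb{E}_{\mathbf{z}\sim p(\mathbf{z}\mid\mathbf{y})}\big[g(f_\theta(\mathbf{z}))\big] = \frac{\mathbb{E}_{\mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I})}\big[g(f_\theta(\mathbf{z}))\, r(f_\theta(\mathbf{z}),\mathbf{y})\big]}{\mathbb{E}_{\mathbf{z}\sim\mathcal{N}(\mathbf{0},\mathbf{I})}\big[r(f_\theta(\mathbf{z}),\mathbf{y})\big]} = \frac{\mathbb{E}_{\mathbf{x}\sim p_\theta(\mathbf{x})}\big[g(\mathbf{x})\, r(\mathbf{x},\mathbf{y})\big]}{\mathbb{E}_{\mathbf{x}\sim p_\theta(\mathbf{x})}\big[r(\mathbf{x},\mathbf{y})\big]} = \mathbb{E}_{\mathbf{x}\sim p(\mathbf{x}\mid\mathbf{y})}\big[g(\mathbf{x})\big].$$

Hence drawing $\mathbf{z}\sim p(\mathbf{z}\mid\mathbf{y})$ and returning $f_\theta(\mathbf{z})$ samples the data-space posterior, and the unnormalized noise-space density on the right of the first display is available pointwise, which is what lets it serve as the reward for training a diffusion sampler.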

Paper Structure

This paper contains 71 sections, 1 theorem, 12 equations, 44 figures, 14 tables, 1 algorithm.

Key Result

Proposition 2.1

Let $\mathbf{w}$ and $\mathbf{x}$ be Borel-measurable random variables valued in $\mathbb{R}^{d_{\rm latent}}$ and $\mathbb{R}^{d_{\rm data}}$, respectively, with $\mathbf{w}$ marginally standard Gaussian, and let $d_{\rm noise}>d_{\rm latent}$. There exists a random variable $\mathbf{z}$ in $\mathb

Figures (44)

  • Figure 1: Left: Top row: Marginal densities of a CNF that transforms a Gaussian distribution ($t=0$) to a Swiss roll ($t=1$). Middle row: The constraint function, a mixture of two Gaussians centered at an observation $\mathbf{y}$ ($\bullet$) and its reflection through the origin, pulled back to $\mathbf{x}_t$. Bottom row: Posterior densities at $\mathbf{x}_t$, proportional to the product of the first two rows. The rightmost column shows samples in the data space. Right: The same objects shown in noise and data space for a GAN that transforms noise ($\mathbf{z}$) to data ($\mathbf{x}$). Outsourced diffusion samplers approximate $p(\mathbf{x}_0\mid \mathbf{y})$ or $p(\mathbf{z}\mid \mathbf{y})$, which are smoother than $p(\mathbf{x}\mid\mathbf{y})$ (see Figure 2).
  • Figure 2: Marginal densities of a diffusion sampler of the posteriors from the CNF example in Figure 1, in data space and noise space. The data-space posterior (top row) has well-separated modes and is harder to sample from than the outsourced posterior (bottom row).
  • Figure 3: (a) Flow paths of a CNF trained with OT-CFM tong2024improving from a source distribution ($\bullet$) to the '2 moons' distribution ($\color{blue}\bullet$). The source is Gaussian (top row) or a mixture of 8 Gaussians (bottom row). (b) The constraint is constructed such that the posterior is the lower moon. CNF flow paths from the lower moon (target posterior) to the source latents (outsourced posterior). (c) Flow paths from naive application of Adjoint Matching domingoenrich2025adjoint, which is biased for OT flows and non-Gaussian sources. (d) Flows starting at samples from an outsourced diffusion model, which samples the latent posterior, give target samples close to the ground truth.
  • Figure 4: OT-CFM prior
  • Figure 5: True Posterior
  • ...and 39 more figures
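Figure 1 specifies its constraint concretely: a mixture of two Gaussians centered at the observation $\mathbf{y}$ and at its reflection $-\mathbf{y}$. The sketch below writes that constraint and the corresponding unnormalized noise-space log-posterior $\log\mathcal{N}(\mathbf{z};\mathbf{0},\mathbf{I}) + \log r(f_\theta(\mathbf{z}),\mathbf{y})$ for a toy two-dimensional generator. The bandwidth `scale`, the generator `toy_generator`, and all helper names are hypothetical stand-ins, not the paper's CNF or GAN. Self-normalized importance sampling from the prior is included only as a simple, unamortized reference point.

```python
import numpy as np


def log_constraint(x, y, scale=0.1):
    # Unnormalized log r(x, y): equal-weight mixture of two isotropic Gaussians
    # centered at the observation y and at its reflection -y (cf. Figure 1).
    # `scale` is a hypothetical bandwidth; additive constants are dropped.
    a = -0.5 * np.sum((x - y) ** 2, axis=-1) / scale**2
    b = -0.5 * np.sum((x + y) ** 2, axis=-1) / scale**2
    return np.logaddexp(a, b) - np.log(2.0)


def log_noise_posterior(z, y, f, scale=0.1):
    # Unnormalized log p(z | y) = log N(z; 0, I) + log r(f(z), y), constants dropped.
    return -0.5 * np.sum(z**2, axis=-1) + log_constraint(f(z), y, scale)


def toy_generator(z):
    # Hypothetical stand-in for f_theta: an odd, strictly increasing elementwise map.
    return z + 0.5 * z**3


def importance_weights(z, y, f, scale=0.1):
    # Self-normalized weights for prior draws z ~ N(0, I): w_i proportional to r(f(z_i), y).
    logw = log_constraint(f(z), y, scale)
    logw -= logw.max()  # subtract the max for numerical stability
    w = np.exp(logw)
    return w / w.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = np.array([1.0, 0.5])
    z = rng.standard_normal((100_000, 2))
    w = importance_weights(z, y, toy_generator, scale=0.1)
    ess = 1.0 / np.sum(w**2)  # effective sample size of the prior proposals
    print(f"ESS: {ess:.1f} of {len(z)} prior draws")
```

Because the toy generator is odd and the constraint is symmetric under $\mathbf{x}\mapsto-\mathbf{x}$, the noise-space posterior is symmetric under $\mathbf{z}\mapsto-\mathbf{z}$ and has two modes, mirroring the two-mode structure in Figure 1. As the bandwidth shrinks, the effective sample size of prior proposals drops, which is the cost that an amortized sampler in noise space is meant to avoid.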

Theorems & Definitions (1)

  • Proposition 2.1: Noise outsourcing lemma for Gaussians