Variational Flow Maps: Make Some Noise for One-Step Conditional Generation

Abbas Mammadov, So Takao, Bohan Chen, Ricardo Baptista, Morteza Mardani, Yee Whye Teh, Julius Berner

TL;DR

A principled variational objective jointly trains the noise adapter and the flow map, improving noise-data alignment so that a simple adapter suffices to sample from a complex data posterior.

Abstract

Flow maps enable high-quality image generation in a single forward pass. However, unlike iterative diffusion models, they lack an explicit sampling trajectory, which makes it hard to incorporate external constraints for conditional generation and for solving inverse problems. We put forth Variational Flow Maps (VFM), a framework for conditional sampling that shifts the perspective of conditioning from "guiding a sampling path" to "learning the proper initial noise". Specifically, given an observation, we seek to learn a noise adapter model that outputs a noise distribution, so that after mapping to the data space via the flow map, the samples respect both the observation and the data prior. To this end, we develop a principled variational objective that jointly trains the noise adapter and the flow map, improving noise-data alignment, so that sampling from a complex data posterior is achieved with a simple adapter. Experiments on various inverse problems show that VFMs produce well-calibrated conditional samples in a single step (or a few steps). On ImageNet, VFM attains competitive fidelity while accelerating sampling by orders of magnitude compared to alternative iterative diffusion/flow models. Code is available at https://github.com/abbasmammadov/VFM
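The sampling procedure described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is a toy assumption rather than the authors' implementation: `noise_adapter`, `flow_map`, `sample_conditional`, and the random weight matrices are hypothetical stand-ins for the trained networks $q_\phi$ and $f_\theta$, and no training objective is shown.

```python
import numpy as np


def noise_adapter(y, W_mu, W_log_sigma):
    """Toy adapter q_phi(z | y): returns mean and std of a diagonal Gaussian."""
    mu = W_mu @ y
    sigma = np.exp(W_log_sigma @ y)  # exponentiate so the std is positive
    return mu, sigma


def flow_map(z, A, b):
    """Toy stand-in for a learned one-step flow map x = f_theta(z)."""
    return np.tanh(z @ A.T) + b


def sample_conditional(y, params, n_samples, rng):
    """Draw z ~ q_phi(z | y), then map each sample to data space in one step."""
    mu, sigma = noise_adapter(y, params["W_mu"], params["W_log_sigma"])
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    z = mu + sigma * eps  # reparameterized Gaussian sample
    return flow_map(z, params["A"], params["b"])


# Hypothetical dimensions and random (untrained) parameters, for illustration only.
rng = np.random.default_rng(0)
d_y, d_z, d_x = 2, 4, 4
params = {
    "W_mu": rng.standard_normal((d_z, d_y)),
    "W_log_sigma": 0.1 * rng.standard_normal((d_z, d_y)),
    "A": rng.standard_normal((d_x, d_z)),
    "b": np.zeros(d_x),
}
y = np.array([0.5, -1.0])
samples = sample_conditional(y, params, n_samples=8, rng=rng)
print(samples.shape)  # (8, 4)
```

The point of the sketch is the structure: all conditioning on `y` enters through the noise distribution, and the map from noise to data is a single function call with no iterative guidance loop.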

Paper Structure (65 sections, 11 theorems, 85 equations, 23 figures, 2 tables, 3 algorithms)

Key Result

Proposition 3.1

Assume that $p(z) = \mathcal{N}(z | 0, I)$, $p(x) = \mathcal{N}(x | m, C)$ for some $m\in \mathbb{R}^d$ and $C \in \mathbb{R}^{d \times d}$ symmetric positive definite, $f_\theta(z) = K_\theta z + b_\theta$ and $q_\phi(z|y) = \mathcal{N}(z|\mu_\phi(y), \mathtt{diag}(\sigma^2_\phi(y)))$. Then, for an
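As a hedged illustration of the setting assumed in the proposition (not of its conclusion, which is cut off in this excerpt), the sketch below uses the standard fact that an affine map of a Gaussian is Gaussian: if $z \sim \mathcal{N}(\mu, \mathrm{diag}(\sigma^2))$ and $x = Kz + b$, then $x \sim \mathcal{N}(K\mu + b,\, K\,\mathrm{diag}(\sigma^2)\,K^\top)$. The function name `pushforward_gaussian` and all numerical values are hypothetical.

```python
import numpy as np


def pushforward_gaussian(K, b, mu, sigma2):
    """Mean and covariance of x = K z + b when z ~ N(mu, diag(sigma2))."""
    mean_x = K @ mu + b
    cov_x = K @ np.diag(sigma2) @ K.T
    return mean_x, cov_x


# Hypothetical values standing in for K_theta, b_theta, mu_phi(y), sigma_phi^2(y).
rng = np.random.default_rng(0)
d = 3
K = rng.standard_normal((d, d))
b = rng.standard_normal(d)
mu = rng.standard_normal(d)
sigma2 = rng.uniform(0.5, 1.5, size=d)

mean_x, cov_x = pushforward_gaussian(K, b, mu, sigma2)

# Monte Carlo check: sample z, push it through the affine map, compare moments.
z = mu + np.sqrt(sigma2) * rng.standard_normal((200_000, d))
x = z @ K.T + b
mc_mean = x.mean(axis=0)
mc_cov = np.cov(x, rowvar=False)

print(np.abs(mc_mean - mean_x).max())  # small sampling error
print(np.abs(mc_cov - cov_x).max())    # small sampling error
```

In this linear-Gaussian case the distribution of generated samples is available in closed form, which is what makes such a setting convenient for analysis.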

Figures (23)

  • Figure 1: One-step conditional generation with Variational Flow Maps (VFM). Given an observation $y$, VFM learns a noise adapter network $q_\phi(z|y)$, which approximates the noise space posterior $p(z|y)$ via amortized variational inference. Conditional noise samples $z \sim q_\phi(z | y)$ are then mapped to data space in a single step via a learned flow map $x = f_\theta(z)$, producing conditional samples that approximate $p(x | y)$. In VFM, the networks $q_\phi$ and $f_\theta$ are trained jointly by extending the variational autoencoder framework to learn the correspondence between the triple $(x, y, z)$. By jointly training, $f_\theta$ learns to compensate for the simple Gaussian assumption on $q_\phi$.
  • Figure 2: Prior 2D samples and posterior densities in data space (top row) and noise space (bottom row). We observe the $x$-component (black dashed lines) with $\sigma=0.1$. The unconditional samples are color-coded by checkerboard cell; light grey for off-manifold samples. VFM successfully captures the bimodal nature of the posterior, while the baselines struggle to do so.
  • Figure 3: Qualitative comparison on ImageNet $256\times256$ box inpainting. Top row: ground truth, measurement, and reconstructions from guidance-based baselines. Bottom row: conditional samples produced by VFM, showing diversity in the inpainted region.
  • Figure 4: Unconditional generation on ImageNet $256\times256$. Left: unconditional samples from VFM-B/2. Right: unconditional FID comparison versus mean-flow baselines. VFM retains competitive performance despite being trained for posterior sampling.
  • Figure 5: One-step reward-aligned generation using VFM fine-tuning. Starting from a pre-trained ImageNet flow map, VFM efficiently adapts the latent noise space and flow trajectories to sample from a reward-tilted distribution, achieving strong visual alignment with a target reward $R(x, c)$ in a single forward pass while preserving image quality.
  • ...and 18 more figures

Theorems & Definitions (29)

  • Proposition 3.1
  • proof
  • Proposition 3.2
  • proof
  • Remark 3.3
  • Proposition 3.4
  • proof
  • Definition 1.1: Matrix Sets and Measure
  • Lemma 1.2: Optimal Generative Parameters via KL Minimization
  • proof
  • ...and 19 more