Variational Flow Maps: Make Some Noise for One-Step Conditional Generation

Abbas Mammadov; So Takao; Bohan Chen; Ricardo Baptista; Morteza Mardani; Yee Whye Teh; Julius Berner

TL;DR

We develop a principled variational objective that jointly trains a noise adapter and a flow map. Joint training improves noise-data alignment, so that sampling from a complex data posterior can be achieved with a simple adapter.

Abstract

Flow maps enable high-quality image generation in a single forward pass. However, unlike iterative diffusion models, their lack of an explicit sampling trajectory impedes incorporating external constraints for conditional generation and solving inverse problems. We put forth Variational Flow Maps (VFMs), a framework for conditional sampling that shifts the perspective of conditioning from "guiding a sampling path" to "learning the proper initial noise". Specifically, given an observation, we seek to learn a noise adapter model that outputs a noise distribution, so that after mapping to the data space via the flow map, the samples respect both the observation and the data prior. To this end, we develop a principled variational objective that jointly trains the noise adapter and the flow map, improving noise-data alignment, such that sampling from a complex data posterior is achieved with a simple adapter. Experiments on various inverse problems show that VFMs produce well-calibrated conditional samples in a single step (or a few steps). For ImageNet, VFM attains competitive fidelity while accelerating sampling by orders of magnitude compared to alternative iterative diffusion/flow models. Code is available at https://github.com/abbasmammadov/VFM
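The sampling procedure described in the abstract can be illustrated with a minimal sketch. It assumes a diagonal Gaussian noise adapter and, purely for illustration, a linear flow map of the form $f_\theta(z) = K_\theta z + b_\theta$. The weights `W_mu`, `W_log_sigma`, `K`, `b`, the dimensions, and the function names are hypothetical stand-ins for trained networks and are not taken from the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d_obs, d = 2, 4  # hypothetical observation and data/noise dimensions

# Hypothetical stand-ins for trained parameters (random values here).
W_mu = rng.normal(size=(d, d_obs))
W_log_sigma = 0.1 * rng.normal(size=(d, d_obs))
K = rng.normal(size=(d, d))
b = rng.normal(size=d)

def noise_adapter(y):
    """Toy q_phi(z|y): mean and std of a diagonal Gaussian over noise z."""
    mu = W_mu @ y
    sigma = np.exp(W_log_sigma @ y)  # exponentiate so the std stays positive
    return mu, sigma

def flow_map(z):
    """Toy one-step flow map f_theta(z) = K z + b, for z of shape (..., d)."""
    return z @ K.T + b

def sample_conditional(y, n_samples):
    """Draw z ~ q_phi(z|y), then map each z to data space in one pass."""
    mu, sigma = noise_adapter(y)
    eps = rng.standard_normal((n_samples, d))
    z = mu + sigma * eps  # reparameterized sample from q_phi(z|y)
    return flow_map(z)

y = np.array([0.5, -1.0])
x = sample_conditional(y, n_samples=8)
print(x.shape)  # (8, 4)
```

In this sketch, all conditioning on `y` happens in noise space: the flow map itself never sees the observation, which is the shift in perspective from guiding a sampling path to choosing the initial noise.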

Paper Structure (65 sections, 11 theorems, 85 equations, 23 figures, 2 tables, 3 algorithms)

Introduction
Background
Flow-based Generative Models and Flow Maps
Inverse Problems
Variational Inference and Data Amortization
Variational Flow Maps (VFMs)
Joint Training of the Flow Map and Noise Adapter
Connection to mean flows.
Amortizing Over Multiple Inverse Problems
Single and Multi-Step Conditional Sampling
Other Training Considerations
Mixing in the unconditional loss:
Adaptive loss:
Experiments
Illustration on a 2D Example
...and 50 more sections

Key Result

Proposition 3.1

Assume that $p(z) = \mathcal{N}(z | 0, I)$, $p(x) = \mathcal{N}(x | m, C)$ for some $m\in \mathbb{R}^d$ and $C \in \mathbb{R}^{d \times d}$ symmetric positive definite, $f_\theta(z) = K_\theta z + b_\theta$ and $q_\phi(z|y) = \mathcal{N}(z|\mu_\phi(y), \mathtt{diag}(\sigma^2_\phi(y)))$. Then, for an

Figures (23)

Figure 1: One-step conditional generation with Variational Flow Maps (VFM). Given an observation $y$, VFM learns a noise adapter network $q_\phi(z|y)$, which approximates the noise-space posterior $p(z|y)$ via amortized variational inference. Conditional noise samples $z \sim q_\phi(z | y)$ are then mapped to data space in a single step via a learned flow map $x = f_\theta(z)$, producing conditional samples that approximate $p(x | y)$. In VFM, the networks $q_\phi$ and $f_\theta$ are trained jointly by extending the variational autoencoder framework to learn the correspondence between the elements of the triple $(x, y, z)$. Through joint training, $f_\theta$ learns to compensate for the simple Gaussian assumption on $q_\phi$.
Figure 2: Prior 2D samples and posterior densities in data space (top row) and noise space (bottom row). We observe the $x$-component (black dashed lines) with $\sigma=0.1$. The unconditional samples are color-coded by checkerboard cell; light grey for off-manifold samples. VFM successfully captures the bimodal nature of the posterior, while the baselines struggle to do so.
Figure 3: Qualitative comparison on ImageNet $256\times256$ box inpainting. Top row: ground truth, measurement, and reconstructions from guidance-based baselines. Bottom row: conditional samples produced by VFM, showing diversity in the inpainted region.
Figure 4: Unconditional generation on ImageNet $256\times256$. Left: unconditional samples from VFM-B/2. Right: unconditional FID comparison versus mean-flow baselines. VFM retains competitive performance despite being trained for posterior sampling.
Figure 5: One-step reward-aligned generation using VFM fine-tuning. Starting from a pre-trained ImageNet flow map, VFM efficiently adapts the latent noise space and flow trajectories to sample from a reward-tilted distribution, achieving strong visual alignment with a target reward $R(x, c)$ in a single forward pass while preserving image quality.
...and 18 more figures

Theorems & Definitions (29)

Proposition 3.1
proof
Proposition 3.2
proof
Remark 3.3
Proposition 3.4
proof
Definition 1.1: Matrix Sets and Measure
Lemma 1.2: Optimal Generative Parameters via KL Minimization
proof
...and 19 more