Test-time scaling of diffusions with flow maps

Amirmojtaba Sabour, Michael S. Albergo, Carles Domingo-Enrich, Nicholas M. Boffi, Sanja Fidler, Karsten Kreis, Eric Vanden-Eijnden

TL;DR

The paper tackles the challenge of aligning diffusion-based generators with complex, user-defined rewards at test time. It introduces Flow Map Trajectory Tilting (FMTT), which uses flow-map look-ahead to tilt the sampling dynamics toward high-reward regions, with simple, unbiased importance weights that enable either exact tilted sampling or reward-driven search. The approach provides a principled, thermodynamic-length-guided mechanism to optimize the reward-tilted distribution and demonstrates superior test-time efficiency over denoiser-lookahead and gradient-based methods across MNIST and text-to-image tasks, including VLM-based rewards. This framework broadens practical reward specification, enabling sophisticated edits and content control while maintaining sampling fidelity and diversity.

Abstract

A common recipe to improve diffusion models at test-time so that samples score highly against a user-specified reward is to introduce the gradient of the reward into the dynamics of the diffusion itself. This procedure is often ill posed, as user-specified rewards are usually only well defined on the data distribution at the end of generation. While common workarounds to this problem are to use a denoiser to estimate what a sample would have been at the end of generation, we propose a simple solution to this problem by working directly with a flow map. By exploiting a relationship between the flow map and velocity field governing the instantaneous transport, we construct an algorithm, Flow Map Trajectory Tilting (FMTT), which provably performs better ascent on the reward than standard test-time methods involving the gradient of the reward. The approach can be used to either perform exact sampling via importance weighting or principled search that identifies local maximizers of the reward-tilted distribution. We demonstrate the efficacy of our approach against other look-ahead techniques, and show how the flow map enables engagement with complicated reward functions that make possible new forms of image editing, e.g. by interfacing with vision language models.
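The two uses of importance weights mentioned in the abstract, exact sampling and search, can be illustrated on a toy problem. The sketch below is not FMTT: it uses no flow map and no tilted dynamics, and only applies generic self-normalized importance weighting to endpoint samples. Everything in it is an assumption made for illustration: the base model is a standard normal, the reward is the hypothetical linear function $r(x) = \lambda x$ (chosen because the tilted law $\rho(x)e^{r(x)}$ is then $N(\lambda, 1)$ in closed form), and the function name is invented here.

```python
import math
import random


def tilted_mean_by_importance_weighting(n_samples=100_000, lam=1.0, seed=0):
    """Toy estimate of the mean of rho(x) * exp(r(x)), rho = N(0, 1), r(x) = lam * x."""
    rng = random.Random(seed)
    # Stand-in for samples from a base generative model (assumption: N(0, 1)).
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    # Hypothetical reward; with this choice the tilted law is N(lam, 1).
    rewards = [lam * x for x in xs]
    # Subtract the maximum before exponentiating for numerical stability.
    r_max = max(rewards)
    weights = [math.exp(r - r_max) for r in rewards]
    total = sum(weights)
    # "Exact sampling" use: self-normalized weighted average under the tilted law.
    mean_est = sum(w * x for w, x in zip(weights, xs)) / total
    # "Search" use: keep the single highest-reward sample.
    best = max(zip(rewards, xs))[1]
    return mean_est, best


mean_est, best = tilted_mean_by_importance_weighting()
print(f"weighted mean = {mean_est:.3f} (closed form: 1.0), best sample = {best:.3f}")
```

The weighted average targets the whole tilted distribution, whereas the search branch discards the weights and returns one high-reward sample, which is the distinction the abstract draws between exact sampling and finding maximizers.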

Paper Structure

This paper contains 41 sections, 10 theorems, 103 equations, 12 figures, 2 tables, 1 algorithm.

Key Result

Proposition 2.0

Assume that $r_0=0$ so that $\rho^r_0 = \rho_0$. Let $\tilde{x}_t$ solve the SDE eq:sde:r with $\tilde{x}_0 \sim \rho^r_0$ and define Then for all $t\in[0,1]$ and any test function $h:\mathbb R^d\to\mathbb R$, we have where the expectations at the right-hand side are taken over the law of $\tilde{x}= (\tilde{x}_t)_{t\in[0,T]}$.

Figures (12)

  • Figure 1: Test-time search can overcome model biases and reliably sample from regions of the distribution (e.g., precise clock times) that baselines fail to capture.
  • Figure 2: Comparison between different look-ahead methods. We visualize corrupted data for different levels of noise $t$ and show the outputs of a 1-step denoiser, 1-step flow map, and a 4-step flow map.
  • Figure 3: Schematic overview of test-time adaptation of diffusions with flow map tilting. By evaluating the reward on the look-ahead map $X_{t,1}(x_t)$ inside the diffusion, reward information can be used in a principled way through the tilted trajectories (green lines). This allows us to perform better ascent on the reward, and the importance weights $A_t$ take on a remarkably simple form that can be used both for exactly sampling $\hat{\rho}_t$ and for searching for maximizers of $\hat{\rho}_t$.
  • Figure 4: Qualitative results using VLM-based rewards. Prompts where the base model fails to generate aligned outputs are corrected by FMTT, with flow map look-ahead producing the most reliable improvements.
  • Figure 5: Qualitative comparison on three basic geometric rewards (symmetry, anti-symmetry, rotation invariance). The gradient-based methods that change the generative dynamics produce sharper images that satisfy the constraints more reliably than prior methods.
  • ...and 7 more figures

Theorems & Definitions (17)

  • Proposition 2.0: Jarzynski's estimator
  • Proposition 2.1: Unbiased Flow Map Trajectory Tilting
  • Proposition 2.2: Total discrepancy and thermodynamic length, informal
  • Proposition A.1: Unbiased Map Tilting with reward-modified vector field
  • proof
  • Corollary A.2: Unbiased Map Tilting with reward-modified vector field from flow map
  • proof
  • Remark A.3: Setting $\chi_t = 0$ in Proposition A.1 and Corollary A.2
  • Lemma A.4: Simulating the local tilt dynamics
  • proof
  • ...and 7 more