D-Flow: Differentiating through Flows for Controlled Generation

Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, Yaron Lipman

TL;DR

D-Flow is a training-free framework for controlling generation from pre-trained diffusion and flow models: it differentiates through the ODE solver with respect to the initial noise $x_0$ and optimizes $x_0$ against a control objective. The key observation is that backpropagating to $x_0$ projects gradients onto data-manifold directions, which injects an implicit prior into the optimization. The approach covers inverse problems, conditional sampling, and editing across images, audio, and molecules, and reports state-of-the-art results without task-specific retraining, at the cost of longer runtimes. Theoretical support via Affine Gaussian Probability Paths (AGPP) and adjoint dynamics explains the implicit regularization, and extensive experiments show strong performance across these domains. The result is a practical route to controllable generation that reuses pre-trained priors with no retraining.

Abstract

Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general. In this work we introduce D-Flow, a simple framework for controlling the generation process by differentiating through the flow, optimizing for the source (noise) point. We motivate this framework by our key observation that, for Diffusion/FM models trained with Gaussian probability paths, differentiating through the generation process projects the gradient onto the data manifold, implicitly injecting the prior into the optimization process. We validate our framework on linear and non-linear controlled generation problems, including image and audio inverse problems and conditional molecule generation, reaching state-of-the-art performance across all of them.
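To make "differentiating through the flow, optimizing for the source (noise) point" concrete, here is a minimal toy sketch. It is not the authors' implementation. It assumes a data prior that is an independent Gaussian per coordinate, so the velocity field is known in closed form and is affine in $x$; it assumes the path $x_t = (1-t)x_0 + t x_1$; it replaces autodiff with a hand-accumulated chain rule through Euler steps; and it uses plain gradient descent as the optimizer. All function and parameter names (`velocity_coeffs`, `euler_flow`, `d_flow_toy`, `lr`, `n_iters`) are hypothetical.

```python
def velocity_coeffs(t, mu, s):
    """Affine velocity u_t(x) = a*x + b for one coordinate.

    Assumes data x_1 ~ N(mu, s^2), noise x_0 ~ N(0, 1) and the path
    x_t = (1 - t) * x_0 + t * x_1, so that u_t(x) = E[x_1 - x_0 | x_t = x].
    """
    var_t = (1.0 - t) ** 2 + (t * s) ** 2   # variance of x_t
    a = (t * s * s - (1.0 - t)) / var_t     # Cov(x_1 - x_0, x_t) / Var(x_t)
    b = mu - a * t * mu
    return a, b


def euler_flow(x0, mu, s, n_steps=200):
    """Integrate dx/dt = u_t(x) from t=0 to t=1 with Euler steps.

    Returns x(1) and dx(1)/dx0; the derivative is accumulated step by step,
    which is what backpropagating through this solver would compute.
    """
    h = 1.0 / n_steps
    x, jac = x0, 1.0
    for k in range(n_steps):
        a, b = velocity_coeffs(k * h, mu, s)
        x = x + h * (a * x + b)   # Euler update
        jac *= 1.0 + h * a        # derivative of this update w.r.t. x
    return x, jac


def d_flow_toy(y, mus, stds, n_iters=50, lr=0.5):
    """Minimize 0.5 * ||x(1) - y||^2 over the source point x0 (toy setting)."""
    x0 = [0.0] * len(y)
    for _ in range(n_iters):
        for i in range(len(y)):
            x1, jac = euler_flow(x0[i], mus[i], stds[i])
            grad = jac * (x1 - y[i])   # chain rule: dL/dx0 = dx(1)/dx0 * dL/dx(1)
            x0[i] -= lr * grad
    x1 = [euler_flow(x0[i], mus[i], stds[i])[0] for i in range(len(y))]
    return x0, x1


# Coordinate 0 has data std 1.0, coordinate 1 has data std 0.1.
x0_opt, x1_opt = d_flow_toy(y=[1.0, 1.0], mus=[0.0, 0.0], stds=[1.0, 0.1])
print(x1_opt)
```

In this toy, the derivative of $x(1)$ with respect to $x_0$ is close to the data standard deviation of each coordinate, so the gradient reaching $x_0$ is strongly damped along the low-variance coordinate. After the same number of iterations, the high-variance coordinate has reached the target while the low-variance one has moved only a fraction of the way, which is a simple picture of the implicit prior described above.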

Paper Structure

This paper contains 40 sections, 6 theorems, 59 equations, 10 figures, 9 tables, 1 algorithm.

Key Result

Proposition 4.1

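For AGPP, i.e. conditional paths of the form $p_t(x|x_1) = \mathcal{N}(x|\alpha_t x_1, \sigma_t^2 I)$, the gradient of the denoiser $\hat{x}_{1|t}(x)$ w.r.t. $x$ is proportional to the variance of the random variable defined by $p_t(x_1|x)$. Formally,

$$D_x \hat{x}_{1|t}(x) = \frac{\alpha_t}{\sigma_t^2}\,\mathrm{Var}_{1|t}(x),$$

where

$$\mathrm{Var}_{1|t}(x) = \mathbb{E}_{p_t(x_1|x)}\left[\left(x_1 - \hat{x}_{1|t}(x)\right)\left(x_1 - \hat{x}_{1|t}(x)\right)^T\right].$$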
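The proportionality in Proposition 4.1 can be checked by hand in one dimension. The setup is an illustrative assumption rather than something taken from the paper: scalar data $x_1 \sim \mathcal{N}(\mu, s^2)$ and an affine Gaussian conditional path $p_t(x|x_1) = \mathcal{N}(x|\alpha_t x_1, \sigma_t^2)$, where $\mu$, $s$, and $v_t$ are symbols introduced only for this example. Standard Gaussian conditioning gives:

$$
\begin{aligned}
p_t(x_1|x) &= \mathcal{N}\left(x_1 \,\middle|\, \hat{x}_{1|t}(x),\ v_t\right), \qquad v_t = \frac{s^2 \sigma_t^2}{\alpha_t^2 s^2 + \sigma_t^2},\\
\hat{x}_{1|t}(x) &= \mu + \frac{\alpha_t s^2}{\alpha_t^2 s^2 + \sigma_t^2}\,(x - \alpha_t \mu),\\
\frac{d}{dx}\,\hat{x}_{1|t}(x) &= \frac{\alpha_t s^2}{\alpha_t^2 s^2 + \sigma_t^2} = \frac{\alpha_t}{\sigma_t^2}\, v_t .
\end{aligned}
$$

In this example the derivative of the denoiser equals $\alpha_t/\sigma_t^2$ times the posterior variance $v_t$, so it shrinks whenever the posterior over $x_1$ is narrow, for instance when the data variance $s^2$ is small.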

Figures (10)

  • Figure 1: Free-form inpainting with a latent T2I FM model (Ground truth image is taken from the MS-COCO validation set), conditionally generated molecule and audio inpainting using D-Flow.
  • Figure 2: Intermediate $x(1)$ during optimization. Given a distorted image and randomly initialized $x_0$ defining the initial $x(1)$, our optimization travels close to the natural image manifold passing through in-distribution images on its way to the GT sample from the face-blurred ImageNet-128 validation set.
  • Figure 3: BPD of two images in an ImageNet-128 model.
  • Figure 4: Implicit bias in differentiating through the solver.
  • Figure 5: Qualitative comparison for linear inverse problems on ImageNet-128. GT samples from ImageNet-128 validation.
  • ...and 5 more figures

Theorems & Definitions (10)

  • Proposition 4.1
  • Theorem 4.2
  • Proposition 1.1
  • proof
  • Theorem 1.2
  • proof
  • Lemma 1.3
  • proof
  • Proposition 1.4
  • proof