Greed is Good: A Unifying Perspective on Guided Generation

Zander W. Blasingame, Chen Liu

TL;DR

This work shows that the two seemingly separate families of gradient-based guidance techniques, posterior guidance and end-to-end guidance, can be unified by viewing posterior guidance as a greedy strategy of end-to-end guidance. It also presents a method for interpolating between the two families, enabling a trade-off between compute and accuracy of the guidance gradients.

Abstract

Training-free guided generation is a widely used and powerful technique that allows the end user to exert further control over the generative process of flow/diffusion models. Two families of techniques have emerged for gradient-based guidance: posterior guidance (i.e., guidance via projecting the current sample to the target distribution via the target prediction model) and end-to-end guidance (i.e., guidance by performing backpropagation through the entire ODE solve). In this work, we show that these two seemingly separate families can be unified by viewing posterior guidance as a greedy strategy of end-to-end guidance. We explore the theoretical connections between the two families and provide an in-depth theoretical analysis of both techniques relative to the continuous ideal gradients. Motivated by this analysis, we then present a method for interpolating between the two families, enabling a trade-off between compute and accuracy of the guidance gradients. We validate this work on several inverse image problems and property-guided molecular generation.
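The contrast drawn in the abstract can be illustrated with a small JAX sketch. This is not the authors' implementation: the vector field `velocity`, the quadratic `loss`, the matrix `A`, and the helper names are all hypothetical stand-ins, and the sketch assumes a linear probability path, for which the target prediction `x + (1 - t) * u(x, t)` coincides with a single explicit Euler step to time 1. Under those assumptions, differentiating through one step plays the role of posterior (greedy) guidance, differentiating through many steps plays the role of end-to-end guidance, and the step count is one simple way to trade compute against gradient accuracy.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy vector field standing in for a trained flow model.
A = jnp.array([[0.0, 1.0], [-1.0, -0.5]])

def velocity(x, t):
    return jnp.tanh(A @ x) + t

def loss(x1):
    # Hypothetical guidance objective evaluated on the final sample.
    target = jnp.array([1.0, -1.0])
    return jnp.sum((x1 - target) ** 2)

def solve_to_end(x, t, n_steps):
    # Explicit Euler solve of dx/dt = velocity(x, t) from time t to time 1.
    h = (1.0 - t) / n_steps
    for k in range(n_steps):
        x = x + h * velocity(x, t + k * h)
    return x

def guidance_grad(x, t, n_steps):
    # Gradient of the loss w.r.t. the current sample, obtained by
    # backpropagating through the discretized solve.
    return jax.grad(lambda z: loss(solve_to_end(z, t, n_steps)))(x)

x = jnp.array([0.5, -0.2])
t = 0.3

g_greedy = guidance_grad(x, t, 1)    # one step: posterior-style (greedy)
g_mid = guidance_grad(x, t, 4)       # a few steps: in-between cost/accuracy
g_e2e = guidance_grad(x, t, 50)      # many steps: end-to-end-style

print("greedy:", g_greedy)
print("4-step:", g_mid)
print("50-step:", g_e2e)
```

The one-step gradient costs a single model evaluation and backward pass, while the 50-step gradient costs fifty; how closely the cheaper gradients track the expensive one depends on the vector field and on `t`.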

Paper Structure

This paper contains 75 sections, 26 theorems, 115 equations, 16 figures, 5 tables.

Key Result

Proposition 0

Given an initial value ${\bm{x}}_s$ at time $s \in [0, 1]$, the solution ${\bm{x}}_t$ at time $t \in [0,1]$ of an ODE governed by the marginal vector field (eq:marginal_vec) is:

Figures (16)

  • Figure 1: The greedy perspective as a unification of separate families in the taxonomy of training-free guided generation. We provide a more detailed version of this in Figure 5.
  • Figure 2: Visual comparison of different training-free guided generation techniques.
  • Figure 3: Qualitative visualization of using posterior guidance to solve an inverse problem on the task of inpainting with a 70% random mask. Top row is the ground truth, middle row is the measurement, and the bottom row is the reconstruction.
  • Figure 4: Qualitative visualization of controlled generated molecules for various polarizability $(\alpha)$ levels. Top row is generated using end-to-end guidance with a DTO scheme and the bottom row is generated using posterior guidance.
  • Figure 5: A more detailed version of the taxonomy of training-free guided generation methods shown in Figure 1 of the main paper.
  • ...and 11 more figures

Theorems & Definitions (52)

  • Proposition 0: Exact solution of affine probability paths
  • Remark 1
  • Proposition 0: Greedy as an explicit Euler scheme within DTO
  • Proposition 0: Greedy as an implicit Euler scheme within OTD
  • Proof: Proof sketch
  • Theorem 1: Jacobian matrices of affine Gaussian probability paths
  • Remark 2
  • Proposition 1: Dynamics of greedy gradient guidance
  • Remark 3
  • Theorem 2: Dynamics of gradient vs greedy guidance
  • ...and 42 more