TFG: Unified Training-Free Guidance for Diffusion Models

Haotian Ye, Haowei Lin, Jiaqi Han, Minkai Xu, Sheng Liu, Yitao Liang, Jianzhu Ma, James Zou, Stefano Ermon

TL;DR

This paper introduces a novel algorithmic framework encompassing existing methods as special cases, unifying the study of training-free guidance into the analysis of an algorithm-agnostic design space, and proposes an efficient and effective hyper-parameter searching strategy that can be readily applied to any downstream task.

Abstract

Given an unconditional diffusion model and a predictor for a target property of interest (e.g., a classifier), the goal of training-free guidance is to generate samples with desirable target properties without additional training. Existing methods, though effective in various individual applications, often lack theoretical grounding and rigorous testing on extensive benchmarks. As a result, they could even fail on simple tasks, and applying them to a new problem becomes unavoidably difficult. This paper introduces a novel algorithmic framework encompassing existing methods as special cases, unifying the study of training-free guidance into the analysis of an algorithm-agnostic design space. Via theoretical and empirical investigation, we propose an efficient and effective hyper-parameter searching strategy that can be readily applied to any downstream task. We systematically benchmark across 7 diffusion models on 16 tasks with 40 targets, and improve performance by 8.5% on average. Our framework and benchmark offer a solid foundation for conditional generation in a training-free manner.
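The setup described in the abstract can be illustrated with a toy sketch. This is not the TFG algorithm, and it does not use a real diffusion model: it assumes a one-dimensional standard normal as the "unconditional model" (so its score is known in closed form), a hypothetical Gaussian predictor centred on a target value, and plain Langevin sampling. All function names and parameter values are made up for illustration. It only shows the general principle that guidance adds the gradient of the predictor's log-likelihood to the unconditional score, with no retraining of either component.

```python
import math
import random


def unconditional_score(x):
    # Score of the assumed unconditional model N(0, 1): d/dx log p(x) = -x.
    return -x


def predictor_grad(x, target):
    # Gradient of an assumed Gaussian predictor log p(y | x) = -(x - target)^2 / 2.
    return -(x - target)


def guided_langevin(n_samples, n_steps, step_size, strength, target, seed=0):
    # Langevin sampling where the drift is the unconditional score plus a
    # weighted predictor gradient; strength = 0 recovers unguided sampling.
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * step_size)
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        for _ in range(n_steps):
            drift = unconditional_score(x) + strength * predictor_grad(x, target)
            x = x + step_size * drift + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def mean(values):
    return sum(values) / len(values)


unguided_mean = mean(guided_langevin(2000, 300, 0.02, strength=0.0, target=2.0))
guided_mean = mean(guided_langevin(2000, 300, 0.02, strength=1.0, target=2.0))
```

In this toy case the guided distribution is known analytically: multiplying N(0, 1) by the predictor likelihood centred at 2 gives N(1, 0.5), so the guided sample mean should land near 1 while the unguided mean stays near 0. The `strength` weight is the kind of hyper-parameter whose choice the paper's search strategy is concerned with, although the actual TFG design space is richer than this single scalar.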

Paper Structure

This paper contains 59 sections, 2 theorems, 29 equations, 16 figures, 11 tables, 6 algorithms.

Key Result

Theorem 3.2

The hyper-parameter space of

Figures (16)

  • Figure 1: (a) Illustration of the unified search space of our proposed TFG, where the height (color) stands for performance. Existing algorithms search along sub-manifolds, while TFG results in improved guidance thanks to its extended search space. (b) The label accuracy (higher is better) and Fréchet inception distance (FID, lower is better) of different methods for the label guidance task on CIFAR10, averaged across ten labels. Ours (TFG-4) performs much closer to training-based methods. (c$\sim$h) TFG-generated samples across various tasks in the vision, audio, and geometry domains.
  • Figure 2: Comparison of the three structures in \ref{eq:scheduler} for $\bm \rho$ and $\bm \mu$ on CIFAR10 and ImageNet, under different choices of the remaining hyper-parameters in $\mathcal{H}_{\text{TFG}}$. We set $\bm \rho = \bm 0, \bar{\gamma} = 0$ when studying structures of $\bm \mu$, and analogously when studying structures of $\bm \rho$. Results are averaged across all labels. The relative ranking of the structures remains unchanged as the remaining parameters vary.
  • Figure 3: Accuracy and FID on CIFAR10 under different $N_\text{recur}$ and $N_\text{iter}$. $s_\rho(t), s_\mu(t)$ are fixed to the "increase" structure, and $\bm \rho=\bm \gamma=0$.
  • Figure 4: Prompting GPT4 to generate property-guided molecules. It is hard for the image generator to understand the target and generate faithful samples. In this dialog, GPT4 claims to generate a benzene molecule, but the sample is clearly not benzene. There are also many invalid carbon atoms with more than 4 bonds, and the polarizability target is not achieved.
  • Figure 5: Prompting GPT4 to generate CelebA-like images. We first prompt ChatGPT to probe its knowledge of the CelebA dataset and then ask it to generate a figure of a young man in CelebA style. However, the generated figure is apparently not in the distribution of CelebA (refer to \ref{fig:youngman} for comparison).
  • ...and 11 more figures

Theorems & Definitions (5)

  • Definition 3.1
  • Theorem 3.2
  • Lemma 3.3
  • proof
  • proof