TFG: Unified Training-Free Guidance for Diffusion Models

Haotian Ye; Haowei Lin; Jiaqi Han; Minkai Xu; Sheng Liu; Yitao Liang; Jianzhu Ma; James Zou; Stefano Ermon

TL;DR

This paper introduces a novel algorithmic framework encompassing existing methods as special cases, unifying the study of training-free guidance into the analysis of an algorithm-agnostic design space, and proposes an efficient and effective hyper-parameter searching strategy that can be readily applied to any downstream task.

Abstract

Given an unconditional diffusion model and a predictor for a target property of interest (e.g., a classifier), the goal of training-free guidance is to generate samples with desirable target properties without additional training. Existing methods, though effective in various individual applications, often lack theoretical grounding and rigorous testing on extensive benchmarks. As a result, they could even fail on simple tasks, and applying them to a new problem becomes unavoidably difficult. This paper introduces a novel algorithmic framework encompassing existing methods as special cases, unifying the study of training-free guidance into the analysis of an algorithm-agnostic design space. Via theoretical and empirical investigation, we propose an efficient and effective hyper-parameter searching strategy that can be readily applied to any downstream task. We systematically benchmark across 7 diffusion models on 16 tasks with 40 targets, and improve performance by 8.5% on average. Our framework and benchmark offer a solid foundation for conditional generation in a training-free manner.

Paper Structure (59 sections, 2 theorems, 29 equations, 16 figures, 11 tables, 6 algorithms)

Introduction
Background
Existing algorithms
TFG: A Unified Framework for Training-free Guidance
Unification and extension
Algorithm and design space analysis
Design Space of TFG: Analysis and Searching Strategy
Benchmarking
Settings
Benchmarking results
Discussions and Limitations
The Motivation of Studying Training-free Guidance
Failure case of image generation with GPT4
It's hard for GPT4 to understand targets.
It's hard for GPT4 to capture the targeted distribution.
...and 44 more sections

Key Result

Theorem 3.2

The hyper-parameter space of

Figures (16)

Figure 1: (a) Illustration of the unified search space of our proposed TFG, where the height (color) stands for performance. Existing algorithms search along sub-manifolds, while TFG results in improved guidance thanks to its extended search space. (b) The label accuracy (higher is better) and Fréchet inception distance (FID, lower is better) of different methods for the label guidance task on CIFAR10, averaged across ten labels. Ours (TFG-4) performs much closer to training-based methods. (c$\sim$h) TFG-generated samples across various tasks in vision, audio, and geometry domains.
Figure 2: Comparison of three structures in \ref{eq:scheduler} of $\bm \rho$ and $\bm \mu$ on CIFAR10 and ImageNet, under different choices of the remaining hyper-parameters in $\mathcal{H}_{\text{TFG}}$. We set $\bm \rho = \bm 0, \bar{\gamma} = 0$ when studying structures of $\bm \mu$, and similarly for $\bm \rho$. Results are averaged across all labels. The relative ordering of the structures remains unchanged when the remaining hyper-parameters vary.
Figure 3: Accuracy and FID on CIFAR10 under different $N_\text{recur}$ and $N_\text{iter}$. $s_\rho(t)$ and $s_\mu(t)$ are fixed to the "increase" structure, and $\bm \rho=\bm \gamma=0$.
Figure 4: Prompting GPT4 to generate property-guided molecules. It is hard for the image generator to understand the target and generate faithful samples. In this dialog, GPT4 claims to generate a benzene molecule, but the sample is clearly not benzene. There are also many invalid carbon atoms with more than 4 bonds, and the polarizability target is not achieved.
Figure 5: Prompting GPT4 to generate CelebA-like images. We first prompt ChatGPT to probe its knowledge of the CelebA dataset and then ask it to generate a figure of a young man in CelebA style. However, the generated figure is apparently not in the distribution of CelebA (see \ref{fig:youngman} for comparison).
...and 11 more figures

Theorems & Definitions (5)

Definition 3.1
Theorem 3.2
Lemma 3.3
proof
proof
