Improving Training-free Conditional Diffusion Model via Fisher Information

Kaiyu Song, Hanjiang Lai

TL;DR

The paper proposes an upper bound of the Fisher information to reformulate the conditional term, which increases the information gain and decreases the time cost. Experiments show that the proposed FICD improves generation quality in various tasks compared with the baselines at a low computation cost.

Abstract

Training-free conditional diffusion models have received great attention in conditional image generation tasks. However, they require a computationally expensive conditional score estimator to steer the intermediate result of each step in the reverse process toward the condition, which makes conditional generation slow. In this paper, we propose a novel Fisher information-based conditional diffusion (FICD) model to generate high-quality samples according to the condition. In particular, we further explore the conditional term from the perspective of Fisher information, and we show that Fisher information can act as a weight that measures the informativeness of the condition in each generation step. From this new perspective, we can control and gain more information along the conditional direction in the generation space. Thus, we propose to use the upper bound of the Fisher information to reformulate the conditional term, which increases the information gain and decreases the time cost. Experimental results demonstrate that the proposed FICD offers up to a 2x speed-up with the same number of sampling steps as most baselines. Meanwhile, FICD improves the generation quality in various tasks compared with the baselines at a low computation cost.
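To make the conditional term concrete, the following is a minimal toy sketch of the quantity $\nabla_{\bm{x}_{t}}\log p(\bm{c}|\bm{\hat{x}}_{0|t})$ that training-free guidance evaluates at each reverse step. Everything in it is an assumption made for illustration and not the FICD method: the noise predictor is the closed-form one for standard normal data, the condition likelihood is an isotropic Gaussian, and central finite differences stand in for backpropagation through a real denoiser (the expensive part in practice). The names `predict_x0`, `log_likelihood`, `cond_grad`, and `l2_norm` are hypothetical.

```python
import math


def predict_x0(x_t, alpha_bar):
    """Clean-sample estimate x0_hat = (x_t - sqrt(1 - a) * eps) / sqrt(a).

    Assumption: eps(x_t) = sqrt(1 - a) * x_t, the exact noise predictor
    when the data distribution is a standard normal.
    """
    s = math.sqrt(1.0 - alpha_bar)
    eps = [s * v for v in x_t]
    return [(v - s * e) / math.sqrt(alpha_bar) for v, e in zip(x_t, eps)]


def log_likelihood(c, x0_hat, sigma=1.0):
    """Assumed Gaussian condition model: log p(c | x0_hat) up to a constant."""
    sq_err = sum((ci - xi) ** 2 for ci, xi in zip(c, x0_hat))
    return -sq_err / (2.0 * sigma ** 2)


def cond_grad(x_t, c, alpha_bar, sigma=1.0, h=1e-5):
    """Gradient of log p(c | x0_hat(x_t)) w.r.t. x_t by central differences."""
    grad = []
    for i in range(len(x_t)):
        up = list(x_t)
        dn = list(x_t)
        up[i] += h
        dn[i] -= h
        f_up = log_likelihood(c, predict_x0(up, alpha_bar), sigma)
        f_dn = log_likelihood(c, predict_x0(dn, alpha_bar), sigma)
        grad.append((f_up - f_dn) / (2.0 * h))
    return grad


def l2_norm(v):
    """Euclidean norm, the scalar plotted per timestep in a gradient-norm study."""
    return math.sqrt(sum(x * x for x in v))


if __name__ == "__main__":
    x_t = [0.5, -1.0]   # toy intermediate sample
    c = [1.0, 2.0]      # toy condition
    g = cond_grad(x_t, c, alpha_bar=0.25)
    print("gradient:", g)
    print("gradient norm:", l2_norm(g))
```

In this toy setting the clean estimate reduces to $\sqrt{\bar{\alpha}_t}\,\bm{x}_t$, so the gradient has the closed form $\sqrt{\bar{\alpha}_t}\,(\bm{c}-\sqrt{\bar{\alpha}_t}\,\bm{x}_t)/\sigma^2$, which gives a direct check on the numerical result.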

Paper Structure

This paper contains 9 sections, 1 theorem, 11 equations, 6 figures, 3 tables, and 1 algorithm.

Key Result

Theorem 1

Given the sequence $\{\bm{x}_{T},\bm{x}_{T-1},\dots,\bm{x}_{t},\dots,\bm{x}_{1}\}$, where $t \in [T, 0)$ and $\bm{x}_{T}$ is the initial state of the reverse process, $I(\bm{x}_{t})$ is bounded by the Cramér-Rao bound:

Figures (6)

  • Figure 1: An empirical study of the gradient norm on style guidance tasks with Stable Diffusion. We plot the value of $||\nabla_{\bm{x}_{t}}\log p(\bm{c}|\bm{\hat{x}}_{0|t})||_2$ across timesteps to show the information gain. Panels (a), (b), (c), and (d) report the values under 200, 100, 50, and 30 sampling steps, respectively.
  • Figure 2: Qualitative examples of human face image generation under a single condition. The conditions are (a) text, (b) face parsing maps, and (c) sketches. We compare the results with those of three baselines. MPGD fails because these tasks violate its linear hypothesis, whereas FICD performs well.
  • Figure 3: Comparison between FICD and $||\nabla_{\bm{x}_{t}}\log p(\bm{c}|\bm{\hat{x}}_{0|t})||_2$. (a) and (c) show the gradient norms of $\nabla_{\bm{x}_{t}}\log p(\bm{c}|\bm{\hat{x}}_{0|t})$ and FICD under $T=200$ and $T=50$, respectively. (b) and (d) are sub-views of (a) and (c), respectively.
  • Figure 4: Qualitative examples of style-guided generation with Stable Diffusion, comparing FICD with the three baselines.
  • Figure 5: Qualitative examples of face-related tasks in the ControlNet experiments. We compare FICD with FreeDoM and MPGD.
  • ...and 1 more figure

Theorems & Definitions (2)

  • Definition 1
  • Theorem 1