Feedback Guidance of Diffusion Models

Felix Koulischer, Florian Handke, Johannes Deleu, Thomas Demeester, Luca Ambrogioni

TL;DR

Classifier-Free Guidance (CFG) improves diffusion-based conditional generation but can hurt diversity when applied uniformly. This work introduces FeedBack Guidance (FBG), a state- and time-dependent guidance mechanism derived from an additive error model and posterior tracking, enabling dynamic adjustment of guidance during inference. Empirical results show FBG outperforms CFG on ImageNet-512×512 and remains competitive with Limited Interval Guidance, while automatically increasing guidance for complex Text-To-Image prompts and remaining compatible with CFG and LIG. The approach provides a solid theoretical framework for adaptive guidance and demonstrates practical benefits across image and text-to-image generation tasks.
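As a minimal NumPy sketch, the code below contrasts constant guidance with guidance restricted to a window of noise levels. It shows the usual CFG-style combination of conditional and unconditional predictions, a constant CFG scale, and a Limited-Interval-Guidance-style schedule that turns guidance on only inside a noise-level window. The function names, the noise level `sigma`, and the half-open interval convention are assumptions made for illustration. This is not code from the paper.

```python
import numpy as np


def guided_prediction(pred_cond, pred_uncond, scale):
    """Combine conditional and unconditional predictions with a guidance scale.

    scale == 1 returns the conditional prediction unchanged; scale > 1
    extrapolates away from the unconditional prediction (CFG-style).
    `scale` may be a scalar or a per-sample array broadcastable to the inputs.
    """
    return pred_uncond + scale * (pred_cond - pred_uncond)


def cfg_scale(sigma, w):
    # Classifier-Free Guidance: the same scale w at every noise level and for every sample.
    return w


def lig_scale(sigma, w, sigma_lo, sigma_hi):
    # Limited-Interval-style schedule (hypothetical parameter names):
    # apply scale w only for noise levels in (sigma_lo, sigma_hi], otherwise use 1 (no guidance).
    return w if sigma_lo < sigma <= sigma_hi else 1.0
```

Both schedules depend on time alone and treat every sample in a batch identically. A state-dependent method such as FBG would instead pass a per-sample array as `scale`, and NumPy broadcasting in `guided_prediction` already supports that.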

Abstract

While Classifier-Free Guidance (CFG) has become standard for improving sample fidelity in conditional diffusion models, it can harm diversity and induce memorization by applying constant guidance regardless of whether a particular sample needs correction. We propose FeedBack Guidance (FBG), which uses a state-dependent coefficient to self-regulate the amount of guidance based on need. Our approach is derived from first principles by assuming the learned conditional distribution is linearly corrupted by the unconditional distribution, in contrast with CFG's implicit multiplicative assumption. Our scheme relies on feedback from its own predictions about the informativeness of the conditional signal to adapt guidance dynamically during inference, challenging the view of guidance as a fixed hyperparameter. The approach is benchmarked on ImageNet 512×512, where it significantly outperforms Classifier-Free Guidance and is competitive with Limited Interval Guidance (LIG) while benefiting from a strong mathematical framework. On Text-To-Image generation, we demonstrate that, as anticipated, our approach automatically applies higher guidance scales for complex prompts than for simpler ones and that it can be easily combined with existing guidance schemes such as CFG or LIG.
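The following derivation is an illustrative reading of the abstract and does not reproduce the paper's own equations. It shows how an additive error model yields a state-dependent guidance scale. It rests on three assumptions. First, "linearly corrupted" is taken to mean a mixture with a hypothetical weight π ∈ [0, 1). Second, the mixture holds for the noisy marginals at diffusion time t. A mixture at t = 0 remains a mixture with the same weight at every t, because forward noising acts linearly on densities. Third, the unconditional density p(x_t) is treated as exact. The symbols π and γ are our notation.

```latex
% Assumed additive error model (hypothetical mixing weight \pi \in [0,1)):
p_\theta(x_t \mid c) = (1-\pi)\,p(x_t \mid c) + \pi\,p(x_t)
\;\Longrightarrow\;
p(x_t \mid c) = \frac{p_\theta(x_t \mid c) - \pi\,p(x_t)}{1-\pi}

% Taking \nabla_{x_t} \log of the right-hand side and regrouping:
\nabla \log p(x_t \mid c)
  = \nabla \log p(x_t)
  + \gamma(x_t)\,\bigl[\nabla \log p_\theta(x_t \mid c) - \nabla \log p(x_t)\bigr],
\qquad
\gamma(x_t) = \frac{p_\theta(x_t \mid c)}{p_\theta(x_t \mid c) - \pi\,p(x_t)}
            = \frac{1}{1 - \pi\,p(x_t)/p_\theta(x_t \mid c)}

% CFG's implicit multiplicative model, for comparison (constant w):
\tilde p(x_t \mid c) \propto p(x_t)\left[\frac{p_\theta(x_t \mid c)}{p(x_t)}\right]^{w}
\;\Longrightarrow\;
\nabla \log \tilde p(x_t \mid c)
  = \nabla \log p(x_t) + w\,\bigl[\nabla \log p_\theta(x_t \mid c) - \nabla \log p(x_t)\bigr]
```

Under these assumptions, γ(x_t) ≥ 1. It approaches 1 when the conditional model already explains x_t well (p_θ(x_t | c) ≫ π p(x_t)) and grows as that ratio shrinks. By Bayes' rule, the ratio p_θ(x_t | c)/p(x_t) equals the posterior ratio p(c | x_t)/p(c). This is why tracking the posterior is enough to set the guidance scale.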

Paper Structure

This paper contains 31 sections, 29 equations, 19 figures, 5 tables, and 1 algorithm.

Figures (19)

  • Figure 1: Illustrative diffusion trajectories and their hypothetical guidance scales in a 1D setting. Trajectories farther from the mode near the decision window (red, orange) receive stronger guidance, whereas those clearly heading toward the right mode (yellow) receive negligible guidance.
  • Figure 2: Schematic of FeedBack Guidance (FBG). The state space consists of both $\bm{x}_t$ and $\lambda$, which are updated iteratively during the denoising process. The guidance scale is updated by tracking the posterior ratio with the Markov-chain recursion for the posterior, and the tracked value is then inserted into the equation giving the guidance scale as a function of the posterior.
  • Figure 3: (a) Grid search over $t_0$ and $t_1$, with FD$_{\text{DinoV2}}$ calibrated to the best value among CFG, LIG, and FBG. (b) Precision–Recall sweeps at each method’s FD$_{\text{DinoV2}}$ optimum: CFG/LIG sweep guidance scale, FBG sweeps $t_0$ at fixed $t_0-t_1$. Guidance strength is indicated by color intensity.
  • Figure 4: Analysis of FBG in the context of T2I. In (a), the dynamic guidance scales of 32 samples are shown for two prompts: a memorized one ("The starry night by Van Gogh") and a more difficult one ("A chameleon blending into a graffiti-covered wall"). In (b), the average guidance scale applied by FBG is shown as a function of prompt difficulty, with the difficulty levels specified in the Prompt dataset appendix.
  • Figure 5: Guidance scale for different trajectories using the prompt "A snail crawling on a green leaf with water droplets". If the conditional prediction is good, the guidance is low (top two images). In contrast, when the conditional prediction is poor, the guidance scale increases (bottom two images).
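Figures 1, 2 and 5 describe a guidance scale that differs per trajectory and is driven by a tracked posterior quantity. The NumPy sketch below shows one way such feedback could work. It is our construction and may differ from the paper's posterior update. For the forward process, c → x_{t-1} → x_t is a Markov chain, so Bayes' rule gives exactly log p(c|x_{t-1})/p(c) = log p(c|x_t)/p(c) + log p(x_{t-1}|x_t, c) − log p(x_{t-1}|x_t). The sketch plugs in Gaussian reverse kernels that share an isotropic variance (an assumption), which turns the recursion into an estimate. It then maps the tracked log ratio to a scale γ = 1/(1 − π·exp(−log_ratio)) under an assumed mixture error model with weight π. All function names, `pi`, and the `max_scale` clip are hypothetical.

```python
import numpy as np


def gaussian_log_ratio_increment(x_prev, mean_cond, mean_uncond, var):
    """log N(x_prev; mean_cond, var*I) - log N(x_prev; mean_uncond, var*I), per sample.

    With a shared isotropic variance the normalising constants cancel.
    Inputs have shape (batch, dim); the result has shape (batch,).
    """
    d_cond = np.sum((x_prev - mean_cond) ** 2, axis=1)
    d_uncond = np.sum((x_prev - mean_uncond) ** 2, axis=1)
    return (d_uncond - d_cond) / (2.0 * var)


def update_log_ratio(log_ratio, x_prev, mean_cond, mean_uncond, var):
    # Bayes recursion (assumed Gaussian kernels):
    # log p(c|x_{t-1})/p(c) = log p(c|x_t)/p(c) + log-ratio of the two reverse transitions.
    return log_ratio + gaussian_log_ratio_increment(x_prev, mean_cond, mean_uncond, var)


def mixture_guidance_scale(log_ratio, pi, max_scale=10.0):
    """gamma = 1 / (1 - pi * exp(-log_ratio)) under an assumed mixture error model.

    log_ratio estimates log p(x|c) - log p(x). A large log_ratio (conditional model
    already explains x) gives gamma close to 1. As log_ratio approaches log(pi) the
    denominator reaches 0, so it is floored at 1/max_scale, capping gamma at max_scale.
    """
    denom = 1.0 - pi * np.exp(-log_ratio)
    return 1.0 / np.maximum(denom, 1.0 / max_scale)
```

In a sampler, the per-sample log ratio can start at zero, since pure noise x_T carries no information about c. After each denoising step, `update_log_ratio` would be called with the two models' predicted means. `mixture_guidance_scale` then gives a per-sample γ that takes the place of the constant CFG weight in the combination ε_u + γ(ε_c − ε_u) at the next step.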