Diffusion Models without Classifier-free Guidance

Zhicong Tang, Jianmin Bao, Dong Chen, Baining Guo

TL;DR

This work tackles the inefficiencies of classifier-free guidance (CFG) in conditional diffusion models by introducing Model-guidance (MG), a joint-score learning objective that treats the model as an implicit classifier. By decomposing the joint score $\nabla_{x_t}\log \tilde{p}_\theta(x_t|c)$ into terms for $\nabla_{x_t}\log p_\theta(x_t|c)$ and $\nabla_{x_t}\log p_\theta(c|x_t)$ and then approximating the latter via Bayes’ rule, MG directly learns the score of the combined distribution with a single network and a single forward pass per denoising step. The MG loss replaces the noise target of the standard diffusion objective with $\epsilon' = \epsilon + w \cdot \text{sg}(\tilde{\epsilon}_\theta(x_t,t,c) - \tilde{\epsilon}_\theta(x_t,t,\emptyset))$, where $\text{sg}$ denotes stop-gradient, and uses stop-gradient and EMA for stability, with optional automatic tuning of $w$ and the option of feeding $w$ to the network as an input. Empirically, MG delivers substantial training and inference speedups, preserves or improves sample quality, and achieves a state-of-the-art FID of $1.34$ on ImageNet $256\times256$, while scaling effectively to larger models and higher-resolution data.
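The modified target $\epsilon'$ can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `mg_target` and `mg_loss` are hypothetical, the loss is assumed to be a plain mean-squared error against $\epsilon'$, and the two predictions inside $\text{sg}(\cdot)$ are passed in as fixed arrays, which is how a stop-gradient behaves (in an autograd framework such as PyTorch this would typically be done with `.detach()`).

```python
import numpy as np


def mg_target(eps, eps_cond, eps_uncond, w):
    """Return eps' = eps + w * sg(eps_cond - eps_uncond).

    eps        : the Gaussian noise used to corrupt the sample
    eps_cond   : model prediction with the condition c
    eps_uncond : model prediction with the null condition
    w          : guidance scale
    The two predictions are plain arrays here, so they act as constants,
    mirroring the stop-gradient in the formula.
    """
    guidance = np.asarray(eps_cond) - np.asarray(eps_uncond)
    return np.asarray(eps) + w * guidance


def mg_loss(eps_pred, eps, eps_cond, eps_uncond, w):
    """Mean-squared error between a prediction and the modified target.

    Assumption: the regression loss is an ordinary MSE, as in the
    standard noise-prediction objective.
    """
    target = mg_target(eps, eps_cond, eps_uncond, w)
    return float(np.mean((np.asarray(eps_pred) - target) ** 2))


# Toy usage with two-dimensional "noise" vectors.
eps = np.array([1.0, 2.0])
eps_cond = np.array([0.5, 0.5])
eps_uncond = np.array([0.0, 1.0])
target = mg_target(eps, eps_cond, eps_uncond, w=2.0)
```

With $w = 0$ the target reduces to the ordinary noise $\epsilon$, so the sketch falls back to the standard diffusion objective.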

Abstract

This paper presents Model-guidance (MG), a novel objective for training diffusion models that addresses and removes the need for the commonly used Classifier-free guidance (CFG). Our approach goes beyond the standard modeling of the data distribution alone and incorporates the posterior probability of conditions. The proposed technique originates from the idea of CFG and is simple yet effective, making it a plug-and-play module for existing models. Our method significantly accelerates the training process, doubles the inference speed, and achieves exceptional quality that parallels and even surpasses concurrent diffusion models with CFG. Extensive experiments demonstrate its effectiveness, efficiency, and scalability on different models and datasets. Finally, we establish state-of-the-art performance on ImageNet 256 benchmarks with an FID of 1.34. Our code is available at https://github.com/tzco/Diffusion-wo-CFG.
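The claim that inference speed doubles follows from the number of network evaluations per denoising step: CFG evaluates the model twice (with and without the condition), while a model trained with MG is evaluated once. The sketch below only counts calls. `CountingModel` is a hypothetical toy stand-in for a denoiser, and the CFG combination rule $\epsilon_c + w(\epsilon_c - \epsilon_\emptyset)$ is one common convention, assumed here rather than taken from the paper.

```python
import numpy as np


class CountingModel:
    """Toy denoiser that records how many times it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x_t, t, c):
        self.calls += 1
        # Hypothetical toy prediction; a real network would go here.
        shift = 0.0 if c is None else 0.1 * c
        return 0.5 * x_t + shift


def cfg_eps(model, x_t, t, c, w):
    """CFG-style prediction: two forward passes per denoising step."""
    eps_cond = model(x_t, t, c)
    eps_uncond = model(x_t, t, None)
    return eps_cond + w * (eps_cond - eps_uncond)


def mg_eps(model, x_t, t, c):
    """MG-style prediction: the guided score was learned during training,
    so a single conditional forward pass is used."""
    return model(x_t, t, c)


x_t = np.ones(3)

cfg_model = CountingModel()
cfg_out = cfg_eps(cfg_model, x_t, t=0, c=2, w=1.5)

mg_model = CountingModel()
mg_out = mg_eps(mg_model, x_t, t=0, c=2)

print(cfg_model.calls, mg_model.calls)
```

Over a sampler with $N$ steps this amounts to $2N$ evaluations for CFG against $N$ for MG, assuming each evaluation costs the same.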

Paper Structure

This paper contains 14 sections, 18 equations, 8 figures, 8 tables, 1 algorithm.

Figures (8)

  • Figure 1: We propose Model-guidance (MG), removing Classifier-free guidance (CFG) for diffusion models and achieving state-of-the-art results on ImageNet with an FID of $\mathbf{1.34}$. (a) Instead of running the model twice during inference (green and red), MG directly learns the final distribution (blue). (b) MG requires only a one-line code modification while providing substantial improvements. (c) Compared to concurrent methods, MG yields the lowest FID even without CFG.
  • Figure 1: (a) Unconditional, Conditional, and Classifier-free Guided score.
  • Figure 2: We use a 2D grid distribution with two classes, marked with orange and gray regions, as an example and train diffusion models on it. We plot the generated samples, trajectories, and probability density function (PDF) of the conditional, unconditional, and CFG-guided models, and of our approach. (a) The first row indicates that although CFG improves quality by eliminating outliers, the samples concentrate in the center of the data distributions, resulting in a loss of diversity. In contrast, our method yields fewer outliers than the conditional model and better coverage of the data than CFG. (b) In the second row, the trajectories of CFG show sharp turns at the beginning, e.g. samples inside the red box, while our method directly drives the samples to the closest data distributions. (c) The PDF plots in the last row also suggest that our method predicts more symmetric contours than CFG, balancing both quality and diversity.
  • Figure 4: FID-50K and Inception Score results as the guidance scale increases during inference. Our method is compatible with and can be wrapped into vanilla CFG.
  • Figure 5: FID-5K results during training. Our method is $\ge6.5\times$ faster and $\approx60\%$ better than vanilla DiT and SiT, even surpassing the results of CFG.
  • ...and 3 more figures