Diffusion Models without Classifier-free Guidance
Zhicong Tang, Jianmin Bao, Dong Chen, Baining Guo
TL;DR
This work tackles the inefficiencies of classifier-free guidance (CFG) in conditional diffusion models by introducing Model-guidance (MG), a joint-score learning objective that treats the model as an implicit classifier. By decomposing the joint score $\nabla_{x_t}\log \tilde{p}_\theta(x_t|c)$ into terms for $\nabla_{x_t}\log p_\theta(x_t|c)$ and $\nabla_{x_t}\log p_\theta(c|x_t)$ and then approximating the latter via Bayes’ rule, MG directly learns the score of the combined distribution with a single network and a single forward pass per denoising step. The MG loss replaces the noise target of the standard diffusion objective with $\epsilon' = \epsilon + w \cdot \text{sg}(\tilde{\epsilon}_\theta(x_t,t,c) - \tilde{\epsilon}_\theta(x_t,t,\emptyset))$, where $\text{sg}$ denotes stop-gradient, and uses stop-gradient and EMA for stability, with optional automatic tuning of $w$ and the option of feeding $w$ to the network as an input. Empirically, MG delivers substantial training and inference speedups, preserves or improves sample quality, and achieves a state-of-the-art FID of $1.34$ on ImageNet $256\times256$, while scaling effectively to larger models and higher-resolution data.
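To make the modified target concrete, here is a minimal NumPy sketch of $\epsilon' = \epsilon + w \cdot \text{sg}(\cdot)$ and the resulting mean-squared-error loss. It is an illustration under stated assumptions, not the authors' implementation: the function names `mg_target` and `mg_loss` are hypothetical, the conditional and unconditional predictions are assumed to come from an EMA copy of the network, and because NumPy has no autograd, an explicit copy stands in for the stop-gradient operator.

```python
import numpy as np


def mg_target(eps, eps_cond, eps_uncond, w):
    """Return eps' = eps + w * sg(eps_cond - eps_uncond).

    eps        : the Gaussian noise that was added to the clean sample
    eps_cond   : noise prediction given the condition c (assumed from an EMA model)
    eps_uncond : noise prediction given the null condition (assumed from an EMA model)
    w          : guidance scale
    """
    # NumPy tracks no gradients, so a copy plays the role of stop-gradient here.
    # In an autograd framework this term would be detached from the graph.
    guidance = np.array(eps_cond - eps_uncond, copy=True)
    return eps + w * guidance


def mg_loss(pred_cond, eps, eps_cond, eps_uncond, w):
    """Mean squared error between the online prediction and the modified target."""
    target = mg_target(eps, eps_cond, eps_uncond, w)
    return float(np.mean((pred_cond - target) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal((4, 8))
    eps_cond = rng.standard_normal((4, 8))
    eps_uncond = rng.standard_normal((4, 8))
    pred = rng.standard_normal((4, 8))
    print("loss:", mg_loss(pred, eps, eps_cond, eps_uncond, w=1.5))
```

With $w = 0$ the target reduces to the plain noise $\epsilon$, so the sketch falls back to the standard diffusion objective.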
Abstract
This paper presents Model-guidance (MG), a novel objective for training diffusion models that removes the need for the commonly used Classifier-free guidance (CFG). Our approach goes beyond modeling only the data distribution and also incorporates the posterior probability of conditions. The proposed technique originates from the idea of CFG and is simple yet effective, making it a plug-and-play module for existing models. Our method significantly accelerates the training process, doubles the inference speed, and achieves exceptional quality that parallels and even surpasses concurrent diffusion models with CFG. Extensive experiments demonstrate its effectiveness, efficiency, and scalability on different models and datasets. Finally, we establish state-of-the-art performance on the ImageNet 256 benchmark with an FID of 1.34. Our code is available at https://github.com/tzco/Diffusion-wo-CFG.
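The claim that inference speed doubles follows from counting network evaluations: CFG needs a conditional and an unconditional forward pass at every denoising step, while a model trained with MG needs only the conditional one. The sketch below counts calls for a toy stand-in network. `CountingModel`, `cfg_step_eps`, and `mg_step_eps` are hypothetical names, the toy arithmetic inside the model is arbitrary, and the CFG combination $\epsilon_c + w(\epsilon_c - \epsilon_\emptyset)$ is one common convention for the guidance scale, assumed here for illustration.

```python
import numpy as np


class CountingModel:
    """Hypothetical stand-in for a noise-prediction network that counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, t, c):
        self.calls += 1
        # Arbitrary toy output; c=None plays the role of the null condition.
        shift = 0.0 if c is None else 0.1 * c
        return 0.5 * x + shift


def cfg_step_eps(model, x, t, c, w):
    """CFG noise estimate: two forward passes per denoising step."""
    eps_c = model(x, t, c)
    eps_u = model(x, t, None)
    return eps_c + w * (eps_c - eps_u)


def mg_step_eps(model, x, t, c):
    """MG-style noise estimate: one forward pass per denoising step."""
    return model(x, t, c)


def count_calls(step_fn, steps, **kwargs):
    """Run `steps` noise estimates and return the number of network calls."""
    model = CountingModel()
    x = np.ones((2, 4))
    for t in range(steps):
        step_fn(model, x, t, c=1.0, **kwargs)
    return model.calls


if __name__ == "__main__":
    print("CFG calls:", count_calls(cfg_step_eps, steps=10, w=1.5))
    print("MG calls: ", count_calls(mg_step_eps, steps=10))
```

The call count is only a proxy for wall-clock time; the actual speedup depends on batching and hardware.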
