Unveil Conditional Diffusion Models with Classifier-free Guidance: A Sharp Statistical Theory
Hengyu Fu, Zhuoran Yang, Mengdi Wang, Minshuo Chen
TL;DR
This work delivers the first sharp statistical theory for conditional diffusion models trained with classifier-free guidance, showing that Hölder-smooth ground-truth conditional distributions can be learned efficiently from data. It introduces a universal conditional score-approximation framework based on diffused local polynomials. The framework achieves rates that adapt to the data's smoothness and, under stronger density assumptions, converge substantially faster. Building on this, the authors establish end-to-end distribution-estimation guarantees with minimax-optimal rates. They then extend the theory to transition-kernel estimation in model-based reinforcement learning, reward-directed generation, and linear inverse problems. The results provide rigorous foundations for the practical success of conditional diffusion methods across domains, and they highlight how data regularity and coverage shape statistical performance.
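Training "with classifier-free guidance" can be written out in common notation. The block below is a sketch under stated assumptions, not the paper's exact formulation. We assume an Ornstein–Uhlenbeck forward process, an early-stopping time $t_0$, a terminal time $T$, uniform time weighting, and a masking probability $p_{\mathrm{mask}}$. A single score network $s(x, \tilde y, t)$ is trained, where the condition $\tilde y$ is replaced by a null token $\varnothing$ with probability $p_{\mathrm{mask}}$. The last line shows one common guidance parameterization. Other papers write it differently.

```latex
% Assumed OU forward process: x_t = e^{-t} x_0 + \sqrt{1 - e^{-2t}}\, z, with z ~ N(0, I_d)
\begin{align*}
  \min_{s}\; &\int_{t_0}^{T} \mathbb{E}_{(x_0, y)}\, \mathbb{E}_{\tilde y \mid y}\, \mathbb{E}_{x_t \mid x_0}
    \Big\| s(x_t, \tilde y, t) + \frac{x_t - e^{-t} x_0}{1 - e^{-2t}} \Big\|_2^2 \, dt,
  \qquad
  \tilde y =
  \begin{cases}
    \varnothing & \text{with probability } p_{\mathrm{mask}},\\
    y & \text{otherwise},
  \end{cases}\\
  % The population minimizer yields both scores from one network:
  s^\star(x, y, t) &= \nabla_x \log p_t(x \mid y), \qquad s^\star(x, \varnothing, t) = \nabla_x \log p_t(x),\\
  % One common guidance parameterization, with strength w >= 0:
  \tilde s_w(x, y, t) &= (1 + w)\, s(x, y, t) - w\, s(x, \varnothing, t).
\end{align*}
```

The regression target $-(x_t - e^{-t} x_0)/(1 - e^{-2t})$ is the gradient of the log Gaussian transition density. Its conditional expectation given $(x_t, y)$ is the conditional score, and this is why the minimizer recovers $\nabla_x \log p_t(x \mid y)$.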
Abstract
Conditional diffusion models serve as the foundation of modern image synthesis and find extensive application in fields like computational biology and reinforcement learning. In these applications, conditional diffusion models incorporate various conditional information, such as prompts, to guide sample generation toward desired properties. Despite this empirical success, the theory of conditional diffusion models is largely missing. This paper bridges this gap by presenting a sharp statistical theory of distribution estimation using conditional diffusion models. Our analysis yields a sample complexity bound that adapts to the smoothness of the data distribution and matches the minimax lower bound. The key to our theoretical development lies in an approximation result for the conditional score function, which relies on a novel diffused Taylor approximation technique. Moreover, we demonstrate the utility of our statistical theory in elucidating the performance of conditional diffusion models across diverse applications, including model-based transition kernel estimation in reinforcement learning, solving inverse problems, and reward-conditioned sample generation.
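The sketch below makes the "conditional score function" concrete with a hypothetical 1-D Gaussian toy model, $y \sim \mathcal{N}(0, 1)$ and $x_0 \mid y \sim \mathcal{N}(y, 1)$. It assumes an Ornstein–Uhlenbeck forward process. In this model the scores are known in closed form: $\nabla_x \log p_t(x \mid y) = -(x - e^{-t} y)$ and $\nabla_x \log p_t(x) = -x/(1 + e^{-2t})$.

The code fits one linear model with condition dropout by least squares on the denoising score-matching target. The linear model stands in for the neural network. The names `cfg_linear_score_demo` and `guided_score` and the toy distribution are our illustrative assumptions. None of this is the paper's estimator or its diffused Taylor approximation.

```python
import numpy as np


def cfg_linear_score_demo(t=0.5, p_mask=0.2, n=200_000, seed=0):
    """Fit one linear score model with condition dropout on a 1-D Gaussian toy.

    Toy assumptions (not from the paper): y ~ N(0, 1), x0 | y ~ N(y, 1),
    forward OU process x_t = e^{-t} x0 + sqrt(1 - e^{-2t}) z with z ~ N(0, 1).
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)
    x0 = y + rng.standard_normal(n)
    a, sigma2 = np.exp(-t), 1.0 - np.exp(-2.0 * t)
    xt = a * x0 + np.sqrt(sigma2) * rng.standard_normal(n)
    # Denoising score-matching target: gradient of log N(x_t; a * x0, sigma2).
    target = -(xt - a * x0) / sigma2
    # Condition dropout: keep the label with probability 1 - p_mask, else use the null token.
    keep = (rng.random(n) >= p_mask).astype(float)
    # Hypothetical linear "network" with separate slopes for the labelled and null branches.
    features = np.column_stack([xt * keep, y * keep, xt * (1.0 - keep)])
    coef, *_ = np.linalg.lstsq(features, target, rcond=None)
    return coef  # [slope on x given y, slope on y, slope on x given null token]


def guided_score(x, y, coef, w):
    """Classifier-free guided score (1 + w) * s(x, y) - w * s(x, null)."""
    s_cond = coef[0] * x + coef[1] * y  # estimate of grad_x log p_t(x | y)
    s_uncond = coef[2] * x              # estimate of grad_x log p_t(x)
    return (1.0 + w) * s_cond - w * s_uncond


t = 0.5
coef = cfg_linear_score_demo(t=t)
# Closed-form slopes in this toy model: -1, e^{-t}, and -1 / (1 + e^{-2t}).
print(coef, [-1.0, np.exp(-t), -1.0 / (1.0 + np.exp(-2.0 * t))])
print(guided_score(x=1.0, y=0.5, coef=coef, w=2.0))
```

With `w = 0` the guided score reduces to the conditional score. Larger `w` adds $w \nabla_x \log\big(p_t(x \mid y)/p_t(x)\big)$, which favors regions where the condition is more informative. Because the toy scores are linear, least squares recovers them almost exactly. In the paper's setting the score is nonlinear and must be approximated by a neural network, which is where the approximation theory comes in.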
