Physics Informed Distillation for Diffusion Models
Joshua Tian Jin Tee, Kang Zhang, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D. Yoo
TL;DR
The paper tackles the slow sampling of diffusion models by reframing the teacher as a probability-flow ODE and training a student to learn the trajectory $\mathbf{x}_{\theta}(\mathbf{z}, t)$ in a physics-informed manner. By adopting a PINN-inspired residual loss, a stable boundary-preserving parametrization, and numerical differentiation, PID enables fast, single-step generation without synthetic data, and provides theoretical bounds linking discretization to trajectory accuracy. Empirical results on CIFAR-10 and ImageNet 64x64 show PID achieving competitive FID/IS with single-step sampling, albeit with higher training cost than some data-intensive distillation methods. Its main advantage is that it requires neither synthetic data nor heavy hyperparameter tuning. Overall, PID offers a practical, data-free distillation pathway for diffusion models, with predictable behavior across discretization settings and a transparent training objective grounded in the underlying ODE dynamics.
Abstract
Diffusion models have recently emerged as a potent tool in generative modeling. However, their inherent iterative nature often results in sluggish image generation due to the requirement for multiple model evaluations. Recent progress has unveiled the intrinsic link between diffusion models and Probability Flow Ordinary Differential Equations (ODEs), thus enabling us to conceptualize diffusion models as ODE systems. Simultaneously, Physics Informed Neural Networks (PINNs) have substantiated their effectiveness in solving intricate differential equations through implicit modeling of their solutions. Building upon these foundational insights, we introduce Physics Informed Distillation (PID), which employs a student model to represent the solution of the ODE system corresponding to the teacher diffusion model, akin to the principles employed in PINNs. Through experiments on CIFAR-10 and ImageNet 64x64, we observe that PID achieves performance comparable to recent distillation methods. Notably, it demonstrates predictable trends concerning method-specific hyperparameters and eliminates the need for synthetic dataset generation during the distillation process, both of which contribute to its easy-to-use nature as a distillation approach for diffusion models. Our code and pre-trained checkpoint are publicly available at: https://github.com/pantheon5100/pid_diffusion.git.
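
To make the PINN-style idea concrete, the following is a minimal sketch of an ODE residual loss evaluated with a finite difference. It is not the paper's implementation: the drift `teacher_drift` is a toy stand-in (dx/dt = -x) for the teacher's probability-flow ODE, the terminal time `T`, the step `dt`, and all function names are hypothetical, and the "student" is replaced by closed-form trajectories so the behavior can be checked by hand. The sketch only illustrates that a trajectory satisfying both the boundary condition x(z, T) = z and the ODE yields a near-zero residual, while one that satisfies only the boundary condition does not.

```python
import numpy as np

T = 1.0  # assumed terminal time at which the trajectory must equal the noise z


def teacher_drift(x, t):
    # Toy stand-in for the teacher ODE drift (assumption): dx/dt = -x.
    return -x


def exact_trajectory(z, t):
    # Closed-form solution of the toy ODE with boundary condition x(z, T) = z.
    return z * np.exp(T - t)


def wrong_trajectory(z, t):
    # Satisfies the boundary condition x(z, T) = z but not the toy ODE.
    return z * (1.0 + (T - t))


def residual_loss(trajectory, z, t, dt=1e-3):
    # Backward finite difference approximates the time derivative of the trajectory.
    x_t = trajectory(z, t)
    dx_dt = (x_t - trajectory(z, t - dt)) / dt
    # PINN-style residual: how far the trajectory is from satisfying the ODE.
    residual = dx_dt - teacher_drift(x_t, t)
    return float(np.mean(residual ** 2))


rng = np.random.default_rng(0)
z = rng.standard_normal(256)          # noise samples
t = rng.uniform(0.1, T, size=256)     # random times along the trajectory

loss_exact = residual_loss(exact_trajectory, z, t)
loss_wrong = residual_loss(wrong_trajectory, z, t)
print(f"exact: {loss_exact:.2e}, wrong: {loss_wrong:.2e}")
```

In an actual distillation setup, the closed-form trajectory would be replaced by a neural network and the toy drift by the teacher diffusion model, with the residual minimized by gradient descent; those details are beyond this sketch.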
