Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim, Hao Tang, Fisher Yu
TL;DR
The paper addresses slow sampling in diffusion models by proposing Distilled-ODE solvers (D-ODE solvers), a lightweight distillation method that adds a single parameter to existing ODE solvers so that they better approximate the denoising outputs along the sampling trajectory. By distilling knowledge from teacher solvers that use more sampling steps into student solvers that use fewer steps, D-ODE solvers produce higher-quality samples at low NFE with negligible overhead, and they apply to both noise-prediction and data-prediction networks. Across multiple datasets and samplers (e.g., DDIM, iPNDM, DPM-Solver, DEIS, EDM), D-ODE solvers consistently improve FID at small NFE and follow the target ODE trajectory closely, offering practical speedups without extensive retraining. One limitation is that a single scalar parameter may not be expressive enough for very high-resolution generation, which suggests future work on multi-parameter or localized adaptations.
Abstract
Diffusion models have recently gained prominence as a novel category of generative models. Despite their success, these models face a notable drawback in terms of slow sampling speeds, requiring a high number of function evaluations (NFE) on the order of hundreds or thousands. In response, both learning-free and learning-based sampling strategies have been explored to expedite the sampling process. Learning-free sampling employs various ordinary differential equation (ODE) solvers based on the formulation of diffusion ODEs. However, it encounters challenges in faithfully tracking the true sampling trajectory, particularly for small NFE. Conversely, learning-based sampling methods, such as knowledge distillation, demand extensive additional training, limiting their practical applicability. To overcome these limitations, we introduce Distilled-ODE solvers (D-ODE solvers), a straightforward distillation approach grounded in ODE solver formulations. Our method seamlessly integrates the strengths of both learning-free and learning-based sampling. D-ODE solvers are constructed by introducing a single parameter adjustment to existing ODE solvers. Furthermore, we optimize D-ODE solvers with smaller steps using knowledge distillation from ODE solvers with larger steps across a batch of samples. Comprehensive experiments demonstrate the superior performance of D-ODE solvers compared to existing ODE solvers, including DDIM, PNDM, DPM-Solver, DEIS, and EDM, particularly in scenarios with fewer NFE. Notably, our method incurs negligible computational overhead compared to previous distillation techniques, facilitating straightforward and rapid integration with existing samplers. Qualitative analysis reveals that D-ODE solvers not only enhance image quality but also faithfully follow the target ODE trajectory.
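
The general idea of tuning one scalar in a coarse solver so that it matches a finer solver over a batch of samples can be illustrated on a toy problem. The sketch below is not the paper's formulation: it assumes a linear toy ODE (dx/dt = -x) in place of a diffusion ODE, an Euler solver as the base solver, and a hypothetical parameterization in which the scalar `lam` rescales the Euler update. The names `drift`, `euler`, and `fit_lambda` are illustrative only. Because the student update is linear in `lam`, the least-squares fit has a closed form and needs no gradient-based training.

```python
import numpy as np

def drift(x):
    # Toy ODE dx/dt = -x; an assumed stand-in for a diffusion ODE.
    return -x

def euler(x, t_span, n_steps, lam=0.0):
    # Euler integration over t_span; lam rescales every update
    # (hypothetical single-parameter adjustment, lam=0 is plain Euler).
    h = t_span / n_steps
    for _ in range(n_steps):
        x = x + h * (1.0 + lam) * drift(x)
    return x

def fit_lambda(x0, t_span, teacher_steps):
    # Teacher: many small Euler steps. Student: one large Euler step.
    target = euler(x0, t_span, teacher_steps)
    step = t_span * drift(x0)            # plain one-step student update
    residual = target - (x0 + step)      # what the student is missing
    # Closed-form least squares for the scalar over the whole batch.
    return float(np.sum(residual * step) / np.sum(step * step))

rng = np.random.default_rng(0)
x_train = rng.normal(size=(64, 2))       # batch used to fit lam
lam = fit_lambda(x_train, t_span=1.0, teacher_steps=100)

x_test = rng.normal(size=(16, 2))        # held-out samples
teacher = euler(x_test, 1.0, 100)
err_plain = np.abs(euler(x_test, 1.0, 1) - teacher).max()
err_distilled = np.abs(euler(x_test, 1.0, 1, lam) - teacher).max()
print(f"lam={lam:.4f}  plain={err_plain:.4f}  distilled={err_distilled:.2e}")
```

On this linear toy problem a single scalar is enough for the one-step student to reproduce the 100-step teacher almost exactly. A real diffusion ODE is nonlinear, so such a fit would only reduce the gap rather than close it.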
