S4S: Solving for a Diffusion Model Solver

Eric Frankel, Sitan Chen, Jerry Li, Pang Wei Koh, Lillian J. Ratliff, Sewoong Oh

TL;DR

This work tackles the inefficiency of diffusion-model sampling when only a few neural function evaluations (NFEs) are allowed. It introduces S4S, a method that requires no retraining of the diffusion model: it distills from a high-NFE teacher solver to learn solver coefficients whose output better matches the teacher's, yielding uniform sample-quality gains over traditional ODE solvers without any training data. Building on this, S4S-Alt optimizes both the solver coefficients and the discretization steps via alternating minimization, achieving up to roughly a 1.5–2× improvement in FID across several datasets with as few as 5 NFEs. The approach is lightweight, black-box compatible, and data-free, offering a practical path to faster, high-quality diffusion sampling, with potential future extensions to SDEs and sample-level adaptation.

Abstract

Diffusion models (DMs) create samples from a data distribution by starting from random noise and iteratively solving a reverse-time ordinary differential equation (ODE). Because each step in the iterative solution requires an expensive neural function evaluation (NFE), there has been significant interest in approximately solving these diffusion ODEs with only a few NFEs without modifying the underlying model. However, in the few NFE regime, we observe that tracking the true ODE evolution is fundamentally impossible using traditional ODE solvers. In this work, we propose a new method that learns a good solver for the DM, which we call Solving for the Solver (S4S). S4S directly optimizes a solver to obtain good generation quality by learning to match the output of a strong teacher solver. We evaluate S4S on six different pre-trained DMs, including pixel-space and latent-space DMs for both conditional and unconditional sampling. In all settings, S4S uniformly improves the sample quality relative to traditional ODE solvers. Moreover, our method is lightweight, data-free, and can be plugged in black-box on top of any discretization schedule or architecture to improve performance. Building on top of this, we also propose S4S-Alt, which optimizes both the solver and the discretization schedule. By exploiting the full design space of DM solvers, with 5 NFEs, we achieve an FID of 3.73 on CIFAR10 and 13.26 on MS-COCO, representing a $1.5\times$ improvement over previous training-free ODE methods.
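
The distillation idea described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a scalar ODE (`drift`) in place of a diffusion model, a many-step Heun integrator as the teacher, a two-step linear multistep student initialized at Adams-Bashforth weights, and finite-difference gradient descent with step halving as a stand-in optimizer. All function names are hypothetical.

```python
import random

def drift(x, t):
    # Toy scalar vector field; an assumed stand-in for the diffusion ODE.
    return -x * (1.0 + t)

def teacher_solve(x0, t_end=1.0, n_steps=200):
    # Many-step Heun integration plays the role of the high-NFE teacher.
    h = t_end / n_steps
    x, t = x0, 0.0
    for _ in range(n_steps):
        k1 = drift(x, t)
        k2 = drift(x + h * k1, t + h)
        x += 0.5 * h * (k1 + k2)
        t += h
    return x

def student_solve(x0, ts, coeffs):
    # x_{i+1} = x_i + h_i * (b1_i * f_i + b2_i * f_{i-1}); the first step reuses f_0.
    x, prev = x0, None
    for i, (b1, b2) in enumerate(coeffs):
        h = ts[i + 1] - ts[i]
        cur = drift(x, ts[i])
        x = x + h * (b1 * cur + b2 * (cur if prev is None else prev))
        prev = cur
    return x

def loss(coeffs, ts, noise, targets):
    # Mean squared gap between student and teacher outputs on shared noise.
    errs = [(student_solve(x, ts, coeffs) - y) ** 2 for x, y in zip(noise, targets)]
    return sum(errs) / len(errs)

def fit(ts, noise, targets, iters=200, lr=0.5, eps=1e-6):
    # Start from Adams-Bashforth-2 weights, then descend a finite-difference gradient.
    coeffs = [[1.5, -0.5] for _ in range(len(ts) - 1)]
    best = loss(coeffs, ts, noise, targets)
    for _ in range(iters):
        grad = []
        for i in range(len(coeffs)):
            row = []
            for j in range(2):
                bumped = [list(c) for c in coeffs]
                bumped[i][j] += eps
                row.append((loss(bumped, ts, noise, targets) - best) / eps)
            grad.append(row)
        trial = [[c - lr * g for c, g in zip(cs, gs)] for cs, gs in zip(coeffs, grad)]
        trial_loss = loss(trial, ts, noise, targets)
        if trial_loss < best:
            coeffs, best = trial, trial_loss  # accept only improving steps
        else:
            lr *= 0.5                         # otherwise shrink the step size
    return coeffs, best

random.seed(0)
ts = [0.0, 0.25, 0.5, 0.75, 1.0]                      # fixed 4-step schedule
noise = [random.gauss(0.0, 1.0) for _ in range(64)]   # data-free: noise inputs only
targets = [teacher_solve(x) for x in noise]
initial_loss = loss([[1.5, -0.5]] * 4, ts, noise, targets)
learned, final_loss = fit(ts, noise, targets)
print(f"loss before: {initial_loss:.3e}, after: {final_loss:.3e}")
```

The only inputs are Gaussian noise draws and the teacher's outputs on them, which mirrors the data-free claim. The schedule `ts` stays fixed in this sketch, whereas S4S-Alt as described would also adjust the discretization steps.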

Paper Structure

This paper contains 75 sections, 1 theorem, 39 equations, 11 figures, 16 tables, 2 algorithms.

Key Result

Theorem D.1

Let $\Psi_{*}$ and $\Psi_{\bm{\phi}}$ be a teacher and student ODE solver each with noise distribution $\mathcal{N}(0, \sigma^2_1\mathbf{I}) \in \mathbb{R}^d$, and with, respectively, distributions $q$ and $p_{\bm{\phi}}$. Assume both $\Psi_{*}$ and $\Psi_{\bm{\phi}}$ are invertible. Let $r > 0$, if where $C(\Psi_{\bm{\phi}^*}(\mathbf{x})) = \log |\det J_{\Psi_{\bm{\phi}^*}}(\Psi^{-1}_{\bm{\phi}^*

Figures (11)

  • Figure 1: High-level approach of S4S-Alt. Every diffusion solver can be characterized by its choice of step schedule and the parameters used for estimating the next point in the reverse process. In low-NFE environments, vanilla ODE solvers are unable to approximate the true diffusion ODE trajectory and produce low-quality samples. In S4S-Alt, we learn an optimal combination of solver coefficients and discretization steps that closely matches the output of the true ODE trajectory. An example of a selected ODE solver is presented on the top, where $\{t_i\}$ is the choice of step schedule and the coefficients $(1/2,1/2)$ are the solver parameters.
  • Figure 2: Generations from Stable Diffusion v1.4 with text prompt "a panda sitting in a bamboo forest". The teacher solver (a) uses 20 NFEs, while all other solvers (b)–(d) use 5 NFEs. Compared to the teacher's generation, the best "traditional" ODE solver introduces visual artifacts into the image, while S4S and S4S-Alt produce generations increasingly close to that of the teacher. The prompts used for generating these images are: "a panda sitting in a bamboo forest," "a dog playing with a ball in a park," "an oil painting of a soccer player playing in a stadium," and "a hamburger with a side of fries."
  • Figure 3: PCA of learned S4S coefficients at (a) each point of the reverse process or (b) each training epoch; darker points correspond to earlier points in the reverse process or earlier epochs of training. We initialize S4S coefficients at iPNDM and learn a solver with 5 NFEs and order 3. In (a), we take the PCA of the combined set of final learned coefficients $\{(b_{1,i}, b_{2,i}, b_{3,i})\}_{i=1}^5$ across the three training random seeds used. We also include the iPNDM coefficients in the PCA, for a total of 16 vectors in $\mathbb{R}^3$. In (b), we concatenate the learned coefficient vectors at the end of each epoch, resulting in one vector in $\mathbb{R}^{15}$ per epoch. We again perform PCA on a collection of 16 of these vectors, including iPNDM as a reference point.
  • Figure 4: Values of $\mathcal{L}_{\text{relax}}$ as $r$ increases. As $r$ grows, the objective becomes easier to optimize, which supports using the relaxed objective to make learning solver coefficients an easier optimization problem.
  • Figure 5: FID vs. Training Dataset Size in S4S-Alt.
  • ...and 6 more figures
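
The vector counts in the Figure 3 caption can be made concrete with a small sketch. It reproduces only the bookkeeping for panel (a): 3 seeds times 5 steps gives 15 coefficient vectors in $\mathbb{R}^3$, plus one reference vector, for 16 in total. The coefficient values here are random placeholders, not the paper's learned values, and the reference point is assumed to be the third-order Adams-Bashforth weights as a stand-in for iPNDM. The helper name `pca_2d` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seeds, n_steps, order = 3, 5, 3

# Assumed reference point: third-order Adams-Bashforth weights.
reference = np.array([23.0, -16.0, 5.0]) / 12.0

# Placeholder "learned" coefficients: small random perturbations of the reference.
learned = reference + 0.1 * rng.standard_normal((n_seeds, n_steps, order))

# Stack the 15 per-step vectors with the single reference vector: shape (16, 3).
points = np.vstack([learned.reshape(-1, order), reference[None, :]])

def pca_2d(x):
    # PCA via SVD of the centered data; returns 2-D scores and per-component variances.
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T, s**2 / (len(x) - 1)

scores, variances = pca_2d(points)
print(points.shape, scores.shape)
```

For panel (b), the same routine would apply to one concatenated vector in $\mathbb{R}^{15}$ per epoch (5 steps times 3 coefficients) instead of per-step vectors in $\mathbb{R}^3$.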

Theorems & Definitions (2)

  • Theorem D.1
  • proof