Dual-Solver: A Generalized ODE Solver for Diffusion Models with Dual Prediction

Soochul Park, Yeon Ju Lee

TL;DR

Dual-Solver generalizes multistep samplers through learnable parameters that continuously interpolate among prediction types, select the integration domain, and adjust the residual terms, improving FID and CLIP scores in the low-NFE regime.

Abstract

Diffusion models achieve state-of-the-art image quality. However, sampling is costly at inference time because it requires a large number of function evaluations (NFEs). To reduce NFEs, classical ODE numerical methods have been adopted. Yet, the choice of prediction type and integration domain leads to different sampling behaviors. To address these issues, we introduce Dual-Solver, which generalizes multistep samplers through learnable parameters that continuously (i) interpolate among prediction types, (ii) select the integration domain, and (iii) adjust the residual terms. It retains the standard predictor-corrector structure while preserving second-order local accuracy. These parameters are learned via a classification-based objective using a frozen pretrained classifier (e.g., MobileNet or CLIP). For ImageNet class-conditional generation (DiT, GM-DiT) and text-to-image generation (SANA, PixArt-$\alpha$), Dual-Solver improves FID and CLIP scores in the low-NFE regime ($3 \le$ NFE $\le 9$) across backbones.
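
To make "prediction type" concrete, the following is a minimal NumPy sketch of two equivalent first-order (DDIM-style) updates, one written with a data prediction and one with a noise prediction. It is not the Dual-Solver update itself. It assumes the common forward model $x_t = \alpha_t x_0 + \sigma_t \epsilon$; the cosine/sine schedule, the oracle predictions, and all function names are hypothetical choices made for illustration only.

```python
import numpy as np

def eps_from_x0(x_t, x0_hat, alpha_t, sigma_t):
    """Convert a data prediction into the matching noise prediction."""
    return (x_t - alpha_t * x0_hat) / sigma_t

def x0_from_eps(x_t, eps_hat, alpha_t, sigma_t):
    """Convert a noise prediction into the matching data prediction."""
    return (x_t - sigma_t * eps_hat) / alpha_t

def step_data_pred(x_t, x0_hat, alpha_t, sigma_t, alpha_s, sigma_s):
    """First-order step from t to s written in terms of the data prediction."""
    return (sigma_s / sigma_t) * x_t + (alpha_s - alpha_t * sigma_s / sigma_t) * x0_hat

def step_noise_pred(x_t, eps_hat, alpha_t, sigma_t, alpha_s, sigma_s):
    """The same first-order step written in terms of the noise prediction."""
    return (alpha_s / alpha_t) * x_t + (sigma_s - sigma_t * alpha_s / alpha_t) * eps_hat

# Hypothetical schedule (assumption): alpha_t = cos(t), sigma_t = sin(t).
t, s = 0.8, 0.5
alpha_t, sigma_t = np.cos(t), np.sin(t)
alpha_s, sigma_s = np.cos(s), np.sin(s)

# Toy "clean sample" and noise; the oracle predictions below stand in for a network.
rng = np.random.default_rng(0)
x0 = rng.normal(size=4)
eps = rng.normal(size=4)
x_t = alpha_t * x0 + sigma_t * eps

x0_hat = x0                                           # oracle data prediction
eps_hat = eps_from_x0(x_t, x0_hat, alpha_t, sigma_t)  # consistent noise prediction

x_s_data = step_data_pred(x_t, x0_hat, alpha_t, sigma_t, alpha_s, sigma_s)
x_s_noise = step_noise_pred(x_t, eps_hat, alpha_t, sigma_t, alpha_s, sigma_s)
print(np.allclose(x_s_data, x_s_noise))
```

With a single model evaluation held fixed over the step, the two forms give the same result, since each is an algebraic rewrite of $x_s = \alpha_s \hat{x}_0 + \sigma_s \hat\epsilon$. The two parameterizations generally stop agreeing once a multistep method extrapolates from earlier predictions, which is the kind of design choice the abstract refers to.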

Paper Structure (65 sections, 3 theorems, 54 equations, 20 figures, 11 tables, 2 algorithms)


Key Result

Theorem C.1

Assume that ${\bm{x}}_\theta(u)$ and $\bm\epsilon_\theta(v)$ are $C^1$ on $[u_i,u_{i+1}]$ and $[v_i,v_{i+1}]$, respectively. Let ${\bm{x}}^{\text{exact}}_{t_{i+1}}$ denote the exact update in Eq. \eqref{eq:invertible_transform_integral}, and let ${\bm{x}}^{\text{1st-pred.}}_{t_{i+1}}$ denote the first–

Figures (20)

  • Figure 1: Sampling results. SANA \cite{xie2024sana}, NFE=3, CFG=4.5. See Fig. \ref{fig:additional_sana} for further results.
  • Figure 2: Euler updates for noise, velocity, and data predictions.
  • Figure 3: Learned parameters. Values of $\{\gamma,\tau_u,\tau_v,\kappa_u,\kappa_v\}$ across sampling steps, learned on DiT \cite{peebles2023scalable} at NFE = 5. See Figs. \ref{fig:learned_dit}, \ref{fig:learned_gmdit}, \ref{fig:learned_sana}, and \ref{fig:learned_pixart} for further results.
  • Figure 4: Solver parameter learning methods. Schematic illustration of trajectory, sample, and feature regression, as well as soft- and hard-label classification.
  • Figure 5: Main quantitative results. FID and CLIP score; evaluated on 50k (DiT/GM-DiT) and 30k (SANA/PixArt-$\alpha$) samples; CFG: DiT=1.5, GM-DiT=1.4, SANA=4.5, PixArt-$\alpha$=3.5.
  • ...and 15 more figures

Theorems & Definitions (8)

  • Theorem C.1
  • proof
  • Theorem C.2
  • proof
  • Theorem C.3
  • proof
  • Definition G.1: Linear interpolation
  • Definition G.2: Averaged linear interpolation