DyWeight: Dynamic Gradient Weighting for Few-Step Diffusion Sampling

Tong Zhao, Mingkun Lei, Liangyu Yuan, Yanming Yang, Chenxi Song, Yang Wang, Beier Zhu, Chi Zhang

Abstract

Diffusion Models (DMs) have achieved state-of-the-art generative performance across multiple modalities, yet their sampling process remains prohibitively slow due to the need for hundreds of function evaluations. Recent progress in multi-step ODE solvers has greatly improved efficiency by reusing historical gradients, but existing methods rely on handcrafted coefficients that fail to adapt to the non-stationary dynamics of diffusion sampling. To address this limitation, we propose Dynamic Gradient Weighting (DyWeight), a lightweight, learning-based multi-step solver that introduces a streamlined implicit coupling paradigm. By relaxing classical numerical constraints, DyWeight learns unconstrained time-varying parameters that adaptively aggregate historical gradients while intrinsically scaling the effective step size. This implicit time calibration accurately aligns the solver's numerical trajectory with the model's internal denoising dynamics under large integration steps, avoiding complex decoupled parameterizations and optimizations. Extensive experiments on CIFAR-10, FFHQ, AFHQv2, ImageNet64, LSUN-Bedroom, Stable Diffusion and FLUX.1-dev demonstrate that DyWeight achieves superior visual fidelity and stability with significantly fewer function evaluations, establishing a new state-of-the-art among efficient diffusion solvers. Code is available at https://github.com/Westlake-AGI-Lab/DyWeight
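The core idea in the abstract, aggregating historical gradients with weights that are not forced to sum to one, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `weighted_multistep_step` and `split_weights`, the plain-list vector type, and the example weight values are all assumptions made for illustration, and the learned weights are replaced by hand-picked constants. The sketch only shows why an unnormalized weight sum behaves like a rescaled step size: factoring out the sum leaves a normalized multi-step direction applied over a stretched or shrunk time interval.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def weighted_multistep_step(
    x: Vector,
    t_cur: float,
    t_next: float,
    grad_buffer: Sequence[Vector],
    weights: Sequence[float],
) -> Vector:
    """One explicit multi-step update: x + h * sum_j w_j * g_j.

    grad_buffer holds the current and past gradients (most recent first);
    weights are free parameters and are NOT required to sum to 1.
    """
    h = t_next - t_cur
    direction = [
        sum(w * g[i] for w, g in zip(weights, grad_buffer))
        for i in range(len(x))
    ]
    return [xi + h * di for xi, di in zip(x, direction)]


def split_weights(weights: Sequence[float]) -> Tuple[float, List[float]]:
    """Factor weights into (scale, normalized weights that sum to 1).

    Assumes the weights do not sum to zero.
    """
    scale = sum(weights)
    return scale, [w / scale for w in weights]


# Hypothetical values: second-order Adams-Bashforth weights (1.5, -0.5)
# inflated by 10%, so the weights sum to 1.1 instead of 1.
x0 = [1.0, 2.0]
grads = [[0.5, -1.0], [0.25, 0.5]]
free_weights = [1.65, -0.55]
t_cur, t_next = 1.0, 0.5

# Update with the unconstrained weights over the nominal interval.
x_free = weighted_multistep_step(x0, t_cur, t_next, grads, free_weights)

# Same update written as normalized weights over a rescaled interval.
scale, unit_weights = split_weights(free_weights)
t_next_effective = t_cur + scale * (t_next - t_cur)
x_rescaled = weighted_multistep_step(
    x0, t_cur, t_next_effective, grads, unit_weights
)

print(x_free, x_rescaled)
```

The two results agree up to floating-point rounding, which is the sense in which a single set of unconstrained weights can encode both a gradient combination and an implicit change of the effective step size. How DyWeight parameterizes and trains these weights is described in the paper and its repository, not in this sketch.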

Paper Structure (25 sections, 27 equations, 31 figures, 14 tables, 2 algorithms)

Figures (31)

  • Figure 1: Qualitative comparison on FLUX.1-dev at 7 NFEs. From top to bottom, the rows show results from DPM-Solver++(2M), iPNDM(2M), and our DyWeight. Our method delivers superior visual fidelity, prompt alignment, and structural coherence.
  • Figure 2: Overview of DyWeight. Our method dynamically aggregates buffered historical gradients ($w_{t_n,j} \cdot \mathbf{v}_{\text{buffer}}$) using unconstrained learned weights (i.e., not requiring $\sum w = 1$). Because the weights are not normalized, their sum adjusts the effective step size via time calibration ($t_n \rightarrow \tilde{t}_{n}$). As illustrated on the right, DyWeight exhibits a distinct advantage in few-step generation scenarios.
  • Figure 3: L2 error under different complexity and steps settings. (a) Error vs. Complexity: We fix the solver steps ($S$) and increase the problem complexity (polynomial order $K$). (b) Error vs. Steps: We fix the complexity and decrease the number of solver steps.
  • Figure 4: Effect of Time Shifting on Training Stability. We compare (a) the standard deviation of a gradient term specific to an intermediate point (expected value $= 1$) and (b) the training loss for a 5-NFE student solver on CIFAR-10. The solver w/o time shifting (red) exhibits extreme variance and an unstable loss. In contrast, our proposed time shifting (blue) stabilizes the gradient variance around the expected value of 1.0 and achieves a smoother, lower-variance, and faster-converging training loss.
  • Figure 5: Training convergence on CIFAR-10 (200K training images). We track training loss (left) and generation FID (right) against optimization progress under various solver configurations. As shown, our framework exhibits extreme sample efficiency, achieving massive performance gains with merely $\sim \mathbf{600}$ images.
  • ...and 26 more figures