
Q-Drift: Quantization-Aware Drift Correction for Diffusion Model Sampling

Sooyoung Ryu, Mathieu Salzmann, Saqib Javed

Abstract

Post-training quantization (PTQ) is a practical path to deploy large diffusion models, but quantization noise can accumulate over the denoising trajectory and degrade generation quality. We propose Q-Drift, a principled sampler-side correction that treats quantization error as an implicit stochastic perturbation on each denoising step and derives a marginal-distribution-preserving drift adjustment. Q-Drift estimates a timestep-wise variance statistic from calibration, in practice requiring as few as 5 paired full-precision/quantized calibration runs. The resulting sampler correction is plug-and-play with common samplers, diffusion models, and PTQ methods, while incurring negligible overhead at inference. Across six diverse text-to-image models (spanning DiT and U-Net), three samplers (Euler, flow-matching, DPM-Solver++), and two PTQ methods (SVDQuant, MixDQ), Q-Drift improves FID over the corresponding quantized baseline in most settings, with up to 4.59 FID reduction on PixArt-Sigma (SVDQuant W3A4), while preserving CLIP scores.
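
As a rough illustration of the calibration step described above, the sketch below estimates a per-timestep, per-channel variance of the difference between quantized and full-precision noise predictions over a handful of paired runs. It is a minimal sketch under stated assumptions, not the paper's estimator: the function name `timestep_variance`, the array layout `(K, T, C, H, W)`, the pooling over spatial positions, and the synthetic data are all hypothetical, and the abstract does not specify how the statistic enters the drift adjustment.

```python
import numpy as np

def timestep_variance(eps_fp, eps_q):
    """Variance of the quantization-induced difference, per timestep and channel.

    eps_fp, eps_q: arrays of shape (K, T, C, H, W) holding noise predictions
    from K paired calibration runs over T sampler steps (assumed layout).
    Returns an array of shape (T, C).
    """
    delta = eps_q - eps_fp  # paired difference between quantized and FP outputs
    # Pool over calibration runs (axis 0) and spatial positions (axes 3, 4).
    return delta.var(axis=(0, 3, 4))

# Synthetic stand-in data (assumption): the perturbation grows with the step index.
rng = np.random.default_rng(0)
K, T, C, H, W = 5, 30, 4, 8, 8
eps_fp = rng.standard_normal((K, T, C, H, W))
noise_scale = np.linspace(0.05, 0.2, T).reshape(1, T, 1, 1, 1)
eps_q = eps_fp + noise_scale * rng.standard_normal((K, T, C, H, W))

var_t = timestep_variance(eps_fp, eps_q)  # shape (T, C)
```

Because the statistic is computed once offline and stored as a small `(T, C)` table, looking it up at inference adds essentially no cost, which is consistent with the negligible-overhead claim.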

Paper Structure (56 sections, 48 equations, 6 figures, 4 tables, 1 algorithm)

Figures (6)

  • Figure 1: Visual comparisons on SDXL (SVDQuant W3A4). For readability, prompt texts are provided in the supplementary material. Zoom is shown only for rows where local detail comparison is informative.
  • Figure 2: Sample-efficiency of the correction factor. Per-timestep correction factor $c_i$ (Eq. \ref{eq:method:drift_scale}) for SDXL (SVDQuant W3A4, 30 steps). The solid curve is the reference estimate from the standard 5K calibration run. For each calibration size $K\in\{50,10,5,1\}$, the shaded band shows the min--max envelope over 200 nested subsamples, and the dashed line shows the median. Channel-wise values are averaged into a single scalar for visualization.
  • Figure 3: Late-timestep growth of $a_t$ on SDXL (SVDQuant W3A4), shown as a channel-wise average.
  • Figure 4: Empirical validation of marginal and joint Gaussianity on SDXL (SVDQuant W3A4). Top: Histograms of $\hat{\epsilon}_{\theta}^{(t)}$ and $\Delta\epsilon_{\theta}^{(t)}$ at a latent coordinate, overlaid with fitted Gaussian curves. Bottom: The corresponding 2D joint densities with fitted Gaussian ellipses. All 1D and 2D visualizations are shown at the same latent coordinate, $(c,h,w)=(0,64,64)$.
  • Figure 5: Empirical validation of the diagonal-covariance simplification. For each selected timestep, we compare the distribution of absolute correlations from 10,000 random off-diagonal pairs, with each correlation estimated over 5,000 calibration samples, against a shuffled baseline. The shuffled baseline is obtained by randomly permuting a variable in each pair. In each panel, the inset reports the mean, median (med), and 95th percentile (p95) of the absolute-correlation distribution, listed as actual vs shuffled. Top: off-diagonal entries of $\Sigma_{\hat{\epsilon}\hat{\epsilon}}^{(t)}$. Middle: off-diagonal entries of $\Sigma_{\Delta\Delta}^{(t)}$. Bottom: off-diagonal cross-block entries of $\Sigma_{\hat{\epsilon}\Delta}^{(t)}$ for $i\neq j$.
  • ...and 1 more figure
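
The shuffled-baseline check described in the Figure 5 caption can be sketched as follows. This is an illustrative reconstruction from the caption only, not the authors' code: the helper names `abs_corr_pairs` and `summarize`, the flattened `(N, D)` sample layout, and the synthetic independent data are assumptions, and the sizes are scaled down from the 10,000 pairs and 5,000 samples mentioned in the caption.

```python
import numpy as np

def abs_corr_pairs(x, n_pairs, rng, shuffle=False):
    """Absolute Pearson correlation for random off-diagonal coordinate pairs.

    x: (N, D) array of N calibration samples over D flattened coordinates.
    If shuffle=True, one variable in each pair is randomly permuted across
    samples, which destroys any dependence and yields the baseline.
    """
    n, d = x.shape
    i = rng.integers(0, d, size=n_pairs)
    j = (i + rng.integers(1, d, size=n_pairs)) % d  # offset in [1, d-1], so j != i
    a, b = x[:, i], x[:, j]
    if shuffle:
        b = rng.permuted(b, axis=0)  # permute each column independently
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    num = (a * b).sum(axis=0)
    den = np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))
    return np.abs(num / den)

def summarize(r):
    """Mean, median, and 95th percentile, as reported in the figure insets."""
    return {"mean": r.mean(), "med": np.median(r), "p95": np.percentile(r, 95)}

# Synthetic, independent coordinates (assumption) so actual should match shuffled.
rng = np.random.default_rng(0)
n_pairs = 1000
x = rng.standard_normal((2000, 50))
actual = abs_corr_pairs(x, n_pairs, rng)
shuffled = abs_corr_pairs(x, n_pairs, rng, shuffle=True)
stats_actual, stats_shuffled = summarize(actual), summarize(shuffled)
```

If the actual and shuffled summaries are close, the off-diagonal correlations are indistinguishable from sampling noise, which is the kind of evidence the caption cites for the diagonal-covariance simplification.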