Mean-Shift Distillation for Diffusion Mode Seeking
Vikas Thamizharasan, Nikitas Chatzis, Iliyan Georgiev, Matthew Fisher, Evangelos Kalogerakis, Difan Liu, Nanxuan Zhao, Michal Lukac
TL;DR
Mean-shift distillation (MSD) reframes diffusion distillation as mode seeking by gradient ascent on the data distribution $p$, deriving a gradient proxy whose extrema align with the modes of $p$. It estimates this gradient with product-density sampling and a simple mean-shift update, requires no retraining, and serves as a drop-in replacement for score distillation sampling (SDS). In both synthetic and practical settings, MSD reduces gradient variance and improves mode alignment, yielding higher-fidelity text-to-image and text-to-3D results with Stable Diffusion. Practical heuristics stabilize its use with high-dimensional models, and combining it with classifier-free guidance (CFG) further sharpens the mode-seeking behavior, making MSD a practical, theoretically grounded alternative to SDS.
Abstract
We present mean-shift distillation, a novel diffusion distillation technique that provides a provably good proxy for the gradient of the diffusion output distribution. The proxy is derived directly from mean-shift mode seeking on that distribution, and we show that its extrema are aligned with the distribution's modes. We further derive an efficient product distribution sampling procedure to evaluate the gradient. Our method is formulated as a drop-in replacement for score distillation sampling (SDS), requiring neither model retraining nor extensive modification of the sampling procedure. We show that it exhibits superior mode alignment and improved convergence in both synthetic and practical setups, yielding higher-fidelity results in both text-to-image and text-to-3D applications with Stable Diffusion.
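
For readers unfamiliar with mean-shift mode seeking, the following is a minimal sketch of the classical procedure on a one-dimensional toy problem. It is not the method of this paper: it assumes direct access to samples from $p$ rather than a diffusion model, and the names `mean_shift_step` and `seek_mode`, the Gaussian kernel, the bandwidth, and the two-component mixture are all illustrative assumptions.

```python
import math
import random


def mean_shift_step(x, samples, bandwidth):
    """One classical mean-shift update (illustrative, not the paper's estimator).

    Moves x to the Gaussian-kernel-weighted mean of the samples. This weighted
    mean is the mean of the normalized product of the kernel centred at x and
    the empirical sample distribution.
    """
    weights = [math.exp(-0.5 * ((s - x) / bandwidth) ** 2) for s in samples]
    total = sum(weights)
    if total == 0.0:
        # All samples are too far away to contribute; leave x unchanged.
        return x
    return sum(w * s for w, s in zip(weights, samples)) / total


def seek_mode(x0, samples, bandwidth=0.5, steps=100, tol=1e-6):
    """Iterate mean-shift updates from x0 until the update is smaller than tol."""
    x = x0
    for _ in range(steps):
        x_new = mean_shift_step(x, samples, bandwidth)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Toy data (an assumption for illustration): a mixture with modes near -2 and 3.
rng = random.Random(0)
samples = [rng.gauss(-2.0, 0.3) for _ in range(500)]
samples += [rng.gauss(3.0, 0.3) for _ in range(500)]

# Different starting points climb to different nearby modes.
mode_left = seek_mode(-1.0, samples)
mode_right = seek_mode(2.0, samples)
print(mode_left, mode_right)
```

Each update is an ascent step on a kernel-smoothed density, so the iterate settles at a nearby mode (here, close to -2 or 3 depending on the starting point) rather than at the global mean of the data. The bandwidth controls how local the search is: a larger value smooths the density more and can merge nearby modes.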
