Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation
Tianyu Chen, Yasi Zhang, Zhendong Wang, Ying Nian Wu, Oscar Leong, Mingyuan Zhou
TL;DR
This work addresses learning high-quality generative models when clean data are scarce by introducing denoising score distillation (DSD), which first pretrains a diffusion model on noisy data and then distills it into a one-step generator. The authors show that distillation can improve sample quality even when the teacher is degraded, and they provide a justification, in a linear model setting, that the distilled generator aligns with the eigenspace of the clean data covariance, which implicitly regularizes the generator. Empirically, DSD improves generative performance across multiple datasets and noise levels while sampling in a single step, and the paper introduces practical tools such as Proximal FID for model selection in corrupted-data regimes. Together, the theoretical and experimental results suggest that noisy data, when processed via score distillation, can be more informative than previously believed, enabling robust sampling in scientific domains with limited clean data.
Abstract
Diffusion models have achieved remarkable success in generating high-resolution, realistic images across diverse natural distributions. However, their performance heavily relies on high-quality training data, making it challenging to learn meaningful distributions from corrupted samples. This limitation restricts their applicability in scientific domains where clean data is scarce or costly to obtain. In this work, we introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data. DSD first pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs. While score distillation is traditionally viewed as a method to accelerate diffusion models, we show that it can also significantly enhance sample quality, particularly when starting from a degraded teacher model. Across varying noise levels and datasets, DSD consistently improves generative performance; we summarize our empirical evidence in Fig. 1. Furthermore, we provide theoretical insights showing that, in a linear model setting, DSD identifies the eigenspace of the clean data distribution's covariance matrix, implicitly regularizing the generator. This perspective reframes score distillation as not only a tool for efficiency but also a mechanism for improving generative models, particularly in low-quality data settings.
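The linear-setting claim can be made more concrete with a small numerical sketch. The code below is not the DSD algorithm and is not taken from the paper. It only illustrates a standard fact that makes the eigenspace result plausible: under additive isotropic Gaussian noise, the noisy covariance is the clean covariance plus sigma^2 times the identity, so the eigenvalues shift but the eigenvectors stay the same. The low-rank Gaussian data model, the additive-noise assumption, and all names (`eigenspace_demo`, `d`, `r`, `sigma`, `n`) are assumptions made for illustration.

```python
import numpy as np


def principal_angle_cosines(U, V):
    """Cosines of the principal angles between two subspaces.

    U and V must have orthonormal columns; values near 1 mean the
    subspaces nearly coincide.
    """
    return np.linalg.svd(U.T @ V, compute_uv=False)


def eigenspace_demo(d=20, r=3, sigma=1.0, n=50_000, seed=0):
    """Compare the top-r eigenspace of noisy data with the clean one.

    Assumed (hypothetical) setup: clean x = E z with orthonormal E (d x r),
    and only noisy observations y = x + sigma * eps are available.
    """
    rng = np.random.default_rng(seed)

    # Orthonormal basis of the clean r-dimensional subspace.
    E, _ = np.linalg.qr(rng.standard_normal((d, r)))

    # Clean low-rank Gaussian samples with distinct per-direction scales.
    scales = np.linspace(3.0, 2.0, r)
    x = (rng.standard_normal((n, r)) * scales) @ E.T

    # Noisy observations; the clean samples are not used after this line.
    y = x + sigma * rng.standard_normal((n, d))

    # Sample covariance of the noisy data (the data are zero-mean by design).
    cov_y = (y.T @ y) / n

    # eigh returns eigenvalues in ascending order, so take the last r columns.
    eigvals, eigvecs = np.linalg.eigh(cov_y)
    top_noisy = eigvecs[:, -r:]

    cosines = principal_angle_cosines(E, top_noisy)
    # The remaining d - r eigenvalues should sit near the noise level sigma^2.
    noise_floor = eigvals[: d - r].mean()
    return cosines, noise_floor


if __name__ == "__main__":
    cosines, noise_floor = eigenspace_demo()
    print("principal-angle cosines:", cosines)
    print("estimated noise floor:", noise_floor)
```

In this toy setting, a rank-r linear model fit to the noisy covariance spans nearly the same subspace as the clean data, and the noise appears only as a roughly constant offset in the eigenvalues. How DSD's distillation objective arrives at this eigenspace is the subject of the paper's analysis and is not reproduced here.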
