Optimizing Input of Denoising Score Matching is Biased Towards Higher Score Norm
Tongda Xu
TL;DR
The paper investigates a bias that arises when denoising score matching is used to optimize targets other than model parameters, such as the conditional input $c$ or the input distribution $p(x)$. It derives the relationship between $\mathcal{L}_{DSM}$ and $\mathcal{L}_{ESM}$ in these settings and identifies a bias term that pushes the optimization toward higher score norms; the result also extends to pre-trained diffusion models. The bias appears in both conditional-input and data-distribution optimization and is argued to affect a broad range of diffusion-based works, including MAR, DreamFusion, and PerCo. These findings urge caution and motivate bias-aware formulations when applying DSM-based objectives to non-parameter targets.
Abstract
Many recent works use denoising score matching to optimize the conditional input of diffusion models. In this workshop paper, we demonstrate that such optimization breaks the equivalence between denoising score matching and exact score matching. Furthermore, we show that the resulting bias leads to a higher score norm. We also observe a similar bias when optimizing the data distribution using a pre-trained diffusion model. Finally, we discuss the wide range of works across different domains that are affected by this bias, including MAR for auto-regressive generation, PerCo for image compression, and DreamFusion for text-to-3D generation.
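The data-distribution case can be illustrated with a minimal numerical sketch. It uses a toy setup assumed here for illustration, not an experiment from the paper: 1-D data $x \sim \mathcal{N}(0, s^2)$, a noisy sample $x_t = x + \sigma\epsilon$, and a linear score model. The standard identity $\mathcal{L}_{DSM} = \mathcal{L}_{ESM} + \mathbb{E}\|\nabla \log p(x_t|x)\|^2 - \mathbb{E}\|\nabla \log p(x_t)\|^2$ then gives a gap of $1/\sigma^2 - 1/(s^2+\sigma^2)$, which is constant in the model parameters but depends on the data distribution through $s$. The function name `dsm_and_esm` and all of its parameters are hypothetical.

```python
import random

def dsm_and_esm(data_std, noise_std, slope, n=100_000, seed=0):
    """Monte Carlo estimates of (L_DSM, L_ESM) for a 1-D toy problem.

    Assumptions (illustrative only): x ~ N(0, data_std^2),
    x_t = x + noise_std * eps, and a score model s(x_t) = -slope * x_t.
    """
    rng = random.Random(seed)
    var_t = data_std**2 + noise_std**2  # variance of the noisy marginal p(x_t)
    dsm = esm = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, data_std)
        eps = rng.gauss(0.0, 1.0)
        x_t = x + noise_std * eps
        s = -slope * x_t
        # DSM target: grad log p(x_t | x) = -(x_t - x) / noise_std^2 = -eps / noise_std
        dsm += (s + eps / noise_std) ** 2
        # ESM target: grad log p(x_t) = -x_t / var_t
        esm += (s + x_t / var_t) ** 2
    return dsm / n, esm / n

noise_std = 1.0
for data_std in (2.0, 1.0, 0.5):
    var_t = data_std**2 + noise_std**2
    # Use the exact score (slope = 1 / var_t), so L_ESM is zero by construction.
    dsm, esm = dsm_and_esm(data_std, noise_std, slope=1.0 / var_t)
    predicted_gap = 1.0 / noise_std**2 - 1.0 / var_t
    score_norm_sq = 1.0 / var_t  # E[(grad log p(x_t))^2] for a Gaussian
    print(f"s={data_std}: DSM={dsm:.3f} ESM={esm:.3f} "
          f"gap={predicted_gap:.3f} score_norm^2={score_norm_sq:.3f}")
```

In this toy setting the score model is exact for every value of $s$, so $\mathcal{L}_{ESM}$ stays at zero, yet the $\mathcal{L}_{DSM}$ estimate should fall from roughly 0.8 to roughly 0.2 as $s$ shrinks from 2 to 0.5 and the squared score norm rises. Minimizing $\mathcal{L}_{DSM}$ over the data distribution would therefore favor a higher score norm rather than a better score fit.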
