Score Change of Variables
Stephen Robbins
TL;DR
This work derives a general score change of variables formula that expresses the transformed score $\nabla_{\mathbf{y}} \log q(\mathbf{y})$ directly in terms of the original score $\nabla_{\mathbf{x}} \log p(\mathbf{x})$ and the derivatives of a smooth invertible map $\phi$. It then uses this result to (i) establish a reverse-time Itô lemma for transformed diffusion processes, which enables sampling in a transformed space using scores learned in the original space, and (ii) introduce Generalized Sliced Score Matching (GSSM), which extends sliced score matching to arbitrary smooth transformations. The paper demonstrates two key applications: diffusion on the probability simplex via an additive logistic transform, and density estimation with Deep Kernel Exponential Families using GSSM, including a variance-reduced variant. Together, these contributions provide a principled framework for transferring score-based methods across geometries. They enable training in one space while sampling in another, and they broaden the flexibility and applicability of score-based density estimation and generative modeling.
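For orientation, here is the standard form of a score change of variables, derived only from the usual density transformation rule and the chain rule. It assumes $\phi$ is a diffeomorphism with Jacobian $J_\phi$. The paper's own statement may be arranged differently. Writing $\mathbf{x} = \phi^{-1}(\mathbf{y})$:

$$q(\mathbf{y}) = p(\mathbf{x})\,\lvert\det J_\phi(\mathbf{x})\rvert^{-1}, \qquad \nabla_{\mathbf{y}} = J_\phi(\mathbf{x})^{-\top}\,\nabla_{\mathbf{x}},$$

$$\nabla_{\mathbf{y}} \log q(\mathbf{y}) = J_\phi(\mathbf{x})^{-\top}\left[\nabla_{\mathbf{x}} \log p(\mathbf{x}) - \nabla_{\mathbf{x}} \log\lvert\det J_\phi(\mathbf{x})\rvert\right],$$

where, by Jacobi's formula, $\left[\nabla_{\mathbf{x}} \log\lvert\det J_\phi(\mathbf{x})\rvert\right]_k = \operatorname{tr}\!\left(J_\phi(\mathbf{x})^{-1}\,\partial J_\phi(\mathbf{x})/\partial x_k\right)$. The transformed score therefore requires only the original score, the Jacobian of $\phi$, and the Jacobian's first derivatives.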
Abstract
We derive a general change of variables formula for score functions. We show that for a smooth, invertible transformation $\mathbf{y} = \phi(\mathbf{x})$, the transformed score function $\nabla_{\mathbf{y}} \log q(\mathbf{y})$ can be expressed directly in terms of $\nabla_{\mathbf{x}} \log p(\mathbf{x})$. Using this result, we develop two applications. First, we establish a reverse-time Itô lemma for score-based diffusion models. This lemma allows $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ to be used to reverse an SDE in the transformed space without directly learning $\nabla_{\mathbf{y}} \log q_t(\mathbf{y})$. The approach enables training diffusion models in one space and sampling in another, effectively decoupling the forward and reverse processes. Second, we introduce generalized sliced score matching, which extends traditional sliced score matching from linear projections to arbitrary smooth transformations. This provides greater flexibility in high-dimensional density estimation. We demonstrate these theoretical advances through applications to diffusion on the probability simplex, and we empirically compare our generalized score matching approach against traditional sliced score matching methods.
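To make the idea concrete, the sketch below evaluates a transformed score numerically. It uses the standard identity $\nabla_{\mathbf{y}} \log q(\mathbf{y}) = J_\phi(\mathbf{x})^{-\top}[\nabla_{\mathbf{x}} \log p(\mathbf{x}) - \nabla_{\mathbf{x}} \log\lvert\det J_\phi(\mathbf{x})\rvert]$ with $\mathbf{x} = \phi^{-1}(\mathbf{y})$. This is a minimal illustration under that assumption, not the paper's implementation. The helper `transformed_score` and its arguments are hypothetical names. The caller supplies the Jacobian and the log-determinant gradient by hand. Two maps with closed-form answers serve as checks: an affine map applied to a standard normal, and an elementwise exponential, which gives a log-normal.

```python
import numpy as np


def transformed_score(score_x, jac_phi, grad_logdet, x):
    """Hypothetical helper: score of q at y = phi(x), given the score of p at x.

    Assumes phi is a smooth bijection and applies
        grad_y log q(y) = J^{-T} (grad_x log p(x) - grad_x log|det J(x)|),
    where J = jac_phi(x) is the d x d Jacobian of phi at x.
    """
    J = jac_phi(x)
    g = score_x(x) - grad_logdet(x)
    # Solve J^T s = g rather than forming J^{-1} explicitly.
    return np.linalg.solve(J.T, g)


# Check 1: affine map y = A x + b with p = N(0, I).
# Then q = N(b, A A^T), whose score is -(A A^T)^{-1} (y - b).
A = np.array([[2.0, 0.5], [0.0, 1.5]])
b = np.array([1.0, -1.0])
x = np.array([0.3, -0.7])
y = A @ x + b

s_affine = transformed_score(
    score_x=lambda u: -u,                    # score of N(0, I)
    jac_phi=lambda u: A,                     # constant Jacobian
    grad_logdet=lambda u: np.zeros_like(u),  # log|det A| does not depend on x
    x=x,
)
s_affine_exact = -np.linalg.solve(A @ A.T, y - b)

# Check 2: elementwise y = exp(x) with p = N(0, I), so q is log-normal.
# J = diag(exp(x)) and log|det J| = sum(x), whose gradient is all ones.
x2 = np.array([0.2, -1.1, 0.8])
y2 = np.exp(x2)
s_lognormal = transformed_score(
    score_x=lambda u: -u,
    jac_phi=lambda u: np.diag(np.exp(u)),
    grad_logdet=lambda u: np.ones_like(u),
    x=x2,
)
# Closed-form log-normal score: d/dy log q(y) = -(1 + log y) / y per coordinate.
s_lognormal_exact = -(1.0 + np.log(y2)) / y2

print(np.allclose(s_affine, s_affine_exact), np.allclose(s_lognormal, s_lognormal_exact))
# prints: True True
```

The code uses a linear solve against $J_\phi^\top$ instead of an explicit inverse, which is the usual numerically safer choice. For maps without a convenient closed-form Jacobian, automatic differentiation would normally replace the hand-written `jac_phi` and `grad_logdet`.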
