Sliced Score Matching: A Scalable Approach to Density and Score Estimation

Yang Song, Sahaj Garg, Jiaxin Shi, Stefano Ermon

TL;DR

This work introduces sliced score matching (SSM), a scalable variant of score matching that projects high-dimensional scores onto random directions to avoid computing Hessian traces. By using Hessian–vector products and Hutchinson-style trace estimation, SSM enables training deep unnormalized models and estimating scores for implicit distributions with theoretical guarantees of consistency and asymptotic normality. The framework yields accurate score estimators and improves density estimation efficiency, with practical success on deep kernel exponential families and NICE flows, and enhances score-based learning for VAEs and WAEs. Empirically, SSM and its variance-reduced variant outperform existing scalable alternatives across density estimation and score estimation tasks, demonstrating strong performance and scalability in high-dimensional settings.
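The TL;DR relies on Hutchinson-style trace estimation. Below is a minimal sketch, assuming NumPy; the helper names `hutchinson_trace` and `exact_trace_via_matvecs` are hypothetical. It contrasts two ways of getting a trace. The exact trace needs one matrix-vector product per dimension. The unbiased Monte Carlo estimate averages $\mathbf{v}^\top H\mathbf{v}$ over random probes with $\mathbb{E}[\mathbf{v}\mathbf{v}^\top] = I$. An explicit symmetric matrix stands in for a Hessian here. In the paper's setting, `matvec` would instead be a Hessian-vector product computed by automatic differentiation.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_samples, rng, dist="rademacher"):
    """Estimate tr(H) as the mean of v^T H v over random probe vectors v.

    Any probe distribution with E[v v^T] = I gives an unbiased estimate.
    `matvec(v)` must return H @ v; H itself is never formed inside this function.
    """
    total = 0.0
    for _ in range(n_samples):
        if dist == "rademacher":
            v = rng.choice([-1.0, 1.0], size=dim)  # entries are +1 or -1
        else:
            v = rng.standard_normal(dim)           # standard Gaussian probes
        total += v @ matvec(v)
    return total / n_samples

def exact_trace_via_matvecs(matvec, dim):
    """Exact trace using dim matrix-vector products, one per basis vector e_i."""
    eye = np.eye(dim)
    return sum(eye[i] @ matvec(eye[i]) for i in range(dim))

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
H = (A + A.T) / 2          # symmetric, like a Hessian
matvec = lambda v: H @ v   # stand-in for a Hessian-vector product

exact = exact_trace_via_matvecs(matvec, 5)         # costs 5 matvecs
approx = hutchinson_trace(matvec, 5, 20_000, rng)  # error shrinks like 1/sqrt(n_samples)
```

With 5 dimensions the exact route is cheap. The point is that its cost grows linearly with the dimension $d$, while the estimator's cost is set by the number of probes.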

Abstract

Score matching is a popular method for estimating unnormalized statistical models. However, it has been so far limited to simple, shallow models or low-dimensional data, due to the difficulty of computing the Hessian of log-density functions. We show this difficulty can be mitigated by projecting the scores onto random vectors before comparing them. This objective, called sliced score matching, only involves Hessian-vector products, which can be easily implemented using reverse-mode automatic differentiation. Therefore, sliced score matching is amenable to more complex models and higher dimensional data compared to score matching. Theoretically, we prove the consistency and asymptotic normality of sliced score matching estimators. Moreover, we demonstrate that sliced score matching can be used to learn deep score estimators for implicit distributions. In our experiments, we show sliced score matching can learn deep energy-based models effectively, and can produce accurate score estimates for applications such as variational inference with implicit distributions and training Wasserstein Auto-Encoders.
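The abstract's central claim is that the sliced objective needs only Hessian-vector products, never the full Hessian. The NumPy sketch below illustrates this on a toy energy model chosen for this example, not taken from the paper: $\log \tilde p(\mathbf{x}) = -\tfrac12\|\mathbf{x}\|^2 + \sum_k \log\cosh(\mathbf{w}_k^\top\mathbf{x})$, whose score is $\mathbf{s}(\mathbf{x}) = -\mathbf{x} + W^\top\tanh(W\mathbf{x})$. To stay dependency-free, the Hessian-vector product is derived by hand rather than obtained by reverse-mode automatic differentiation as the abstract describes. The helper names `score`, `hvp`, `ssm_loss` and `sm_loss` are hypothetical. The `variance_reduced` flag reflects our reading of the variance-reduced variant. When $\mathbb{E}[\mathbf{v}\mathbf{v}^\top] = I$, the term $\tfrac12(\mathbf{v}^\top\mathbf{s})^2$ has expectation $\tfrac12\|\mathbf{s}\|^2$, so it can be replaced by that quantity.

```python
import numpy as np

def score(X, W):
    # s(x) = grad_x log p~(x) = -x + W^T tanh(W x), applied row-wise to X of shape (n, d)
    return -X + np.tanh(X @ W.T) @ W

def hvp(X, V, W):
    # (grad_x s(x)) v = -v + W^T [(1 - tanh^2(W x)) * (W v)], row-wise; no d x d matrix is formed
    T = np.tanh(X @ W.T)
    return -V + ((1.0 - T**2) * (V @ W.T)) @ W

def ssm_loss(X, W, rng, n_proj=1, variance_reduced=False):
    """Monte Carlo estimate of E_x E_v[v^T (grad_x s) v + 1/2 (v^T s)^2] with v ~ N(0, I)."""
    Xr = np.repeat(X, n_proj, axis=0)   # pair each x with n_proj projection vectors
    V = rng.standard_normal(Xr.shape)   # v ~ N(0, I), so E[v v^T] = I
    S = score(Xr, W)
    vHv = np.sum(V * hvp(Xr, V, W), axis=1)
    if variance_reduced:
        second = 0.5 * np.sum(S**2, axis=1)        # E_v[(v^T s)^2] = ||s||^2, integrated out exactly
    else:
        second = 0.5 * np.sum(V * S, axis=1) ** 2
    return np.mean(vHv + second)

def sm_loss(X, W):
    """Exact score matching objective E_x[tr(grad_x s) + 1/2 ||s||^2], using the full trace."""
    d = X.shape[1]
    T = np.tanh(X @ W.T)
    trace = -d + (1.0 - T**2) @ np.sum(W**2, axis=1)   # tr(-I + W^T diag(1 - T^2) W)
    return np.mean(trace + 0.5 * np.sum(score(X, W) ** 2, axis=1))

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((3, 4))   # toy parameters: 3 units, 4 dimensions
X = rng.standard_normal((500, 4))       # toy "data" drawn from N(0, I)
exact = sm_loss(X, W)
sliced = ssm_loss(X, W, rng, n_proj=200)
sliced_vr = ssm_loss(X, W, rng, n_proj=200, variance_reduced=True)
```

Averaged over many projections, the sliced loss approaches the exact score matching loss. Each projection, however, costs one Hessian-vector product instead of a full Hessian trace.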

Paper Structure

This paper contains 67 sections, 11 theorems, 94 equations, 3 figures, 9 tables, 2 algorithms.

Key Result

Theorem 1

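Under some regularity conditions (the assumptions on the score functions through the boundary conditions, stated in the appendix), we have

$$L({\boldsymbol{\theta}}; p_{\mathbf{v}}) = J({\boldsymbol{\theta}}; p_{\mathbf{v}}) + \mathrm{C},$$

where $L({\boldsymbol{\theta}}; p_{\mathbf{v}}) \triangleq \frac{1}{2}\mathbb{E}_{p_{\mathbf{v}}}\mathbb{E}_{p_d}\big[(\mathbf{v}^\top \mathbf{s}_m(\mathbf{x};{\boldsymbol{\theta}}) - \mathbf{v}^\top \mathbf{s}_d(\mathbf{x}))^2\big]$ is the sliced score matching objective, $J({\boldsymbol{\theta}}; p_{\mathbf{v}}) \triangleq \mathbb{E}_{p_{\mathbf{v}}}\mathbb{E}_{p_d}\big[\mathbf{v}^\top \nabla_{\mathbf{x}} \mathbf{s}_m(\mathbf{x};{\boldsymbol{\theta}})\,\mathbf{v} + \frac{1}{2}(\mathbf{v}^\top \mathbf{s}_m(\mathbf{x};{\boldsymbol{\theta}}))^2\big]$ no longer involves the unknown data score $\mathbf{s}_d$, and $\mathrm{C}$ is a constant with respect to ${\boldsymbol{\theta}}$.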
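The identity in Theorem 1 can be sanity-checked numerically in a toy setting where the data score is known, which is not the case in practice. The sketch below makes these assumptions for illustration only: $p_d = \mathcal{N}(\mathbf{m}, I)$, a model $\mathcal{N}({\boldsymbol{\theta}}, I)$ with score $\mathbf{s}_m(\mathbf{x};{\boldsymbol{\theta}}) = -(\mathbf{x} - {\boldsymbol{\theta}})$, and $p_{\mathbf{v}} = \mathcal{N}(0, I)$. With shared samples, it estimates two objectives for several ${\boldsymbol{\theta}}$. The first is $L = \tfrac12\mathbb{E}[(\mathbf{v}^\top\mathbf{s}_m - \mathbf{v}^\top\mathbf{s}_d)^2]$, which needs $\mathbf{s}_d$. The second is $J = \mathbb{E}[\mathbf{v}^\top\nabla_{\mathbf{x}}\mathbf{s}_m\,\mathbf{v} + \tfrac12(\mathbf{v}^\top\mathbf{s}_m)^2]$, which does not. For this model $\nabla_{\mathbf{x}}\mathbf{s}_m = -I$, and the constant works out to $\mathrm{C} = \tfrac12\mathbb{E}[(\mathbf{v}^\top\mathbf{s}_d)^2] = d/2$. The function names `s_data`, `s_model`, `L_hat` and `J_hat` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200_000
m = np.array([1.0, -2.0, 0.5])        # true mean of p_d = N(m, I)
X = m + rng.standard_normal((n, d))   # samples x ~ p_d
V = rng.standard_normal((n, d))       # projections v ~ p_v = N(0, I)

def s_data(X):
    return -(X - m)                   # true score of N(m, I); unknown in real problems

def s_model(X, theta):
    return -(X - theta)               # score of the model N(theta, I)

def L_hat(theta):
    # 1/2 E[(v^T s_m - v^T s_d)^2]: requires the data score
    diff = np.sum(V * (s_model(X, theta) - s_data(X)), axis=1)
    return 0.5 * np.mean(diff**2)

def J_hat(theta):
    # E[v^T (grad_x s_m) v + 1/2 (v^T s_m)^2] with grad_x s_m = -I for this model
    vHv = -np.sum(V * V, axis=1)
    return np.mean(vHv + 0.5 * np.sum(V * s_model(X, theta), axis=1) ** 2)

thetas = [np.zeros(d), m, np.array([3.0, 0.0, -1.0])]
gaps = [L_hat(t) - J_hat(t) for t in thetas]   # each should be close to d / 2 = 1.5
```

The gap stays essentially the same across very different ${\boldsymbol{\theta}}$, so minimizing $J$, which only needs model quantities, also minimizes $L$.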

Figures (3)

  • Figure 1: SM loss after training DKEF models on UCI datasets with different loss functions; lower is better. Results for approximate backpropagation are not shown because losses were larger than $10^{9}$.
  • Figure 2: SM performance degrades linearly with the data dimension, while efficient approaches have relatively similar performance.
  • Figure 3: Log-likelihoods after training DKEF models on UCI datasets with different loss functions; higher is better. Results for approximate backpropagation are not shown because log-likelihoods were smaller than $-10^{6}$.

Theorems & Definitions (27)

  • Theorem 1
  • Theorem 2: Consistency
  • Proof (sketch)
  • Remark 1
  • Theorem 3: Asymptotic normality, special case
  • Proof (sketch)
  • Remark 2
  • Remark 3
  • Proof
  • Lemma 1
  • ...and 17 more