
Manifold Aware Denoising Score Matching (MAD)

Alona Levy-Jurgenson, Alvaro Prat, James Cuin, Yee Whye Teh

TL;DR

This work proposes a simple modification to denoising score-matching in the ambient space to implicitly account for the manifold, thereby reducing the burden of learning the manifold while maintaining computational efficiency.

Abstract

A major focus in designing methods for learning distributions defined on manifolds is to alleviate the need to implicitly learn the manifold so that learning can concentrate on the data distribution within the manifold. However, accomplishing this often leads to compute-intensive solutions. In this work, we propose a simple modification to denoising score-matching in the ambient space to implicitly account for the manifold, thereby reducing the burden of learning the manifold while maintaining computational efficiency. Specifically, we propose a simple decomposition of the score function into a known component $s^{base}$ and a remainder component $s-s^{base}$ (the learning target), with the former implicitly including information on where the data manifold resides. We derive known components $s^{base}$ in analytical form for several important cases, including distributions over rotation matrices and discrete distributions, and use them to demonstrate the utility of this approach in those cases.
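The decomposition can be illustrated on the two-point toy distribution from Figure 1. The sketch below is not the authors' implementation: it assumes noising of the form $x_t = x_0 + \sigma_t \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$ (the excerpt does not state the noising scheme), under which the noised density is a Gaussian mixture and its score has a closed form. The function name `mixture_score` is hypothetical; the points, weights and $\sigma_t$ are taken from the Figure 1 caption.

```python
import numpy as np

def mixture_score(x, points, probs, sigma):
    """Score (gradient of the log-density) at x of sum_i probs[i] * N(x; points[i], sigma^2 I).

    Assumption: additive isotropic Gaussian noise, so the noised discrete
    distribution is a Gaussian mixture centred on the support points.
    """
    x = np.asarray(x, dtype=float)
    points = np.asarray(points, dtype=float)
    diffs = points - x                                   # rows are u_i - x
    logits = np.log(probs) - np.sum(diffs**2, axis=1) / (2.0 * sigma**2)
    logits = logits - logits.max()                       # stabilise the softmax
    weights = np.exp(logits)
    weights = weights / weights.sum()                    # posterior weight of each u_i given x
    return weights @ diffs / sigma**2

# Values from the Figure 1 caption.
points = np.array([[-1.0, 0.0], [1.0, 0.0]])   # the support M
p = np.array([0.1, 0.9])                        # data distribution
p_base = np.array([0.5, 0.5])                   # base distribution
sigma_t = 0.8
x = np.array([1.0, 0.0])

s = mixture_score(x, points, p, sigma_t)            # full score (DSM's learning target)
s_base = mixture_score(x, points, p_base, sigma_t)  # known component
residual = s - s_base                               # remainder (MAD's learning target)

print("|s|          =", np.linalg.norm(s))
print("|s - s_base| =", np.linalg.norm(residual))
```

Here `s_base` depends only on the support points and the uniform base weights, so it can be evaluated without any learning; only `residual` would need to be fitted by a network.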

Paper Structure (39 sections, 12 theorems, 62 equations, 9 figures, 4 tables)

Key Result

Theorem 2.1

Let $N \in \mathbb{N}$, $N>1$ and $\mathcal{M}=\{u_i\}_{i=1}^{N}\subset\mathbb{R}^n$ for some $n \in\mathbb{N}$. Let $\mu$ be the uniform normalised counting measure … (Footnote: in the caption of Figure 1 the unnormalised counting measure is used, but an equivalent, less intuitive, form may be obtained for its …)

Figures (9)

  • Figure 1: Introduction to MAD through a toy discrete distribution over $\mathcal{M}=\{(-1,0),(1,0)\}$ (red) with $p = \{0.1, 0.9\}$, respectively, and $p^{base} = \{0.5, 0.5\}$. Vector fields are shown for the actual score $s$, the base score $s^{base}$ and the difference $s-s^{base}$ for $\sigma_t = 0.8$. The magnitude of $s$ (DSM's learning target) and $s-s^{base}$ (MAD's learning target) as a function of $\sigma_t$ is shown for $x=(1,0)$.
  • Figure 2: Earth data (left panel) vs. generated samples from different methods. The colour map, defined by the polar axis, is left for visualisation purposes.
  • Figure 3: $\mathrm{SO}(3)$ results for $K=16$ in the form of a Mollweide projection for different methods and data. All figures are from seed $0$. Points are coloured according to angle, not component.
  • Figure 4: MMD vs cumulative training-step time. Shaded areas are ± std across $5$ runs. MMD is computed against the training set for $K=64$ using 1,000 samples for efficiency (the table uses 5,000). Note that for better clarity of comparison between all methods, the time shown is up to 80, but FFF ends after 400.
  • Figure 5: 2D image snapshots of a randomly rotated cone, cylinder, cube, icosahedron (a-d) and Mollweide projections of the corresponding pose distributions in $\mathrm{SO}(3)$. Black circles in (a-b) denote samples from MAD and are excluded from (c-d) for visual clarity.
  • ...and 4 more figures

Theorems & Definitions (17)

  • Theorem 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Corollary 2.4
  • Corollary 2.5
  • Corollary 1.1: Quaternion canonicalisation for $\mathrm{SO}(3)/G$
  • Theorem 2.1
  • proof
  • Theorem 2.2
  • proof
  • ...and 7 more