Pulling back symmetric Riemannian geometry for data analysis

Willem Diepeveen

TL;DR

This work characterises diffeomorphisms that result in proper, stable and efficient data analysis and uses best practices to guide construction of such diffeomorphisms through deep learning.

Abstract

Data sets tend to live in low-dimensional non-linear subspaces. Ideal data analysis tools for such data sets should therefore account for this non-linear geometry. The symmetric Riemannian geometry setting can be suitable for a variety of reasons. First, it comes with a rich mathematical structure that accounts for a wide range of non-linear geometries and that has been shown, through empirical evidence from classical non-linear embedding, to capture the data geometry. Second, many standard data analysis tools initially developed for data in Euclidean space can also be generalised efficiently to data on a symmetric Riemannian manifold. A conceptual challenge comes from the lack of guidelines for constructing a symmetric Riemannian structure on the data space itself and the lack of guidelines for adapting successful data analysis algorithms on symmetric Riemannian manifolds to this setting. This work considers these challenges in the setting of pullback Riemannian geometry through a diffeomorphism. The first part of the paper characterises diffeomorphisms that result in proper, stable and efficient data analysis. The second part then uses these best practices to guide construction of such diffeomorphisms through deep learning. As a proof of concept, different types of pullback geometries, among them the proposed construction, are tested on several data analysis tasks and on several toy data sets. The numerical experiments confirm two predictions from the theory. First, for proper, stable and efficient data analysis, the diffeomorphisms generating the pullback geometry need to map the data manifold into a geodesic subspace of the pulled back Riemannian manifold while preserving local isometry around the data manifold. Second, pulling back positive curvature can be problematic in terms of stability.
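
The pullback idea described in the abstract can be illustrated with a minimal sketch, assuming the simplest case in which the pulled back manifold is the Euclidean plane with its standard metric. Under that assumption, geodesics, distances and barycentres of the pullback structure are obtained by mapping points through the diffeomorphism, computing the Euclidean quantity, and mapping back through the inverse. The map `phi` below (a shear that flattens the parabola x_2 = x_1^2 onto a line) and all function names are hypothetical choices made for illustration; they are not the learned diffeomorphisms or the code from the paper.

```python
import math

# Hypothetical toy diffeomorphism of R^2 (an assumption, not from the paper):
# it maps the parabola x2 = x1^2 onto the straight line y2 = 0.
def phi(x):
    return (x[0], x[1] - x[0] ** 2)

def phi_inv(y):
    return (y[0], y[1] + y[0] ** 2)

def pullback_geodesic(x, y, t):
    """Point at time t on the pullback geodesic from x to y.

    Assumes the pulled back manifold is Euclidean, so the geodesic
    between phi(x) and phi(y) is a straight line segment.
    """
    px, py = phi(x), phi(y)
    line_point = tuple((1.0 - t) * a + t * b for a, b in zip(px, py))
    return phi_inv(line_point)

def pullback_distance(x, y):
    """Pullback distance: Euclidean distance between the images under phi."""
    return math.dist(phi(x), phi(y))

def pullback_barycentre(points):
    """Pullback barycentre: Euclidean mean of the images, mapped back."""
    images = [phi(p) for p in points]
    mean = tuple(sum(coords) / len(images) for coords in zip(*images))
    return phi_inv(mean)

if __name__ == "__main__":
    x, y = (-1.0, 1.0), (1.0, 1.0)  # two points on the parabola
    path = [pullback_geodesic(x, y, k / 4) for k in range(5)]
    for p in path:
        print(p)  # every interpolant satisfies p[1] == p[0] ** 2
    print(pullback_distance(x, y))
    print(pullback_barycentre([x, (0.0, 0.0), y]))
```

In this toy case the data curve is mapped onto a straight line, which is a geodesic subspace of the Euclidean plane, so interpolants and the barycentre of data points stay on the curve. The shear does not preserve arc length along the parabola, so the sketch illustrates only the geodesic-subspace property mentioned in the abstract, not the local isometry property.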

Paper Structure (63 sections, 19 theorems, 97 equations, 12 figures, 1 table)

Introduction
Related work
Fitting a submanifold
Constructing a chart
Remetrizing the ambient space
Contributions
Characterisation of diffeomorphisms for proper and stable data analysis.
Characterisation of diffeomorphisms for efficient data analysis.
Construction of diffeomorphisms for proper, stable and efficient data analysis.
Outline
Preliminaries
Notation
Riemannian geometry on $(\mathbb{R}^d, (\cdot,\cdot)^\varphi)$
Basic data processing on $(\mathbb{R}^d, (\cdot,\cdot)^\varphi)$
Diffeomorphisms for proper and stable interpolation
...and 48 more sections

Key Result

Proposition 2.1

Let $(\mathcal{M}, (\cdot,\cdot))$ be a $d$-dimensional Riemannian manifold and let $\varphi:\mathbb{R}^d \to \mathcal{M}$ be a smooth diffeomorphism such that $\varphi(\mathbb{R}^d) \subset \mathcal{M}$ is a geodesically convex set. Then,

Figures (12)

Figure 1: The non-linear $\mathbb{R}^2$-valued data set in (a) looks linear and 1-dimensional from $\mathbf{x}:=(0,0)$ under a learned non-standard Riemannian structure on $\mathbb{R}^2$ (b).
Figure 2: Three toy data sets.
Figure 3: Geodesic interpolation and perturbed geodesic interpolation of the end points of data set (a) on $(\mathbb{R}^2, (\cdot,\cdot)^{\varphi^{\mathrm{a}}})$ indicates that the chosen pullback geometry is suitable for data analysis on this data set and is stable with respect to small perturbations.
Figure 4: The data barycentre and perturbed data barycentre of data set (a) on $(\mathbb{R}^2, (\cdot,\cdot)^{\varphi^{\mathrm{a}}})$ indicates that the chosen pullback geometry is suitable for data analysis on this data set and is stable with respect to small perturbations.
Figure 5: Low rank approximation of data set (a) on $(\mathbb{R}^2, (\cdot,\cdot)^{\varphi^{\mathrm{a}}})$ and the Riemannian autoencoder constructed from it indicate that the chosen pullback geometry is suitable for data analysis on this data set.
...and 7 more figures

Theorems & Definitions (36)

Proposition 2.1
Proposition 2.2
Proposition 2.3
Corollary 2.3.1
Proposition 2.4
Lemma 3.1
proof
Lemma 3.2
proof
Lemma 3.3
...and 26 more
