Convergent autoencoder approximation of low bending and low distortion manifold embeddings

Juliane Braunsmann; Marko Rajković; Martin Rumpf; Benedikt Wirth

TL;DR

A novel regularization is proposed for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows the encoder to be trained on its own.

Abstract

Autoencoders, which consist of an encoder and a decoder, are widely used in machine learning for dimension reduction of high-dimensional data. The encoder embeds the input data manifold into a lower-dimensional latent space, while the decoder represents the inverse map, providing a parametrization of the data manifold by the manifold in latent space. Good regularity and structure of the embedded manifold may substantially simplify further data processing tasks such as cluster analysis or data interpolation. We propose and analyze a novel regularization for learning the encoder component of an autoencoder: a loss functional that prefers isometric, extrinsically flat embeddings and allows the encoder to be trained on its own. To perform the training, it is assumed that for pairs of nearby points on the input manifold their local Riemannian distance and their local Riemannian average can be evaluated. The loss functional is computed via Monte Carlo integration with different sampling strategies for pairs of points on the input manifold. Our main theorem identifies a geometric loss functional of the embedding map as the $\Gamma$-limit of the sampling-dependent loss functionals. Numerical tests, using image data that encodes different explicitly given data manifolds, show that smooth manifold embeddings into latent space are obtained. Due to the promotion of extrinsic flatness, these embeddings are regular enough that interpolation between not too distant points on the manifold is well approximated by linear interpolation in latent space as one possible postprocessing.
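To make the sampled loss described in the abstract more concrete, the following is a minimal sketch of a Monte Carlo estimate over point pairs. It is not the paper's exact functional: the squared penalty on the distance ratio, the $d^4$ scaling of the bending term, and the names `sampled_loss`, `phi`, `avg`, `dist` and `lam` are all assumptions made for illustration. Only the ingredients (pairs of nearby points, their Riemannian distance, their Riemannian average, and a bending weight $\lambda$) come from the text.

```python
import numpy as np

def sampled_loss(phi, x, y, avg, dist, lam=10.0):
    """Monte Carlo estimate of a distortion-plus-bending loss (illustrative form).

    phi  : encoder, maps an (n, d) array of inputs to an (n, l) array of latent points
    x, y : (n, d) arrays holding the sampled pairs of nearby points
    avg  : (n, d) array with the Riemannian average of each pair
    dist : (n,) array with the Riemannian distance of each pair
    lam  : weight of the bending term (hypothetical default)
    """
    px, py, pa = phi(x), phi(y), phi(avg)
    # Distortion (assumed penalty): latent distance over Riemannian distance
    # should equal 1 for an isometric embedding.
    ratio = np.linalg.norm(px - py, axis=1) / dist
    distortion = (ratio - 1.0) ** 2
    # Bending (assumed scaling): the image of the geodesic average should
    # coincide with the latent midpoint for an extrinsically flat embedding.
    bending = np.sum((pa - 0.5 * (px + py)) ** 2, axis=1) / dist ** 4
    # Averaging over the sampled pairs is the Monte Carlo step.
    return float(np.mean(distortion + lam * bending))
```

On a flat input manifold with Euclidean distances and midpoints, the identity map gives a loss of zero, while a uniform scaling by 2 is penalized only through the distortion term.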

Paper Structure (15 sections, 5 theorems, 64 equations, 10 figures, 1 table)

Introduction
A low bending and low distortion regularization for encoders
A low bending and low distortion loss functional
Monte Carlo limit for dense sampling
Analysis of the continuous sampling loss functional
Preliminaries on function spaces
Existence of minimizer to nonlocal energy
The limit for vanishing sampling radius
Numerical experiments
Different input manifolds
Autoencoder architecture and training procedure
Visualization of the embeddings
Linear interpolation in latent space
Noise in the embedding due to image quantization
Some open questions

Key Result

Lemma 1

Let $\phi \in C^{2, 1}(M, \mathbb{R}^l)$. Then, for all $x \in M, w\in U_x$ we have

Figures (10)

Figure 1: Results for dataset \ref{dataset:s} (cf. \ref{subsec:setup}) for $\lambda=0$ and $\epsilon=\pi/8$ (with maximal distance being $\pi$). From left to right: a sketch of the sundial configuration; training data (image pairs with their geodesic average and distance); the latent manifold $\phi(M)\subset \mathbb{R}^{16}$ projected into $\mathbb{R}^3$ via PCA; decoder outputs for the orange points in latent space. The encoder was trained separately from the decoder and the training was stopped after the value of the functional $E^{\mathcal{S}_{\epsilon}}[\phi]$ evaluated on a test set did not decrease for 20 epochs. Pairs with distance below $\frac{1}{100}$ of the maximal distance were rejected. The decoder was trained until an accuracy of $10^{-5}$ was reached on the same test set. The first three components explain 97.8% of the variance, with a threshold of 99% reached at 6 components.
Figure 2: The parametrization $\Pi_x$ that maps the cone $\mathop{\mathrm{\mathcal{C}}}\nolimits_{r_0,\kappa}\subset U_x \subset \mathbb{R}^m$ into $M$ and the neighborhoods $U_x^\epsilon\subset U_x$ onto $D_\epsilon^M(x)$.
Figure 3: Comparison of temporal evolution of the three loss components for dataset \ref{dataset:r} for $\lambda=10$ and $\lambda=0$ (logarithmic $y$-axis). Per optimization step, 10000 images are processed in batches of 128.
Figure 4: Comparison of the latent manifold $\phi(M) \subset \mathbb{R}^{16}$ for sundial dataset \ref{dataset:s} for $\lambda=0$ and $\lambda=10$ and $\epsilon=\pi/8$ (colored points as in \ref{fig:masterpiece}). The same training procedure as in \ref{fig:masterpiece} was used. The three visualized components explain $97.8\%$ ($\lambda = 0$) and $99.95\%$ ($\lambda = 10$) of the variance. A threshold of $99\%$ variance is reached for $6$ ($\lambda=0$) and $2$ ($\lambda=10$) components.
Figure 5: Visualization of the results for dataset \ref{dataset:r} for $\lambda=10$ and $\epsilon=\frac{\pi}{4}$ (with maximal distance being $\pi$). Encoder and decoder were trained separately. Training of the encoder was stopped after the value of the training loss did not decrease for 20 epochs. The decoder was trained until the reconstruction error $R$ evaluated on a test set reached a threshold of $2\cdot 10^{-3}$. Pairs with distance below $\frac{1}{20}$ of the maximal distance were rejected. The embedding is smooth, revealing the topology and geometry of the manifold $SO(3)$, and reasonable interpolations are obtained for $\lambda=10$. On the left, three input triples and their distances are shown. To visualize the latent manifold, a PCA was performed on the point cloud in 16-dimensional latent space obtained by applying the embedding map $\phi$ to a large number of images from $M\cong SO(3)$. For the bottom right image we fixed a rotation angle and randomly sampled the rotation axis from $S^2$. For both other visualizations we regularly sample rotation axes from different latitudes of $S^2$ and then show points corresponding to rotations around each axis in the same color. The graph coordinates correspond to the principal components indicated by their labels. In components 2, 3, 4, a sphere-like structure can be observed, suggesting that these three components encode the rotation axis. The graph on the bottom right shows the total amount of explained variance as a function of increasing subspace dimension, with a threshold of 99% reached for 6 dimensions for $\lambda=10$. Below the dashed line we show examples of interpolations generated by interpolating linearly in latent space and subsequent decoding. For vanishing bending regularization $\lambda=0$ those interpolations are unreliable, while they look very reasonable for $\lambda=10$, even though the decoder was neither trained on linear interpolations in latent space nor was it regularized.
...and 5 more figures
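
The captions above quote the share of variance explained by the leading principal components and the number of components needed to reach 99%. A minimal sketch of how such numbers could be computed from a latent point cloud is given below; the function name `components_for_variance` and the use of a plain SVD are assumptions for illustration, not a description of the authors' code.

```python
import numpy as np

def components_for_variance(latent, threshold=0.99):
    """Return (k, cumulative) where k is the smallest number of principal
    components whose cumulative explained variance reaches `threshold`.

    latent : (n, l) array of latent points, e.g. l = 16 as in the captions
    """
    # Center the point cloud; PCA is defined on mean-free data.
    centered = latent - latent.mean(axis=0)
    # Squared singular values are proportional to the variance per component.
    s = np.linalg.svd(centered, compute_uv=False)
    ratios = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(ratios)
    # First index at which the cumulative share reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return k, cumulative
```

Here `cumulative[2]` would correspond to the share explained by the first three components, the quantity reported in the captions for the three-dimensional PCA plots.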

Theorems & Definitions (11)

Remark 1
Lemma 1
proof
Proposition 1
proof
Theorem 1
proof
Theorem 2: Mosco-convergence
Remark 2
proof: Proof of \ref{thm:Mosco_smooth}
...and 1 more
