Hyperspherical Variational Auto-Encoders

Tim R. Davidson, Luca Falorsi, Nicola De Cao, Thomas Kipf, Jakub M. Tomczak

TL;DR

This work introduces a hyperspherical latent space for variational auto-encoders by replacing the Gaussian prior and posterior with a von Mises-Fisher (vMF) distribution, which admits a truly uniform prior on the hypersphere. It derives the KL divergence, develops a sampling procedure, and extends the reparameterization trick to the rejection-based sampling that the vMF distribution requires, addressing the resulting optimization challenges. Empirically, the $\mathcal{S}$-VAE better recovers hyperspherical latent structure in low dimensions, improves unsupervised and semi-supervised MNIST performance, and improves link prediction with VGAE on several citation networks, although results vary by dataset. The study highlights both the benefits and the limitations of hyperspherical modeling and points to future work on flexible posteriors, dynamic latent radii, and scalability to higher dimensions.

Abstract

The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. Code at http://github.com/nicola-decao/s-vae-tf and https://github.com/nicola-decao/s-vae-pytorch
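
For reference, the vMF density that the abstract refers to can be written in its standard textbook form. This is a sketch in notation assumed here rather than defined in this excerpt: $m$ is the ambient dimension, $\mu$ a unit-norm mean direction, $\kappa \geq 0$ the concentration, and $\mathcal{I}_v$ the modified Bessel function of the first kind of order $v$.

```latex
q(\mathbf{z} \mid \mu, \kappa) = \mathcal{C}_m(\kappa)\, \exp\!\left(\kappa\, \mu^{\top} \mathbf{z}\right),
\qquad
\mathcal{C}_m(\kappa) = \frac{\kappa^{m/2 - 1}}{(2\pi)^{m/2}\, \mathcal{I}_{m/2 - 1}(\kappa)},
\qquad
\mathbf{z} \in \mathcal{S}^{m-1}.
```

As $\kappa \to 0$ the exponential factor tends to 1 and the density becomes uniform on $\mathcal{S}^{m-1}$, which is why a vMF family can serve as both a uniform prior and a concentrated posterior.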

Paper Structure

This paper contains 46 sections, 3 theorems, 27 equations, 10 figures, 7 tables, 3 algorithms.

Key Result

Lemma 1

Let $f$ be any measurable function and $\varepsilon \sim \pi(\varepsilon| \theta) = s(\varepsilon)\dfrac{g(h(\varepsilon,\theta)| \theta)}{r(h(\varepsilon,\theta)| \theta)}$ the distribution of the accepted sample. Then:

Figures (10)

  • Figure 1: Plots of the original latent space (a) and learned latent space representations in different settings, where $\beta$ is a re-scaling factor for weighting the KL divergence. (Best viewed in color)
  • Figure 2: Latent space visualization of the 10 MNIST digits in 2 dimensions of both $\mathcal{N}$-VAE (left) and $\mathcal{S}$-VAE (right). (Best viewed in color)
  • Figure 3: Latent space of unsupervised $\mathcal{N}$-VGAE and $\mathcal{S}$-VGAE models trained on the Cora citation network. Colors denote document classes, which are not provided during training. (Best viewed in color)
  • Figure 4: Overview of von Mises-Fisher sampling procedure. Note that as $\omega$ is a scalar, the procedure does not suffer from the curse of dimensionality.
  • Figure 5: Geometric representation of a single sample in $\mathcal{S}^2$, where $\omega \sim g(\omega|k)$ and $\mathbf{v} \sim U(\mathcal{S}^{1})$.
  • ...and 5 more figures
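
The sampling procedure summarized in the captions of Figures 4 and 5 can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' released code: it assumes the commonly used Ulrich/Wood-style rejection scheme for the scalar $\omega$ with a Beta proposal, followed by a Householder reflection that maps the first basis vector onto the mean direction. The function names `sample_omega` and `sample_vmf` are hypothetical.

```python
import numpy as np


def sample_omega(kappa, m, rng):
    """Rejection-sample the scalar omega in [-1, 1].

    Target density is proportional to exp(kappa * omega) * (1 - omega^2)^((m - 3) / 2).
    Assumes kappa >= 0 and ambient dimension m >= 2.
    """
    b = (-2.0 * kappa + np.sqrt(4.0 * kappa**2 + (m - 1) ** 2)) / (m - 1)
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (m - 1) * np.log(1.0 - x0**2)
    while True:
        # Proposal: a Beta draw mapped into [-1, 1].
        eps = rng.beta((m - 1) / 2.0, (m - 1) / 2.0)
        omega = (1.0 - (1.0 + b) * eps) / (1.0 - (1.0 - b) * eps)
        u = 1.0 - rng.uniform()  # lies in (0, 1], so log(u) is finite
        if kappa * omega + (m - 1) * np.log(1.0 - x0 * omega) - c >= np.log(u):
            return omega


def sample_vmf(mu, kappa, rng):
    """Draw one sample on the unit sphere around the unit vector mu."""
    mu = np.asarray(mu, dtype=float)
    m = mu.shape[0]
    omega = sample_omega(kappa, m, rng)

    # Uniform direction v on the sphere of one dimension lower.
    v = rng.standard_normal(m - 1)
    v /= np.linalg.norm(v)

    # Sample concentrated around e1 = (1, 0, ..., 0).
    z_prime = np.concatenate(([omega], np.sqrt(1.0 - omega**2) * v))

    # Householder reflection that sends e1 to mu.
    e1 = np.zeros(m)
    e1[0] = 1.0
    u = e1 - mu
    norm_u = np.linalg.norm(u)
    if norm_u < 1e-12:  # mu already equals e1; no reflection needed
        return z_prime
    u /= norm_u
    return z_prime - 2.0 * u * np.dot(u, z_prime)


rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 1.0])
samples = np.stack([sample_vmf(mu, 50.0, rng) for _ in range(500)])
```

Only the scalar $\omega$ goes through accept-reject; the direction $\mathbf{v}$ and the reflection are deterministic transformations of simple noise. With a large concentration such as `kappa = 50.0`, the rows of `samples` cluster tightly around `mu`.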

Theorems & Definitions (5)

  • Lemma 1
  • Lemma 2
  • proof
  • Lemma 3
  • proof