Hyperboloid GPLVM for Discovering Continuous Hierarchies via Nonparametric Estimation

Koshi Watanabe, Keisuke Maeda, Takahiro Ogawa, Miki Haseyama

TL;DR

This paper presents hyperboloid Gaussian process (GP) latent variable models (hGP-LVMs), which embed high-dimensional hierarchical data with implicit continuity via nonparametric estimation. Generative modeling with the GP yields effective hierarchical embeddings and handles the ill-posed task of hyperparameter tuning.

Abstract

Dimensionality reduction (DR) offers a useful representation of complex high-dimensional data. Recent DR methods use hyperbolic geometry to derive a faithful low-dimensional representation of hierarchical data. However, existing methods are based on neighbor embedding, which frequently destroys the continuous relations within the hierarchies. This paper presents hyperboloid Gaussian process (GP) latent variable models (hGP-LVMs) to embed high-dimensional hierarchical data with implicit continuity via nonparametric estimation. We adopt generative modeling using the GP, which yields effective hierarchical embeddings and handles the ill-posed task of hyperparameter tuning. We present three variants that employ original point estimation, sparse point estimation, and Bayesian estimation. We establish their learning algorithms by incorporating Riemannian optimization and the active approximation scheme of GP-LVM. For Bayesian inference, we further introduce the reparameterization trick to realize Bayesian latent variable learning. Finally, we apply hGP-LVMs to several datasets and show their ability to represent high-dimensional hierarchies in low-dimensional spaces.

Paper Structure

This paper contains 19 sections, 1 theorem, 14 equations, 5 figures, 2 tables, 1 algorithm.

Key Result

Lemma 1

Let $\bm{\mathrm{x}}_{1}\in\mathbb{R}^{Q}$ and $\bm{\mathrm{x}}_{2}\in\mathbb{R}^{Q}$ be vectors with small scales, i.e., $||\bm{\mathrm{x}}_{1}||_{2}\approx0,\,||\bm{\mathrm{x}}_{2}||_{2}\approx0$. Set $\bm{\mathrm{x}}_{1}'=\left[\sqrt{1+||\bm{\mathrm{x}}_{1}||_{2}^{2}},\,\bm{\mathrm{x}}_{1}^{\top}\right]^{\top}$ [...] where $d_{E}(\bm{\mathrm{x}}_{1},\,\bm{\mathrm{x}}_{2})=||\bm{\mathrm{x}}_{1}-\bm{\mathrm{x}}_{2}||_{2}$.
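
The lift used in Lemma 1 and its relation to the Euclidean distance $d_{E}$ can be checked numerically. The following is a minimal sketch, assuming the standard unit-curvature Lorentz distance $d_{\mathcal{L}}(\bm{\mathrm{a}},\bm{\mathrm{b}})=\operatorname{arccosh}(-\langle\bm{\mathrm{a}},\bm{\mathrm{b}}\rangle_{\mathcal{L}})$, which the text above does not spell out. All function names are illustrative and are not taken from the paper.

```python
import math

def lift_to_hyperboloid(x):
    """Map x in R^Q to [sqrt(1 + ||x||^2), x], a point on the unit hyperboloid."""
    x0 = math.sqrt(1.0 + sum(v * v for v in x))
    return [x0] + list(x)

def lorentz_inner(a, b):
    """Lorentzian inner product: -a0*b0 + sum_i a_i*b_i."""
    return -a[0] * b[0] + sum(p * q for p, q in zip(a[1:], b[1:]))

def lorentz_distance(a, b):
    """Assumed geodesic distance on the unit hyperboloid (curvature -1)."""
    # Clamp at 1 so round-off cannot push the argument outside arccosh's domain.
    return math.acosh(max(1.0, -lorentz_inner(a, b)))

def euclidean_distance(x, y):
    """d_E(x, y) = ||x - y||_2."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

# Two small-norm vectors, matching the lemma's assumption ||x||_2 ~ 0.
x1 = [0.01, -0.02]
x2 = [-0.015, 0.005]

d_l = lorentz_distance(lift_to_hyperboloid(x1), lift_to_hyperboloid(x2))
d_e = euclidean_distance(x1, x2)
print(d_l, d_e)  # the two values should nearly coincide for small inputs
```

For inputs this close to the origin the two distances should agree to several decimal places. The gap widens as the norms grow, which is why the small-scale assumption matters.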

Figures (5)

  • Figure 1: An illustration of hyperboloid Gaussian process latent variable models (hGP-LVMs). We learn the latent variables on the Lorentz model and visualize them on the Poincaré ball.
  • Figure 2: GP prior comparison between the hyperboloid exponential kernel (upper, $\kappa=5$) and Euclidean exponential kernel (bottom). The color gives the value of the sampled GP. We input the latent variables on the Poincaré ball model when $\mathcal{M}=\mathcal{L}^{Q}$ (left) and those on the unit circle when $\mathcal{M}=E$ (right).
  • Figure 3: An illustration of the synthetic binary tree (SBT) with $d=3$ and its sampling procedure.
  • Figure 4: Qualitative results on the SBT dataset ($d=6$). (a) embedding comparison between generative models, (b) embedding of hGP-LVM with different length scales, and (c) color code of embedding.
  • Figure 5: Experimental results on the scRNA-seq dataset. (a) The canonical hematopoietic cell lineage tree. (b) Two-dimensional embedding of UMAP, PoincaréMap, Sparse hGP-LVM, and Bayesian hGP-LVM. The colors correspond to those of the lineage tree. (c) The error bar plot of comparative methods. We ran the same experiment 30 times and computed the mean error with standard deviation.
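
Figure 1 notes that latent variables are learned on the Lorentz model and visualized on the Poincaré ball. The sketch below shows the usual conversion between the two models, assuming unit negative curvature. The captions do not state the formula the authors use, so this is the textbook map rather than a confirmed detail of the paper, and the function names are hypothetical.

```python
import math

def lorentz_to_poincare(x):
    """Project a hyperboloid point [x0, x1, ..., xQ] to the ball: p_i = x_i / (1 + x0)."""
    return [v / (1.0 + x[0]) for v in x[1:]]

def poincare_to_lorentz(p):
    """Inverse map: x0 = (1 + ||p||^2) / (1 - ||p||^2), x_i = 2 p_i / (1 - ||p||^2)."""
    sq = sum(v * v for v in p)
    return [(1.0 + sq) / (1.0 - sq)] + [2.0 * v / (1.0 - sq) for v in p]

# A point on the unit hyperboloid: x0 = sqrt(1 + 3^2 + 4^2).
x = [math.sqrt(26.0), 3.0, 4.0]
p = lorentz_to_poincare(x)

# Every projected point lies strictly inside the unit ball.
print(math.sqrt(sum(v * v for v in p)) < 1.0)

# Mapping back should recover the original hyperboloid point up to round-off.
print(poincare_to_lorentz(p))
```

Optimizing on the hyperboloid and projecting only for display keeps the learned coordinates away from the ball's boundary, where Poincaré coordinates crowd together.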

Theorems & Definitions (2)

  • Lemma 1
  • proof