Riemannian generative decoder

Andreas Bjerregaard, Søren Hauberg, Anders Krogh

TL;DR

The paper addresses learning latent representations on general Riemannian manifolds without encoder-based density models. It introduces the Riemannian generative decoder (RGD), which learns manifold-valued latents with a Riemannian optimizer while training a decoder, bypassing density estimation and amortized inference. A geometry-aware regularization based on curvature-aligned noise shapes the decoder's local smoothness to reflect the manifold metric, improving the alignment between latent distances and data geometry. Across three diverse datasets (a synthetic branching diffusion process, hmtDNA haplogroups, and scRNA-seq cell cycles), RGD yields geometry-consistent latent spaces, achieves competitive generative fidelity, and scales favorably to higher latent dimensionality, suggesting potential for broad non-Euclidean representation learning.
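The regularizer is described above only at a high level. As a minimal sketch of one way curvature-aligned noise could be realized, assume a Lorentz (hyperboloid) latent space with $\langle x, x\rangle_L = -1/c$ and a wrapped normal: Gaussian noise of scale $\sigma$ is drawn in the tangent space at the origin, parallel-transported to each latent, and pushed through the exponential map, so the perturbation follows the manifold geometry. The helper names (`wrapped_noise`, `noise_penalty`), the wrapped-normal choice, and the squared-difference penalty on decoder outputs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def lorentz_inner(u, v):
    """Minkowski inner product <u, v>_L = -u0*v0 + sum_i ui*vi, computed row-wise."""
    return -u[..., 0] * v[..., 0] + np.sum(u[..., 1:] * v[..., 1:], axis=-1)


def exp_map(x, v, c):
    """Exponential map on the hyperboloid <x, x>_L = -1/c for tangent vectors v at x."""
    vnorm = np.sqrt(np.clip(lorentz_inner(v, v), 0.0, None))[..., None]
    scaled = np.sqrt(c) * vnorm
    safe = np.where(scaled > 1e-12, scaled, 1.0)  # avoid 0/0 when v = 0
    return np.cosh(scaled) * x + np.sinh(scaled) * v / safe


def transport_from_origin(v, x, c):
    """Parallel transport of v from the origin o = (1/sqrt(c), 0, ..., 0) to x."""
    o = np.zeros_like(x)
    o[..., 0] = 1.0 / np.sqrt(c)
    coef = c * lorentz_inner(x, v) / (1.0 - c * lorentz_inner(o, x))
    return v + coef[..., None] * (o + x)


def wrapped_noise(x, sigma, c, rng):
    """Perturb hyperboloid points x with isotropic tangent-space noise of scale sigma."""
    eps = np.zeros_like(x)
    # Tangent vectors at the origin have a zero time-like (first) coordinate.
    eps[..., 1:] = sigma * rng.normal(size=x[..., 1:].shape)
    return exp_map(x, transport_from_origin(eps, x, c), c)


def noise_penalty(decoder, x, sigma, c, rng):
    """Hypothetical smoothness penalty: mean squared change of decoder output under noise."""
    x_noisy = wrapped_noise(x, sigma, c, rng)
    return np.mean(np.sum((decoder(x_noisy) - decoder(x)) ** 2, axis=-1))


# Usage on toy data: lift random 2-D coordinates onto the hyperboloid.
rng = np.random.default_rng(0)
c, sigma = 5.0, 0.5
u = rng.normal(size=(20, 2))
x = np.concatenate([np.sqrt(1.0 / c + np.sum(u ** 2, axis=1, keepdims=True)), u], axis=1)
W_dec = rng.normal(size=(3, 4))


def toy_decoder(z):
    """Toy stand-in for a trained decoder network."""
    return np.tanh(z @ W_dec)


x_noisy = wrapped_noise(x, sigma, c, rng)
penalty = noise_penalty(toy_decoder, x, sigma, c, rng)
```

Because the noise is isotropic in each tangent space, its geodesic reach is the same everywhere on the manifold, which is what ties the decoder's local smoothness to the metric; the caption of Figure 3c lists the effect of varying the noise scale $\sigma$.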

Abstract

Riemannian representation learning typically relies on an encoder to estimate densities on chosen manifolds. This involves optimizing numerically brittle objectives, potentially harming model training and quality. To completely circumvent this issue, we introduce the Riemannian generative decoder, a unifying approach for finding manifold-valued latents on any Riemannian manifold. Latents are learned with a Riemannian optimizer while jointly training a decoder network. By discarding the encoder, we vastly simplify the manifold constraint compared to current approaches, which often handle only a few specific manifolds. We validate our approach on three case studies (a synthetic branching diffusion process, human migrations inferred from mitochondrial DNA, and cells undergoing a cell division cycle), each showing that learned representations respect the prescribed geometry and capture intrinsic non-Euclidean structure. Our method requires only a decoder, is compatible with existing architectures, and yields interpretable latent spaces aligned with data geometry. Code is available at https://github.com/yhsure/riemannian-generative-decoder.
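A minimal sketch of decoder-only training, assuming the unit sphere as the latent manifold and a linear decoder in place of a neural network: each data point gets a free latent parameter, latents take Riemannian gradient steps (tangent-space projection followed by the sphere's exponential map), and the decoder takes ordinary gradient steps on the same reconstruction loss. The latent prior term of the MAP objective is dropped for brevity, and all names (`train_decoder_only`, `sphere_exp`, and so on) are hypothetical; the authors' actual code is in the linked repository.

```python
import numpy as np


def project_to_tangent(z, g):
    """Project Euclidean gradients g (rows) onto the tangent spaces of the unit sphere at z."""
    return g - np.sum(g * z, axis=1, keepdims=True) * z


def sphere_exp(z, v):
    """Exponential map on the unit sphere: follow the geodesic from z along tangent vector v."""
    norm_v = np.linalg.norm(v, axis=1, keepdims=True)
    safe = np.where(norm_v > 1e-12, norm_v, 1.0)  # avoid 0/0 for zero steps
    out = np.cos(norm_v) * z + np.sin(norm_v) * v / safe
    return out / np.linalg.norm(out, axis=1, keepdims=True)  # counter floating-point drift


def train_decoder_only(Y, latent_dim=3, steps=500, lr_z=0.02, lr_w=0.05, seed=0):
    """Jointly fit per-sample sphere latents Z and a linear decoder W so that Z @ W ≈ Y."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    Z = rng.normal(size=(n, latent_dim))
    Z /= np.linalg.norm(Z, axis=1, keepdims=True)  # latents are free parameters on the sphere
    W = rng.normal(scale=0.1, size=(latent_dim, d))  # stand-in for a neural decoder
    losses = []
    for _ in range(steps):
        R = Z @ W - Y  # reconstruction residuals, shape (n, d)
        losses.append(np.sum(R ** 2) / n)  # mean squared reconstruction error per sample
        grad_W = 2.0 * Z.T @ R / n  # gradient of the loss w.r.t. the decoder
        grad_Z = 2.0 * R @ W.T  # per-sample gradient for each latent (loss gradient times n)
        # Riemannian step for latents: project to the tangent space, then take the exp map.
        Z = sphere_exp(Z, -lr_z * project_to_tangent(Z, grad_Z))
        W = W - lr_w * grad_W  # ordinary Euclidean step for the decoder weights
    return Z, W, losses


# Usage on toy data generated from ground-truth latents on the sphere.
rng = np.random.default_rng(1)
Z_true = rng.normal(size=(50, 3))
Z_true /= np.linalg.norm(Z_true, axis=1, keepdims=True)
Y = Z_true @ rng.normal(size=(3, 5))
Z, W, losses = train_decoder_only(Y)
```

Scaling each latent's gradient by $n$ keeps its step size independent of dataset size, since each latent only appears in its own sample's loss term. Swapping the sphere for another manifold only changes the projection and exponential map (or a retraction), not the decoder.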

Paper Structure

This paper contains 37 sections, 20 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Our decoder reconstructs data from latent representations on Riemannian manifolds; the representations are learned as model parameters via maximum a posteriori (MAP) estimation.
  • Figure 2: Cell cycle phases using either (a) UMAP or (b--d) different Riemannian manifolds. Samples are concatenated across train/validation/test sets. The phase is inferred by DeepCycle as a continuous variable $\phi\in[0,1)$ which wraps around such that $\phi=0$ and $\lim_{\phi\to1}\phi$ denote the same point in the cycle. Best viewed zoomed in.
  • Figure 3: Visualizations of the branching diffusion process. Trees consist of 7 levels with color lightness denoting depth. (a) UMAP projection; (b) Poincaré disk projection of Lorentz latents using geometric regularization ($c=5.0, \sigma=0.5$); (c) Ablation study showing the influence of the noise scale $\sigma$, listing Pearson correlation $\rho$ and mean squared error on the training set.
  • Figure 4: Visualizations of hmtDNA haplogroups using either (a) UMAP, (b) Euclidean latent space, or (c--d) Poincaré projection of Lorentz latents $(c=5.0, \sigma=0.5)$. Edges represent a simplified lineage (lott2013mtdna); nodes indicate median haplogroup positions. Best viewed zoomed in.
  • Figure S2: Effects of manifold curvature and noise level for hyperbolic models on the synthetic branching diffusion dataset. The visualization is similar to Figure 3c but covers a selection of curvatures rather than only $c=5.0$. Trees consist of 7 levels; color lightness denotes depth.
  • ...and 4 more figures
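Figures 3 and 4 show Lorentz latents projected onto the Poincaré disk. The standard map from the hyperboloid $\langle x, x\rangle_L = -1/c$ (with $x_0 > 0$) to the Poincaré ball of radius $1/\sqrt{c}$ is $p = x_{1:n} / (1 + \sqrt{c}\,x_0)$. A minimal sketch of this map and its inverse follows, assuming this curvature convention; the paper's exact convention and any rescaling used for plotting are not stated here, and `lift_to_hyperboloid` is a hypothetical helper for building test points.

```python
import numpy as np


def lorentz_to_poincare(x, c=1.0):
    """Map hyperboloid points (<x, x>_L = -1/c, x0 > 0) to the Poincaré ball of radius 1/sqrt(c)."""
    x = np.asarray(x, dtype=float)
    return x[..., 1:] / (1.0 + np.sqrt(c) * x[..., :1])


def poincare_to_lorentz(p, c=1.0):
    """Inverse map from the Poincaré ball of radius 1/sqrt(c) back to the hyperboloid."""
    p = np.asarray(p, dtype=float)
    sq = c * np.sum(p ** 2, axis=-1, keepdims=True)
    x0 = (1.0 + sq) / (np.sqrt(c) * (1.0 - sq))
    return np.concatenate([x0, 2.0 * p / (1.0 - sq)], axis=-1)


def lift_to_hyperboloid(u, c=1.0):
    """Hypothetical helper: place spatial coordinates u on the hyperboloid by solving for x0."""
    u = np.asarray(u, dtype=float)
    x0 = np.sqrt(1.0 / c + np.sum(u ** 2, axis=-1, keepdims=True))
    return np.concatenate([x0, u], axis=-1)


# Usage with the curvature value quoted in the captions.
c = 5.0
X = lift_to_hyperboloid(np.array([[0.0, 0.0], [3.0, -4.0], [0.5, 0.2]]), c)
P = lorentz_to_poincare(X, c)
disk = np.sqrt(c) * P  # optional rescaling onto the unit disk for plotting
```

The hyperboloid origin maps to the center of the disk, and points far from it crowd toward the boundary, which is why deep tree levels in a Poincaré plot appear near the rim.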