
Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds

Xingzhi Sun, Danqi Liao, Kincaid MacDonald, Yanlei Zhang, Chen Liu, Guillaume Huguet, Guy Wolf, Ian Adelstein, Tim G. J. Rudner, Smita Krishnaswamy

TL;DR

We address geometry-aware data generation, interpolation, and population transport on data manifolds by introducing the Geometry-Aware Generative Autoencoder (GAGA). GAGA learns a geometry-respecting latent embedding and derives a warped pullback metric that governs data generation, geodesic interpolation, and population transport across the manifold. The approach improves manifold-distance preservation, reduces sampling imbalance, and achieves competitive geodesic accuracy on synthetic manifolds and real single-cell datasets, including a ~30% improvement over state-of-the-art methods in single-cell population-level trajectory inference. By integrating manifold learning with neural generative modeling, GAGA provides a principled way to align latent geometry with data-space structure for scalable, geometry-aware generation and transport.
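GAGA's warped metric builds on the pullback construction: a smooth map $f$ induces the metric $G(x) = J_f(x)^\top J_f(x)$ on its input space, and path lengths are measured with it. The sketch below is a minimal illustration under stated assumptions: `f` is a toy map standing in for a trained network, Jacobians are taken by central differences, and the names `jacobian`, `pullback_metric`, and `curve_length` are hypothetical. The warping term that GAGA adds is deliberately omitted.

```python
import numpy as np


def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f: R^d -> R^k at x, shape (k, d)."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        J[:, i] = (np.asarray(f(x + step)) - np.asarray(f(x - step))) / (2.0 * eps)
    return J


def pullback_metric(f, x):
    """Pull back the Euclidean metric of the output space: G(x) = J^T J."""
    J = jacobian(f, x)
    return J.T @ J


def curve_length(f, path):
    """Length of a polyline of shape (n_points, d) under the pullback metric,
    evaluating G at each segment midpoint."""
    path = np.asarray(path, dtype=float)
    total = 0.0
    for a, b in zip(path[:-1], path[1:]):
        dx = b - a
        G = pullback_metric(f, 0.5 * (a + b))
        total += float(np.sqrt(dx @ G @ dx))
    return total


# Toy map (hypothetical): an angle t is sent to (cos t, sin t) on the unit circle.
circle = lambda t: np.array([np.cos(t[0]), np.sin(t[0])])
angles = np.linspace(0.0, np.pi / 2, 101).reshape(-1, 1)
print(curve_length(circle, angles))  # close to pi/2, the quarter-circle arc length
```

Evaluating $G$ at segment midpoints keeps the sketch simple; in a learned setting, `f` would be a trained network and the Jacobian would typically come from automatic differentiation rather than finite differences.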

Abstract

The rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has led to unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and population transport along feasible paths. To address these issues, we introduce the Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold using geodesic-guided flows. GAGA shows competitive performance on simulated and real-world datasets, including a 30% improvement over state-of-the-art methods in single-cell population-level trajectory inference.
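The abstract describes a metric shaped by negative samples drawn off the manifold, so that geodesics avoid off-manifold regions. The sketch below is a minimal illustration of that idea, not GAGA's exact metric or solver. It assumes a conformal metric $G(x) = (1 + \alpha\, s(x))\, I$, where $s$ is a hypothetical off-manifold score; here $s(x) = (\lVert x\rVert - 1)^2$ for the unit circle stands in for a score learned from negative samples. A geodesic is then approximated by gradient descent on the discrete path energy with fixed endpoints. `ALPHA`, `off_manifold_score`, and `discrete_geodesic` are illustrative names.

```python
import numpy as np

ALPHA = 50.0  # hypothetical warping strength


def off_manifold_score(x):
    """Hypothetical stand-in for a learned score: zero on the unit circle,
    growing quadratically with distance from it."""
    r = np.linalg.norm(x, axis=-1)
    return (r - 1.0) ** 2


def off_manifold_score_grad(x):
    r = np.linalg.norm(x, axis=-1, keepdims=True)
    return 2.0 * (r - 1.0) * x / r


def path_energy(path, alpha=ALPHA):
    """Discrete energy sum_i w(m_i) |x_{i+1} - x_i|^2 for G(x) = w(x) I,
    with w = 1 + alpha * score evaluated at segment midpoints m_i."""
    d = path[1:] - path[:-1]
    m = 0.5 * (path[1:] + path[:-1])
    w = 1.0 + alpha * off_manifold_score(m)
    return float(np.sum(w * np.sum(d**2, axis=1)))


def path_energy_grad(path, alpha=ALPHA):
    d = path[1:] - path[:-1]
    m = 0.5 * (path[1:] + path[:-1])
    w = 1.0 + alpha * off_manifold_score(m)
    gw = alpha * off_manifold_score_grad(m)      # gradient of w at each midpoint
    sq = np.sum(d**2, axis=1, keepdims=True)
    shared = 0.5 * gw * sq                       # each endpoint moves its midpoint by 1/2
    grad = np.zeros_like(path)
    grad[1:] += shared + 2.0 * w[:, None] * d    # derivative w.r.t. x_{i+1}
    grad[:-1] += shared - 2.0 * w[:, None] * d   # derivative w.r.t. x_i
    grad[0] = grad[-1] = 0.0                     # endpoints stay fixed
    return grad


def discrete_geodesic(start, end, n_segments=20, lr=0.01, steps=5000, alpha=ALPHA):
    """Gradient descent on the path energy, initialized at the straight chord."""
    path = np.linspace(start, end, n_segments + 1)
    for _ in range(steps):
        path = path - lr * path_energy_grad(path, alpha)
    return path


def mean_off_manifold(path):
    """Average distance of interior points from the unit circle."""
    return float(np.mean(np.abs(np.linalg.norm(path[1:-1], axis=1) - 1.0)))


start, end = np.array([1.0, 0.0]), np.array([0.0, 1.0])
straight = np.linspace(start, end, 21)
geodesic = discrete_geodesic(start, end)
print(mean_off_manifold(straight), mean_off_manifold(geodesic))
print(path_energy(straight), path_energy(geodesic))
```

The straight chord between the two endpoints cuts through the interior of the circle; under the warped metric, the optimized path bends toward the arc, which is the qualitative behavior expected of a geometry-aware geodesic. Larger `ALPHA` pulls the path closer to the circle.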

Paper Structure

This paper contains 59 sections, 8 theorems, 38 equations, 12 figures, 8 tables, and 2 algorithms.

Key Result

Proposition 3.1

For Riemannian manifolds $(\mathcal{M},g_{\mathcal{M}})$, $(\mathcal{N},g_{\mathcal{N}})$ and a diffeomorphism $f:\mathcal{M}\to\mathcal{N}$, if $f$ is a local isometry, i.e., there exists $\epsilon>0$ such that for any $x_0,x_1\in\mathcal{M}$, $d_{\mathcal{M}}(x_0,x_1)<\epsilon\implies \ldots$
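Proposition 3.1 uses the metric form of local isometry. The sketch below checks that condition empirically on a finite sample, assuming the truncated clause is the standard one, $d_{\mathcal{N}}(f(x_0), f(x_1)) = d_{\mathcal{M}}(x_0, x_1)$ whenever $d_{\mathcal{M}}(x_0,x_1)<\epsilon$, and using Euclidean distance on $\mathbb{R}^2$ for both manifolds. The function `is_local_isometry` is hypothetical, and a finite sample can only refute the property on the pairs it contains, never prove it.

```python
import numpy as np


def is_local_isometry(f, d_M, d_N, points, eps, tol=1e-9):
    """For every sampled pair with d_M(x0, x1) < eps, require
    d_N(f(x0), f(x1)) == d_M(x0, x1) up to tol."""
    images = [f(p) for p in points]
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = d_M(points[i], points[j])
            if d < eps and abs(d_N(images[i], images[j]) - d) > tol:
                return False
    return True


def euclidean(a, b):
    return float(np.linalg.norm(a - b))


angle = 0.7
R = np.array([[np.cos(angle), -np.sin(angle)],
              [np.sin(angle), np.cos(angle)]])


def rotate(x):
    return R @ x      # a rotation preserves all Euclidean distances


def scale(x):
    return 2.0 * x    # doubling stretches every distance, so it is not an isometry


rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(50, 2))
print(is_local_isometry(rotate, euclidean, euclidean, pts, eps=0.3))  # True
print(is_local_isometry(scale, euclidean, euclidean, pts, eps=0.3))   # False
```

For curved manifolds, `d_M` and `d_N` would be geodesic distances rather than Euclidean ones; the check itself is unchanged.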

Figures (12)

  • Figure 1: The Geometry-Aware Generative Autoencoder (GAGA) framework. (A) Training the networks. (B) Obtaining the warped pullback metric. (C) Challenging applications enabled by GAGA.
  • Figure 2: Density-based vs geometry-based generation. Left: Data has sampling imbalance. Middle: Density-based methods, e.g., diffusion models and flow matching, preserve this bias. Right: Geometry-aware generation alleviates imbalance by generating points uniformly across the manifold.
  • Figure 3: Demonstration of uniform sampling on a spiral (a 1D manifold). Left: In the space parameterized by polar angle, data (blue points) are distributed with density proportional to the volume distribution function (green curve), and may appear non-uniform. Right: In fact, the corresponding data on the manifold (blue points) are equally spaced w.r.t. geodesic distance and are therefore "uniformly distributed".
  • Figure 4: Geometry-aware generation with GAGA on hemisphere and saddle. (A) Generated points remain on the manifold and are more evenly distributed than the raw data. (B) Kernel density estimation. (C) Ground-truth volume elements computed analytically. (D) In the raw data, density does not correlate with the volume element, indicating data imbalance. GAGA generation corrects this imbalance, as indicated by a higher correlation between volume element and density.
  • Figure 5: Geometry-aware generation with GAGA on Embryoid Body data. Left: The dataset includes measurements from five experiments. Middle: The data is sparse and imbalanced; colors indicate estimated density. Right: GAGA reduces sampling imbalance.
  • ...and 7 more figures
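Figures 3 and 4 relate uniform sampling on a manifold to the volume element. The sketch below illustrates this on an Archimedean spiral $r = \theta$, an assumed toy curve that may differ from the spiral in Figure 3, whose volume element is $\sqrt{1+\theta^2}$. Drawing arc length uniformly and inverting $s(\theta)$ yields samples whose density in $\theta$ tracks the volume element, whereas drawing $\theta$ uniformly does not. Function names are illustrative, and this is a direct arc-length construction rather than GAGA's learned sampler.

```python
import numpy as np


def volume_element(theta):
    """Speed |d/dtheta (theta cos theta, theta sin theta)| = sqrt(1 + theta^2)."""
    return np.sqrt(1.0 + theta**2)


def arc_length_table(t_min, t_max, n_grid=20001):
    """Cumulative arc length s(theta) on a dense grid, via the trapezoidal rule."""
    theta = np.linspace(t_min, t_max, n_grid)
    v = volume_element(theta)
    ds = 0.5 * (v[1:] + v[:-1]) * np.diff(theta)
    return theta, np.concatenate([[0.0], np.cumsum(ds)])


def sample_uniform_on_spiral(n, t_min, t_max, rng):
    """Sample arc length uniformly, then invert s(theta) by linear interpolation."""
    theta_grid, s_grid = arc_length_table(t_min, t_max)
    s = rng.uniform(0.0, s_grid[-1], size=n)
    return np.interp(s, s_grid, theta_grid)


rng = np.random.default_rng(0)
t_min, t_max = 0.5, 4 * np.pi
uniform_on_manifold = sample_uniform_on_spiral(200_000, t_min, t_max, rng)
uniform_in_angle = rng.uniform(t_min, t_max, size=200_000)

# Histogram densities in the angle parameter, compared with the volume element.
edges = np.linspace(t_min, t_max, 21)
centers = 0.5 * (edges[1:] + edges[:-1])
vol = volume_element(centers)
dens_manifold, _ = np.histogram(uniform_on_manifold, bins=edges, density=True)
dens_angle, _ = np.histogram(uniform_in_angle, bins=edges, density=True)

corr_manifold = np.corrcoef(dens_manifold, vol)[0, 1]
corr_angle = np.corrcoef(dens_angle, vol)[0, 1]
print(f"manifold-uniform: {corr_manifold:.3f}, angle-uniform: {corr_angle:.3f}")
```

The correlation comparison mirrors the diagnostic described for Figure 4(D): points that are uniform on the manifold have a parameter-space density that correlates strongly with the volume element, while points that are uniform in the parameter do not.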

Theorems & Definitions (19)

  • Proposition 3.1
  • Definition 3.1
  • Definition 3.2
  • Lemma 3.2
  • Definition 3.3
  • Proposition 3.3
  • Lemma 3.4
  • Proposition 3.5
  • Proposition 3.6
  • Lemma C.1
  • ...and 9 more