Geometry-Aware Generative Autoencoders for Warped Riemannian Metric Learning and Generative Modeling on Data Manifolds
Xingzhi Sun, Danqi Liao, Kincaid MacDonald, Yanlei Zhang, Chen Liu, Guillaume Huguet, Guy Wolf, Ian Adelstein, Tim G. J. Rudner, Smita Krishnaswamy
TL;DR
We address geometry-aware data generation, interpolation, and population transport on data manifolds by introducing the Geometry-Aware Generative Autoencoder (GAGA). GAGA learns a latent embedding that respects the manifold geometry and derives from it a warped pullback metric, which governs data generation, geodesic interpolation, and population transport across the manifold. The approach improves manifold-distance preservation, reduces sampling imbalance, and achieves competitive geodesic accuracy on synthetic manifolds and real single-cell datasets, including a roughly 30% improvement over state-of-the-art methods in single-cell population-level trajectory inference. By integrating manifold learning with neural generative modeling, GAGA aligns latent geometry with data-space structure for scalable, geometry-aware generation and transport.
Abstract
The rapid growth of high-dimensional datasets in fields such as single-cell RNA sequencing and spatial genomics has created unprecedented opportunities for scientific discovery, but it also presents unique computational and statistical challenges. Traditional methods struggle with geometry-aware data generation, interpolation along meaningful trajectories, and transporting populations via feasible paths. To address these issues, we introduce the Geometry-Aware Generative Autoencoder (GAGA), a novel framework that combines extensible manifold learning with generative modeling. GAGA constructs a neural network embedding space that respects the intrinsic geometries discovered by manifold learning and learns a novel warped Riemannian metric on the data space. This warped metric is derived from both the points on the data manifold and negative samples off the manifold, allowing it to characterize a meaningful geometry across the entire latent space. Using this metric, GAGA can uniformly sample points on the manifold, generate points along geodesics, and interpolate between populations across the learned manifold using geodesic-guided flows. GAGA shows competitive performance on simulated and real-world datasets, including a 30% improvement over state-of-the-art methods in single-cell population-level trajectory inference.
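
To make the idea of a pullback metric concrete, the following is a minimal sketch, not the GAGA implementation. It assumes a toy, hand-written `encoder` standing in for a learned network, estimates its Jacobian J(x) by finite differences, and forms the standard pullback of the Euclidean latent metric, G(x) = J(x)^T J(x). The length of a discretized curve under G then approximates how far the curve travels in the embedding. All function names and the toy encoder are hypothetical, and the warping from off-manifold negative samples described above is not reproduced here. Pure Python is used only to keep the sketch dependency-free.

```python
import math

def encoder(x):
    """Hypothetical stand-in for a learned encoder: maps R^3 -> R^2."""
    x1, x2, x3 = x
    return [x1 + 0.5 * x3, math.tanh(x2)]

def jacobian_fd(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x, returned as a list of rows."""
    n = len(x)
    cols = []
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += eps
        xm[j] -= eps
        fp, fm = f(xp), f(xm)
        # cols[j][i] approximates d f_i / d x_j
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    m = len(cols[0])
    return [[cols[j][i] for j in range(n)] for i in range(m)]

def pullback_metric(f, x):
    """G(x) = J(x)^T J(x): the Euclidean latent metric pulled back to data space."""
    J = jacobian_fd(f, x)
    n = len(x)
    return [[sum(row[a] * row[b] for row in J) for b in range(n)]
            for a in range(n)]

def curve_length(f, points):
    """Approximate length of a piecewise-linear curve under the pullback metric."""
    total = 0.0
    for p, q in zip(points[:-1], points[1:]):
        mid = [(a + b) / 2 for a, b in zip(p, q)]   # evaluate G at the segment midpoint
        d = [b - a for a, b in zip(p, q)]           # segment displacement
        G = pullback_metric(f, mid)
        n = len(d)
        quad = sum(d[a] * G[a][b] * d[b] for a in range(n) for b in range(n))
        total += math.sqrt(max(quad, 0.0))
    return total

# Straight segment between two data-space points, discretized into 50 pieces.
start, end = [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]
path = [[s + (e - s) * t / 50 for s, e in zip(start, end)] for t in range(51)]
length = curve_length(encoder, path)
```

A geodesic under this metric would be the curve between two fixed endpoints that minimizes `curve_length`; in practice one would parameterize the curve and optimize it with automatic differentiation rather than finite differences.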
