Analytical Discovery of Manifold with Machine Learning
Yafei Shen, Huan-Fei Ma, Ling Yang
TL;DR
This work introduces GAMLA, a Global Analytical Manifold Learning framework that uses two rounds of autoencoder training to yield a character map $G(\bm{x})$ to an $m$-dimensional latent space and a complementary map $R(\bm{x})$ to an $(n-m)$-dimensional space, producing an explicit analytic manifold equation $R(\bm{x})=\bm{0}$ alongside a global coordinate chart. The first round learns the manifold structure through a bottleneck of size $m$; the second round widens the bottleneck to $n$ and trains only the new weights to obtain $\tilde{\bm{z}}=R(\bm{x})$, so that differential-geometric quantities such as normals and curvature can be computed directly from analytic forms. The method delivers two coherent representations that together decompose the latent space and capture the local structure around the manifold, which supports anomaly detection and explanation through the sign and magnitude of $\tilde{\bm{z}}$. Experiments on synthetic manifolds, Swiss Roll, Stanford Bunny data, MALGO mouse phenotypes, and image datasets demonstrate accurate global unfolding, analytic manifold description, and interpretable anomaly categorization, with successful image interpolation in the learned character space. GAMLA thus bridges data-driven manifold learning and analytical geometry, offering a scalable, interpretable tool for exploring intrinsic data geometry and surrounding structures.
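As a reading aid, the geometric quantities mentioned above follow from the implicit description by standard differential geometry. The block below is a sketch under the assumption that the Jacobian $J_R(\bm{x})$ of the complementary map has full rank $n-m$ on the manifold; the symbols $\mathcal{M}$, $J_R$, $\bm{\nu}$, and $H$ are introduced here for illustration and are not taken from the paper.

$$
\mathcal{M} = \{\bm{x} \in \mathbb{R}^n : R(\bm{x}) = \bm{0}\}, \qquad
T_{\bm{x}}\mathcal{M} = \ker J_R(\bm{x}), \qquad
N_{\bm{x}}\mathcal{M} = \operatorname{span}\{\nabla R_1(\bm{x}), \dots, \nabla R_{n-m}(\bm{x})\}.
$$

In the hypersurface case $n-m=1$, a unit normal and, up to sign convention, the sum of the principal curvatures are

$$
\bm{\nu}(\bm{x}) = \frac{\nabla R(\bm{x})}{\lVert \nabla R(\bm{x}) \rVert}, \qquad
H(\bm{x}) = \nabla \cdot \bm{\nu}(\bm{x}).
$$

For example, the unit sphere with $R(\bm{x}) = \lVert \bm{x} \rVert^2 - 1$ gives $\bm{\nu}(\bm{x}) = \bm{x}/\lVert \bm{x} \rVert$ and $H = n-1$ on the sphere.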
Abstract
Understanding low-dimensional structures within high-dimensional data is crucial for visualization, interpretation, and denoising in complex datasets. Despite advances in manifold learning techniques, key challenges, such as limited global insight and the lack of interpretable analytical descriptions, remain unresolved. In this work, we introduce a novel framework, GAMLA (Global Analytical Manifold Learning using Auto-encoding). GAMLA employs a two-round training process within an auto-encoding framework to derive both character and complementary representations for the underlying manifold. With the character representation, the manifold is represented by a parametric function that unfolds it to provide a global coordinate. With the complementary representation, an approximate explicit manifold description is developed, offering a global and analytical representation of smooth manifolds underlying high-dimensional datasets. This enables the analytical derivation of geometric properties such as curvature and normal vectors. Moreover, we find that the two representations together decompose the whole latent space and can thus characterize the local spatial structure surrounding the manifold, which proves particularly effective in anomaly detection and categorization. Through extensive experiments on benchmark datasets and real-world applications, GAMLA demonstrates computational efficiency and interpretability while providing precise geometric and structural insights. This framework bridges the gap between data-driven manifold learning and analytical geometry, presenting a versatile tool for exploring the intrinsic properties of complex datasets.
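To make the role of the complementary representation concrete, here is a minimal Python sketch of how an implicit description $R(\bm{x})=0$ can be used to obtain a normal vector and to categorize anomalies by sign and magnitude. It is an illustration under stated assumptions, not the GAMLA implementation: the function `R` is a hand-written stand-in (signed distance to the unit circle in the plane) for a learned complementary map, and `unit_normal`, `categorize`, and the tolerance `tol` are hypothetical names chosen for this example.

```python
import numpy as np


def R(x):
    """Toy stand-in for a learned complementary map (assumption):
    signed distance from x to the unit circle in R^2."""
    return np.linalg.norm(x, axis=-1) - 1.0


def unit_normal(f, x, eps=1e-6):
    """Approximate the gradient of f at x by central differences
    and normalize it to unit length."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2.0 * eps)
    return grad / np.linalg.norm(grad)


def categorize(x, tol=1e-3):
    """Label a point by the sign of R(x); the magnitude |R(x)| indicates
    how far the point lies from the manifold in this toy setting."""
    z = float(R(np.asarray(x, dtype=float)))
    if abs(z) <= tol:
        return "on-manifold", z
    return ("outside" if z > 0 else "inside"), z


if __name__ == "__main__":
    for point in ([1.0, 0.0], [2.0, 0.0], [0.5, 0.0]):
        print(point, categorize(point))
    print("normal at (0, 1):", unit_normal(R, [0.0, 1.0]))
```

With a trained model, `R` would be replaced by the network's complementary output, and the finite-difference gradient could be replaced by automatic differentiation.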
