Atlas Gaussian processes on restricted domains and point clouds

Mu Niu, Yue Zhang, Ke Ye, Pokman Cheung, Yizhu Wang, Xiaochen Yang

TL;DR

This work advances Gaussian process regression on restricted domains and unknown manifolds by introducing an Atlas Brownian Motion framework to estimate heat kernels on point clouds and by constructing Riemannian-corrected Atlas GPs (RC-AGPs) that fuse global diffusion information with local Euclidean smoothness. By partitioning data with Mapper, learning local charts via GPLVM or autoencoders, and simulating BM paths across overlapping charts, the authors obtain geometry-aware kernels with theoretical guarantees of positive semidefiniteness and asymptotic unbiasedness of the heat-kernel estimator. RC-AGPs demonstrate superior regression performance on torus, U-shape, high-dimensional shark image clouds, and Aral Sea data, outperforming Euclidean GPs and Graph-Laplacian GPs while requiring fewer samples. The approach offers scalable, topology-respecting inference for complex manifolds and opens avenues for dynamic manifolds and adaptive atlas construction. The combination of probabilistic atlases, BM-based heat kernels, and RC-kernels provides a principled, efficient framework for manifold-based GP modeling with strong practical impact in geospatial and high-dimensional data analysis.

Abstract

In real-world applications, data often reside in restricted domains with unknown boundaries, or as high-dimensional point clouds lying on a lower-dimensional, nontrivial, unknown manifold. Traditional Gaussian Processes (GPs) struggle to capture the underlying geometry in such settings. Some existing methods assume a flat space embedded in a point cloud, which can be represented by a single latent chart (latent space), while others perform poorly when the point cloud is sparse or irregularly sampled. The goal of this work is to address these challenges. The main contributions are twofold: (1) We establish the Atlas Brownian Motion (BM) framework for estimating the heat kernel on point clouds with unknown geometries and nontrivial topological structures; (2) Instead of directly using the heat kernel estimates, we construct a Riemannian-corrected kernel by combining the global heat kernel with a local RBF kernel, leading to the formulation of Riemannian-corrected Atlas Gaussian Processes (RC-AGPs). The resulting RC-AGPs are applied to regression tasks across synthetic and real-world datasets. These examples demonstrate that our method outperforms existing approaches in both heat kernel estimation and regression accuracy. It improves statistical inference by effectively bridging the gap between complex, high-dimensional observations and manifold-based inferences.

Paper Structure

This paper contains 31 sections, 6 theorems, 60 equations, 25 figures, 5 tables, 1 algorithm.

Key Result

Theorem 1

If the region on $\mathbb{M}$ can be parameterised by multiple charts, the stochastic process defined in eqn:swBM is chart independent. With a given diffusion time, simulations in any choice of charts are equivalent to the same step in $\mathbb{M}$.

Figures (25)

  • Figure 1: Atlas with two charts on $\mathbb{M}$ and change of coordinates operations.
  • Figure 2: The torus atlas and point cloud.
  • Figure 3: A demonstration of simulating a BM path by switching the stochastic process between different charts (or latent spaces). The collection of charts forms the atlas of the torus manifold. The black trajectory on the torus (middle panel) represents a BM path, with a red star indicating the starting point. The green region of the torus is described by Chart 2 on the right. The brown region of the torus is described by Chart 1 on the left. The purple trajectory in Chart 1 represents a stochastic process simulated by eqn:swBM with $i=1$. Once the path reaches the overlapping region, indicated by the blue star $x_{1j}$ in the left panel, the blue star is transferred to Chart 2 using the change of coordinates operation: $x_{2j} = \varphi_2^{-1} \circ \varphi_1 ( x_{1j} )$. The stochastic process continues in Chart 2, as shown by the purple trajectory in the right panel (simulated by eqn:swBM with $i=2$). Projecting both purple segments from the charts onto the torus generates the BM path shown in the middle panel.
  • Figure 4: BM paths on $\mathbb{M}$: (a) $s_0$ (red triangle) is the starting point of three BM paths (black, green and blue solid lines). Only the blue path reaches the neighbourhood $\mathbb{A}$ (blue circle) of $s$ (black cross) at time $t$. Therefore, the estimate of the transition probability is $p( {S}(t) \in \mathbb{A} \ | \ {S}(0)=s_0 ) = \frac{1}{3}$. (b) The blue path, generated using the SDE on a single chart (niu2023), starts at the red triangle and traverses the central void of the torus, demonstrating that it does not correspond to a proper BM on the torus.
  • Figure 5: Kernel estimates comparison. The analytical kernel density is plotted as a solid red line. The kernel estimates using BM transition density with the GPLVM atlas are represented by a dashed blue line. The kernel estimates using BM transition density with the AE atlas are shown as a dashed purple line. Both atlases are constructed using 625 points on the torus. The GL kernel estimates are plotted as a green dash-dotted line when the number of points is 625. We also plot the GL estimates as a black dotted line and a brown dashed line when the number of points increases to 2,500 and 6,400, respectively. The kernel estimates using single-chart BM are shown as a yellow dotted line.
  • ...and 20 more figures
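
The chart-switching simulation of Figure 3 and the path-counting estimate of Figure 4 can be illustrated on a toy problem. The sketch below is not the paper's implementation: it assumes the unit circle instead of a learned atlas, with two hand-written angular charts (Chart 1 on $(-\pi, \pi)$, Chart 2 on $(0, 2\pi)$) in which the metric is flat, so BM inside each chart is a plain Gaussian random walk. All function names, the `margin` parameter, and the neighbourhood radius are hypothetical choices made for illustration. The reference density is the wrapped Gaussian, which is the BM transition density on the circle under the convention that BM has variance $t$ at time $t$.

```python
import numpy as np

TWO_PI = 2.0 * np.pi

def to_chart2(theta1):
    """Change of coordinates Chart 1 -> Chart 2 (same point on the circle)."""
    return np.mod(theta1, TWO_PI)

def to_chart1(theta2):
    """Change of coordinates Chart 2 -> Chart 1 (same point on the circle)."""
    return np.mod(theta2 + np.pi, TWO_PI) - np.pi

def simulate_endpoints(start_angle, t, n_paths=20000, n_steps=200,
                       margin=0.5, seed=0):
    """Simulate BM paths on the circle, switching charts near chart edges."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    x = np.full(n_paths, to_chart1(start_angle))  # all paths start in Chart 1
    in_chart1 = np.ones(n_paths, dtype=bool)
    for _ in range(n_steps):
        # Flat metric in both charts: one Euler step is a Gaussian increment.
        x = x + np.sqrt(dt) * rng.standard_normal(n_paths)
        # Paths close to the edge of their current chart move to the other one.
        leave1 = in_chart1 & (np.abs(x) > np.pi - margin)
        leave2 = ~in_chart1 & ((x < margin) | (x > TWO_PI - margin))
        x[leave1] = to_chart2(x[leave1])
        x[leave2] = to_chart1(x[leave2])
        in_chart1 = (in_chart1 & ~leave1) | leave2
    # Report every endpoint as an angle in [0, 2*pi), whatever its chart.
    return np.mod(x, TWO_PI)

def transition_density(endpoints, target, radius):
    """Fraction of paths ending within `radius` of `target`, per unit length."""
    d = np.abs(np.mod(endpoints - target + np.pi, TWO_PI) - np.pi)
    return np.mean(d < radius) / (2.0 * radius)

def wrapped_gaussian(delta, t, n_wraps=10):
    """Analytical BM transition density on the unit circle."""
    k = np.arange(-n_wraps, n_wraps + 1)
    return np.sum(np.exp(-(delta + TWO_PI * k) ** 2 / (2.0 * t))) / np.sqrt(TWO_PI * t)

start, target, t = 2.5, TWO_PI - 2.5, 1.0   # target lies across the Chart 1 edge
endpoints = simulate_endpoints(start, t)
estimate = transition_density(endpoints, target, radius=0.2)
exact = wrapped_gaussian(target - start, t)
print(f"Monte Carlo estimate: {estimate:.4f}, analytical value: {exact:.4f}")
```

The start and target points are deliberately placed on opposite sides of the edge of Chart 1, so most paths that reach the target must pass through the change of coordinates at least once. On a learned atlas the Gaussian step would be replaced by the chart-specific stochastic process, and the hand-written transition maps by the learned ones.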

Theorems & Definitions (6)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem
  • Theorem
  • Theorem