Implicit Manifold Gaussian Process Regression
Bernardo Fichera, Viacheslav Borovitskiy, Andreas Krause, Aude Billard
TL;DR
The paper tackles high-dimensional Gaussian process regression by learning an implicit low-dimensional manifold from labeled and unlabeled data. It introduces IMGP, a differentiable framework that builds a geometry-aware prior from graph Matérn kernels on the learned manifold and blends it with a standard Euclidean GP through a distance-dependent bump. The method relies on KNN graphs, Nyström extensions, Lanczos-based eigenpairs, and efficient matrix-vector operations to scale to large datasets, and it is theoretically grounded in convergence to manifold Matérn kernels. Empirically, IMGP can improve predictive uncertainty and calibration on synthetic and real high-dimensional tasks, particularly in semi-supervised settings, though results are sensitive to graph quality and approximation choices. Overall, the work enables geometry to be learned from data and integrated into scalable, differentiable GP inference in high dimensions.
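To make the idea of a graph Matérn kernel on a KNN graph concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: it uses binary edge weights, the unnormalized graph Laplacian, and a dense eigendecomposition, whereas a scalable version would use sparse matrices and a Lanczos-type solver. The function names `knn_graph_laplacian` and `graph_matern_kernel` and all parameter defaults are hypothetical. The kernel uses the standard spectral form, in which each Laplacian eigenvalue λ is mapped to (2ν/κ² + λ)^(-ν).

```python
import numpy as np


def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian of a symmetrized KNN graph (binary weights; an assumption)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all points.
    sq = np.sum(X**2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(D2, np.inf)  # a point is never its own neighbor
    # Indices of the k nearest neighbors of each point.
    idx = np.argsort(D2, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T)  # symmetrize: keep an edge if either endpoint selects it
    return np.diag(A.sum(axis=1)) - A


def graph_matern_kernel(L, nu=2.0, kappa=1.0, sigma2=1.0, num_eig=None):
    """Graph Matern kernel K = F diag(phi(lam)) F^T built from Laplacian eigenpairs."""
    lam, F = np.linalg.eigh(L)        # eigenvalues in ascending order
    lam = np.clip(lam, 0.0, None)     # remove tiny negative round-off
    if num_eig is not None:           # optional truncation to the smallest eigenpairs
        lam, F = lam[:num_eig], F[:, :num_eig]
    phi = (2.0 * nu / kappa**2 + lam) ** (-nu)
    K = (F * phi) @ F.T
    # Rescale so that the average prior variance over the nodes equals sigma2.
    return K * (sigma2 / np.mean(np.diag(K)))


# Usage: noisy-free samples from a circle (a 1-D manifold) embedded in 2-D.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=60)
X = np.stack([np.cos(theta), np.sin(theta)], axis=1)
L = knn_graph_laplacian(X, k=4)
K = graph_matern_kernel(L, nu=2.0, kappa=1.0)
```

The resulting matrix `K` can serve as the prior covariance of a GP over the graph nodes. Handling test points that are not graph nodes, and blending with a Euclidean kernel far from the data, are separate steps not shown in this sketch.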
Abstract
Gaussian process regression is widely used because of its ability to provide well-calibrated uncertainty estimates and handle small or sparse datasets. However, it struggles with high-dimensional data. One possible way to scale this technique to higher dimensions is to leverage the implicit low-dimensional manifold upon which the data actually lies, as postulated by the manifold hypothesis. Prior work, however, ordinarily requires the manifold structure to be provided explicitly, i.e., given by a mesh or known to be one of the well-known manifolds such as the sphere. In contrast, in this paper we propose a Gaussian process regression technique capable of inferring implicit structure directly from data (labeled and unlabeled) in a fully differentiable way. For the resulting model, we discuss its convergence to the Matérn Gaussian process on the assumed manifold. Our technique scales up to hundreds of thousands of data points and may improve the predictive performance and calibration of standard Gaussian process regression in high-dimensional settings.
