Hilbert space methods for approximating multi-output latent variable Gaussian processes
Soham Mukherjee, Manfred Claassen, Paul-Christian Bürkner
TL;DR
This work tackles GP scalability in high-dimensional, multi-output, and latent-input settings. It does so by extending Hilbert space Gaussian processes (HSGPs) to model cross-output correlations and latent inputs. HSGPs approximate a GP with a basis-function expansion whose weights are scaled by the kernel's spectral density. The approach scales linearly in the number of data points and basis functions, supports Bayesian inference via MCMC or variational methods, and maintains well-calibrated uncertainty. The authors run extensive simulations and a real single-cell RNA sequencing (scRNA-seq) pseudotime case study. These show that HSGPs match or exceed exact GPs in latent-variable estimation accuracy and calibration while running faster. They are also better calibrated and more accurate than inducing-point variational inference methods, though not necessarily faster. The method offers a scalable, well-calibrated alternative for complex GP models in genomics and related domains. The authors also discuss the choice of priors, how output correlations are modeled, and possible extensions to derivative information.
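The spectral-density basis expansion behind HSGPs can be sketched in a few lines. We assume one-dimensional inputs on an interval [-L, L] and a squared exponential kernel. We also assume the Dirichlet-boundary Laplacian eigenfunctions used in standard HSGP formulations, so that f(x) ≈ Σ_j sqrt(S(√λ_j)) φ_j(x) β_j with β_j ~ N(0, 1). The helper names (`hsgp_basis`, `hsgp_kernel`, `hsgp_prior_draw`) are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def hsgp_basis(x, m, L):
    """First m Laplacian eigenfunctions on [-L, L] (Dirichlet boundary).

    Returns the n x m basis matrix Phi and the square roots of the eigenvalues.
    """
    j = np.arange(1, m + 1)                       # basis indices 1..m
    sqrt_lam = np.pi * j / (2.0 * L)              # sqrt(lambda_j)
    phi = np.sin(sqrt_lam[None, :] * (x[:, None] + L)) / np.sqrt(L)
    return phi, sqrt_lam

def se_spectral_density(omega, alpha, rho):
    """Spectral density of the 1-D squared exponential kernel."""
    return alpha**2 * np.sqrt(2.0 * np.pi) * rho * np.exp(-0.5 * (rho * omega) ** 2)

def se_kernel(x1, x2, alpha, rho):
    """Exact squared exponential kernel, for comparison only."""
    d = x1[:, None] - x2[None, :]
    return alpha**2 * np.exp(-0.5 * (d / rho) ** 2)

def hsgp_kernel(x1, x2, m, L, alpha, rho):
    """Approximate kernel Phi(x1) diag(S(sqrt(lambda))) Phi(x2)^T."""
    phi1, sqrt_lam = hsgp_basis(x1, m, L)
    phi2, _ = hsgp_basis(x2, m, L)
    s = se_spectral_density(sqrt_lam, alpha, rho)
    return (phi1 * s) @ phi2.T

def hsgp_prior_draw(x, m, L, alpha, rho, rng):
    """One prior draw f = Phi diag(sqrt(S)) beta, costing O(n m) rather than O(n^3)."""
    phi, sqrt_lam = hsgp_basis(x, m, L)
    beta = rng.standard_normal(m)                 # beta ~ N(0, I_m)
    return phi @ (np.sqrt(se_spectral_density(sqrt_lam, alpha, rho)) * beta)
```

For inputs in [-1, 1] with length scale 0.5 and L = 2.5, `hsgp_kernel` with 40 basis functions closely reproduces `se_kernel`, while 5 basis functions give a visibly worse match. Building Phi is O(nm), which is where the linear scaling comes from. The exact kernel instead requires factorizing an n x n matrix.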
Abstract
Gaussian processes are a powerful class of non-linear models, but have limited applicability for larger datasets due to their high computational complexity. In such cases, approximate methods are required, for example, the recently developed class of Hilbert space Gaussian processes. They have been shown to significantly reduce computation time while retaining most of the favorable properties of exact Gaussian processes. However, Hilbert space approximations have so far only been developed for uni-dimensional outputs and manifest (known) inputs. Thus, we generalize Hilbert space methods to multi-output and latent input settings. Through extensive simulations, we show that the developed approximate Gaussian processes are indeed not only faster, but also provide similar or even better uncertainty calibration and accuracy of latent variable estimates compared to exact Gaussian processes. While not necessarily faster than alternative Gaussian process approximations, our new models provide better calibration and estimation accuracy, thus striking an excellent balance between trustworthiness and speed. We additionally illustrate our methods on a real-world case study from single-cell biology.
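The generalization to multiple outputs and latent inputs can be illustrated with a minimal sketch, which rests on one assumption. All outputs share one HSGP basis and are coupled through an output correlation matrix C via its Cholesky factor. This is an intrinsic-coregionalization-style construction chosen here for illustration, and the paper's actual construction may differ. The basis functions are closed-form in x, so the inputs x can equally be latent parameters sampled by MCMC or fitted variationally. The function names are hypothetical.

```python
import numpy as np

def hsgp_basis(x, m, L):
    """n x m Laplacian eigenfunction basis on [-L, L] and sqrt eigenvalues."""
    j = np.arange(1, m + 1)
    sqrt_lam = np.pi * j / (2.0 * L)
    return np.sin(sqrt_lam[None, :] * (x[:, None] + L)) / np.sqrt(L), sqrt_lam

def se_spectral_density(omega, alpha, rho):
    """Spectral density of the 1-D squared exponential kernel."""
    return alpha**2 * np.sqrt(2.0 * np.pi) * rho * np.exp(-0.5 * (rho * omega) ** 2)

def multi_output_hsgp(x, beta, corr, m, L, alpha, rho):
    """Latent functions F = Phi diag(sqrt(S)) beta chol(C)^T, shape n x D.

    x    : (n,) inputs; these may be latent parameters, since Phi is closed-form in x
    beta : (m, D) standard-normal weights
    corr : (D, D) output correlation matrix C
    """
    phi, sqrt_lam = hsgp_basis(x, m, L)
    scale = np.sqrt(se_spectral_density(sqrt_lam, alpha, rho))
    chol = np.linalg.cholesky(corr)               # lower-triangular factor of C
    return (phi * scale) @ beta @ chol.T

def implied_covariance(x, corr, m, L, alpha, rho):
    """Covariance of F flattened column by column (output-major): C kron K_approx."""
    phi, sqrt_lam = hsgp_basis(x, m, L)
    k_approx = (phi * se_spectral_density(sqrt_lam, alpha, rho)) @ phi.T
    return np.kron(corr, k_approx)
```

Under this construction the covariance between outputs d and d' at inputs x and x' is C[d, d'] times the approximate kernel. The cost stays linear in n for a fixed number of basis functions and outputs.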
